Loading…

Loading grant details…

Completed STANDARD GRANT National Science Foundation (US)

Framework: Awkward Arrays - Accelerating scientific data analysis on irregularly shaped data

$10.67M USD

Funder National Science Foundation (US)
Recipient Organization Princeton University
Country United States
Start Date Sep 01, 2021
End Date Aug 31, 2025
Duration 1,460 days
Number of Grantees 3
Roles Principal Investigator; Former Principal Investigator; Former Co-Principal Investigator
Data Source National Science Foundation (US)
Grant ID 2103945
Grant Description

Nowadays, scientists need to be programmers as well as experts in their fields. Any software that simplifies the computational gymnastics of analyzing data is welcomed, as it allows scientists to focus more of their attention on what they want to compute, rather than how, but there is usually a tension between ease of use and computational speed. Faster computation means more data analyzed, but error-free code matters most.

Awkward Array is a software library that performs calculations on data that do not fit into neat rows. For instance, a fleet of undersea probes may each measure ocean temperatures at a different number of depths, but data analysis tools like Excel, Pandas, and SQL require measurements to be arranged in rectangular tables with the same number of columns in each row.

The problem is more acute when variable numbers of measurements are nested within variable numbers of entities. Traditionally, scientists have either used slow scripting languages or unforgiving "bare metal" languages to perform calculations on these non-tabular datasets. Awkward Array generalizes array concepts so that the easy and fast expressions that once applied only to rectangular tables now apply to irregularly shaped data, allowing bare metal speed in a high-level scripting language.

This project broadens the applicability of Awkward Array beyond the problem domain it was originally intended for, particle physics, to a wide variety of scientific fields, including oceanography, astronomy, genetics, chemistry, and health care. It also integrates the Awkward Array library with popular data science and machine learning tools and extends the implementation to GPUs for extremely fast processing of the same array idioms.

Scientists use NumPy-based tools, such as Pandas, to analyze large and regularly shaped data efficiently. Packing tables of numbers into contiguous arrays allows operations to be precompiled and fast; and expressing operations as concise commands with implicit loops makes them easier to read and quicker to type during data exploration. However, scientific data often has a complexity that does not fit well into tabular format.

This forces scientists to write programs with explicit loops, which are slow in Python. But what about analysis of large and irregularly shaped data? Awkward Array is a generalization of NumPy; it is a Python library defining arrays of objects with arbitrary types and a suite of generic operations on them.

Like NumPy arrays, Awkward Arrays are packed into contiguous buffers for efficiency, but unlike NumPy arrays, they can include variable-length lists, nested fields, missing values, and mixed types. Awkward Arrays use 10 times less memory than equivalent Python objects and are up to 100 times faster in computations. This project generalizes Awkward Array as a foundational library for science.

It is a cross-disciplinary effort, extending Awkward Array to efficiently solve challenges faced by scientists in a variety of fields and data science in industry. The project members work with scientific collaborators to solve specific problems, adding features to Awkward Array if necessary, and industry collaborators at Anaconda and Nvidia update Python’s standard tools to recognize these new array types for CPU and GPU workloads.

It is also an educational project, teaching new approaches to complex problems using array idioms, both to practicing scientists and to students.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

Princeton University

Advertisement
Discover thousands of grant opportunities
Advertisement
Browse Grants on GrantFunds
Interested in applying for this grant?

Complete our application form to express your interest and we'll guide you through the process.

Apply for This Grant