Active STANDARD GRANT National Science Foundation (US)

SHF: Small: Expediting the Execution of Machine Learning Applications on Multi-GPU Infrastructure with Architecture Awareness and Runtime Support

$6M USD

Funder	National Science Foundation (US)
Recipient Organization	University of Pittsburgh
Country	United States
Start Date	Jun 15, 2022
End Date	May 31, 2026
Duration	1,446 days
Number of Grantees	2
Roles	Principal Investigator; Co-Principal Investigator
Data Source	National Science Foundation (US)
Grant ID	`2154973`

Grant Description

Deep Neural Networks (DNNs) have become one of the most popular machine-learning techniques for solving real-world problems in object classification, autonomous vehicles, natural language processing, etc. Due to the ever-growing problem size and complexity, the training and inference of DNN models are increasingly time-consuming and require enormous computing resources.

As such, multi-GPU infrastructure is a desirable platform that has been widely used in modern DNN tasks. However, the delivered scalability of DNN execution is severely limited by a lack of architecture awareness and of easy-to-use runtime support. This research uncovers and addresses the architectural bottlenecks of DNN executions.

The outcome of this research is expected to achieve scalable DNN executions on multi-GPU infrastructure. The educational and outreach components of this project include (i) new course projects on multi-GPU infrastructure integrated into graduate-level computer architecture courses; (ii) engaging undergraduate students in the research activities through senior Capstone project courses and an outreach program at the PI’s institution; and (iii) increasing the participation and visibility of female and minority students in computer architecture, computer science, and engineering.

This research is set to uncover and address the architectural bottlenecks of DNN executions on multi-GPUs. Specifically:

1) It identifies address translation as an essential bottleneck in multi-GPU performance. It redesigns the Translation Lookaside Buffer (TLB) hierarchy and the page table walk for both single-tenant and multi-tenant DNN executions on multi-GPU infrastructure.

2) It investigates the data-movement overheads in data parallelism and model parallelism of modern DNN applications. It proposes architecture-aware data distillation and neuron-based model partitioning to mitigate the data movement overheads.

3) It proposes a runtime framework that fosters the usage of multi-GPUs through enhanced programmability, which allows dynamic and automatic virtual kernel to physical kernel generation during execution.
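As a rough illustration of why address translation can limit performance, the sketch below models a toy, fully associative TLB with LRU replacement and compares two access patterns. It is not the project's design: the class `LruTlb`, the helper `miss_rate`, the 4 KiB page size, and the 16-entry capacity are all assumptions chosen for the example, and a real GPU TLB hierarchy is considerably more complex.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages for this toy model


class LruTlb:
    """Hypothetical fully associative TLB with LRU replacement."""

    def __init__(self, entries):
        self.entries = entries
        self.cache = OrderedDict()  # virtual page number -> cached marker
        self.hits = 0
        self.misses = 0

    def access(self, virtual_address):
        vpn = virtual_address // PAGE_SIZE
        if vpn in self.cache:
            self.cache.move_to_end(vpn)  # mark as most recently used
            self.hits += 1
            return True
        # On real hardware a miss would trigger a page table walk.
        self.misses += 1
        self.cache[vpn] = True
        if len(self.cache) > self.entries:
            self.cache.popitem(last=False)  # evict least recently used
        return False


def miss_rate(tlb, addresses):
    """Replay an address trace and return the fraction of misses."""
    for address in addresses:
        tlb.access(address)
    return tlb.misses / (tlb.hits + tlb.misses)


# Page-local pattern: 8 accesses inside each of 64 pages.
sequential = [page * PAGE_SIZE + offset * 512
              for page in range(64) for offset in range(8)]
# Page-strided pattern: one access per page, cycling over 64 pages 8 times.
strided = [page * PAGE_SIZE for _ in range(8) for page in range(64)]

seq_rate = miss_rate(LruTlb(entries=16), sequential)
strided_rate = miss_rate(LruTlb(entries=16), strided)
print(f"page-local miss rate:   {seq_rate:.3f}")
print(f"page-strided miss rate: {strided_rate:.3f}")
```

Both traces issue the same number of accesses over the same 64 pages, yet the page-strided trace cycles through more pages than the toy TLB can hold, so its entries are evicted before they are reused. In this simplified model, that is the kind of behavior that makes translation a candidate bottleneck when a working set spans many pages.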

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

All Grantees

University of Pittsburgh
