Active NON-SBIR/STTR RPGS NIH (US)

Fast, powerful, scalable, usable, and distributable methods for multi-modal single cell analyses

$7.02M USD

Funder NATIONAL HUMAN GENOME RESEARCH INSTITUTE
Recipient Organization Stanford University
Country United States
Start Date Feb 01, 2024
End Date Jan 31, 2028
Duration 1,460 days
Number of Grantees 1
Roles Principal Investigator
Data Source NIH (US)
Grant ID 10777236
Grant Description

SUMMARY While single-cell methods for analyzing gene expression are becoming a standard tool for unpacking cellular heterogeneity and understanding complex tissues in health and disease, other molecular features, especially open chromatin landscapes via ATAC-seq, but also surface protein abundance and the presence of CRISPR guides, are rapidly expanding in their application. Indeed, commercial platforms for generating diverse single-cell data sets have led to an immense increase in the scale of these data, and methods for split-and-pool based assays and decreasing sequencing cost all presage an exponentially increasing corpus of future large-scale datasets.

We developed ArchR, an analysis infrastructure specifically designed for analysis of large-scale single-cell (sc) ATAC-seq data sets that enables a diverse suite of complex analyses (including QC, doublet removal, iterative TF-IDF clustering, approximation methods for large-scale data sets, trajectory analysis, RNA-seq integration, track visualization, marker peak identification, etc.), all with minimal computing hardware requirements. We estimate that ArchR has thousands of active users and is rapidly becoming the “go to” analysis software for large scATAC-seq data sets.

To further extend the utility of ArchR for analyzing multi-omic data sets, we will first engineer substantial improvements to the computational efficiency of the underlying single-cell computational infrastructure. To do this, we will (1) encode our fundamental matrix operations in C++ to enable streaming data matrix access, thus reducing memory requirements and effectively “lifting the cap” on the number of cells capable of being analyzed through rapid on-the-fly calculations of diverse operations and (2) implement and benchmark efficient on-disk storage using bitpacking algorithms. These data structures and atomic operation libraries will be shared with the genomics community (and are being integrated into the popular Seurat package), allowing repurposing of these performance improvements.

Second, we will develop, implement, and benchmark powerful analytical tools for the analysis of large, diverse, and/or multi-omic datasets. We will enable the handling of diverse independent and simultaneously acquired (multi-omic) data types including RNA-seq, ATAC-seq, ADT (CITE-seq), and CRISPR-based perturbation methods. We will develop accurate methods for cross-manifold data linkage for distinct data sets, forced-projection and regression analysis, multi-modality cell clustering, joint analysis of single-cell molecular data sets with CRISPR-based perturbations, single-cell inference of enhancer function via correlation and the “ABC” model, and identification of continuous differentiation trajectories and chromatin “potential.”

Finally, we will develop plug-and-play cell type-specific deep learning models for prediction of the regulatory effects of noncoding sequence changes. These models will learn single-cell chromatin accessibility profiles from DNA sequence to predict the cell type-specific effects of noncoding sequence changes. We will create a user-friendly system for training, deployment, and sharing sequence-based models of cell type-specific chromatin accessibility, bringing cutting-edge machine learning for functional genomics to wide use.
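
The bitpacking-based on-disk storage mentioned in the summary can be illustrated with a minimal sketch. This is not the project's implementation (the summary states the core operations will be written in C++); it is a generic fixed-width bitpacking example in Python, and the function names `bits_needed`, `pack`, and `unpack` are hypothetical. The idea is that single-cell count values are mostly small integers, so each one can be stored in only as many bits as the largest value requires rather than in a full 32-bit word.

```python
def bits_needed(values):
    """Smallest bit width that can hold every value (at least 1 bit)."""
    return max(1, max(values, default=0).bit_length())


def pack(values, width):
    """Pack non-negative integers, each < 2**width, into a bytes object."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError(f"value {v} does not fit in {width} bits")
        # Place the i-th value in bit positions [i*width, (i+1)*width).
        acc |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")


def unpack(data, width, count):
    """Recover `count` integers of `width` bits each from packed bytes."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


# Illustrative (made-up) counts for eight matrix entries.
counts = [0, 1, 0, 2, 1, 0, 0, 3]
width = bits_needed(counts)      # 2 bits are enough for values up to 3
packed = pack(counts, width)     # 8 values * 2 bits = 2 bytes
restored = unpack(packed, width, len(counts))
```

Here eight values occupy 2 bytes instead of the 32 bytes they would take as 32-bit integers. Production schemes typically pack values in fixed-size chunks with a per-chunk bit width and may combine this with delta encoding of sorted indices; those details are outside this sketch.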

All Grantees

Stanford University
