Statistical Methods for High-Resolution Multiscale Analysis in DNA Interactions

Project: Research project

Project Details


A fundamental mystery in genome biology is how the three billion base pairs of a mammalian DNA sequence (approximately 2 meters long) are folded, looped, and coiled to fit into a cell nucleus that is roughly 5-10 microns in diameter. Rapid progress has been made over the last few years in advancing our understanding of how the genome folds in three dimensions (3D), primarily driven by advances in sequencing technologies. The goal of this project is to develop mathematical models that will provide a deeper understanding of how 3D genome structure is connected to gene expression in healthy development, and how these folding patterns go awry during the onset and progression of disease.

This project aims to shed new light on the organizing principles governing genome folding through chromatin conformation capture experiments. To date, no clear best-practice computational methods exist for comparing genome organization across cell types or biological perturbations. The project will develop mathematical models and computational methods to gain new insight into how the genetic material folds in different cellular states, and to sensitively detect how these folding patterns are dynamically altered by biological perturbations such as drugs, growth factors, and genome editing. It focuses on detecting dynamic changes in two broad categories of 3D chromatin features: (1) sub-megabase topologically associating domains exhibiting a block structure, and (2) precise long-range interactions between two distant genomic loci, which loop out the intervening genomic DNA. Both parametric and non-parametric normalization approaches for elucidating these features will be explored and benchmarked. Models for these features will be developed, leading to scan statistics for identifying them in normalized 3D contact maps. Methods for false discovery rate control for these scan statistics will be developed based on analysis of heterogeneous Poisson fields.
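To make the idea of a scan statistic on a contact map concrete, the following is a minimal illustrative sketch in Python, not the project's actual method. It assumes contact counts are Poisson with a mean that depends only on genomic distance (one simple kind of heterogeneous Poisson field), estimates that mean by the median of each diagonal, scores square windows by their upper-tail Poisson probability, and applies the standard Benjamini-Hochberg procedure as a stand-in for the false discovery rate methods the project proposes to develop. All function names, the window size, and the toy data are hypothetical. Overlapping windows are dependent, which Benjamini-Hochberg does not formally account for.

```python
import math
import statistics


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    Intended for moderate lam; very large lam could underflow the first term.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))  # pmf(k)
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # pmf(i) from pmf(i - 1)
        if i > lam and term <= 1e-16 * total:
            break
    return min(total, 1.0)


def expected_by_distance(counts):
    """Assumed null model: median count on each diagonal (distance decay)."""
    n = len(counts)
    return [statistics.median(counts[i][i + d] for i in range(n - d))
            for d in range(n)]


def scan_windows(counts, expected, w=1, min_dist=3):
    """Score every (2w+1) x (2w+1) window centred at (i, j) with j - i >= min_dist."""
    n = len(counts)
    centres, pvals = [], []
    for i in range(w, n - w):
        for j in range(max(i + min_dist, w), n - w):
            cells = [(a, b) for a in range(i - w, i + w + 1)
                     for b in range(j - w, j + w + 1)]
            observed = sum(counts[a][b] for a, b in cells)
            lam = sum(expected[abs(a - b)] for a, b in cells)
            if lam <= 0:
                continue  # no usable null expectation for this window
            centres.append((i, j))
            pvals.append(poisson_upper_tail(observed, lam))
    return centres, pvals


def benjamini_hochberg(pvals, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


# Toy symmetric contact map: counts decay with distance, plus one planted
# "loop" between bins 2 and 9 (entirely made-up data).
n = 12
base = [max(1, round(20 / (d + 1))) for d in range(n)]
toy = [[base[abs(i - j)] for j in range(n)] for i in range(n)]
toy[2][9] += 30
toy[9][2] += 30

centres, pvals = scan_windows(toy, expected_by_distance(toy))
hits = [centres[idx] for idx in benjamini_hochberg(pvals)]
```

In this toy example, `hits` holds the window centres flagged at a nominal 5% false discovery rate. Using the diagonal median rather than the mean keeps the planted enrichment from inflating its own null expectation. That is a convenience of the sketch and not a substitute for the parametric and non-parametric normalization approaches described above.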

Effective start/end date: 5/1/16 to 4/30/21


  • National Science Foundation: $1,395,955.00

