Research Project

Reference Panel Guided 3D Genome Data Analysis

Abstract

The widespread use of Hi-C has revealed the hierarchical structures of the genome, thereby deepening our understanding of the organization and function of 3D genomes. However, analyzing Hi-C data remains challenging, mainly because the sequencing coverage of the data produced in most Hi-C experiments is insufficient.
In this project, we proposed a reference-panel-enabled framework to tackle the data insufficiency issue in Hi-C data analysis. This approach is the first to harness the vast amount of existing Hi-C datasets when analyzing a given study Hi-C dataset. Within this framework, we developed three applications that enhance a Hi-C contact map, annotate chromatin loops, and identify nested topologically associating domains (TADs) from insufficiently sequenced Hi-C data. The algorithms developed in this project leverage ideas from attention mechanisms, representation learning, dynamic programming, and non-parametric statistics. Introducing a panel of reference Hi-C samples significantly improved prediction accuracy across these three diverse Hi-C data analysis tasks under a wide spectrum of benchmarking scenarios. Applying our tools to Hi-C data from various cells deepened our understanding of the formation of TADs and chromatin loops, yielding key insights into these essential genomic features.

Project members

Yanlin ZHANG

Assistant Professor

Publications

1. Reference panel guided topological structure annotation of Hi-C data. Yanlin Zhang and Mathieu Blanchette.

2. Reference panel-guided super-resolution inference of Hi-C data. Yanlin Zhang and Mathieu Blanchette.

Project Period

2022.12.1-2023.6.30

Research Area

Computational biology

Keywords

3D genome, attention mechanism, computational biology, deep learning