Research Project

Reference Panel Guided 3D Genome Data Analysis

Abstract

The widespread use of Hi-C has revealed the hierarchical structures of the genome, deepening our understanding of the organization and function of 3D genomes. However, analyzing Hi-C data remains challenging, mainly because the sequencing coverage of the data produced in most Hi-C experiments is insufficient.
In this project, we proposed a reference-panel-enabled framework to tackle the data insufficiency issue in Hi-C data analysis. This approach is the first to harness the large body of existing Hi-C datasets when analyzing a given study Hi-C dataset. Within this framework, we developed three applications that enhance a Hi-C contact map, annotate chromatin loops, and identify nested topologically associating domains (TADs) from insufficiently sequenced Hi-C data. The algorithms developed in this project draw on ideas from attention mechanisms, representation learning, dynamic programming, and non-parametric statistics. Introducing a panel of reference Hi-C samples significantly improved prediction accuracy on all three Hi-C data analysis tasks across a wide spectrum of benchmarking scenarios. Applying our tools to Hi-C data from various cells yielded insights into the formation of TADs and chromatin loops.

Project Members

Yanlin Zhang (张延林)

Assistant Professor

Publications

1. Reference panel guided topological structure annotation of Hi-C data. Yanlin Zhang and Mathieu Blanchette.

2. Reference panel-guided super-resolution inference of Hi-C data. Yanlin Zhang and Mathieu Blanchette.

Project Period

December 1, 2022 to June 30, 2023

Research Area

Computational biology

Keywords

3D genome, attention mechanism, computational biology, deep learning