Measuring the reproducibility and quality of Hi-C data

Abstract. Background: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate me...


Bibliographic Details
Main Authors: Galip Gürkan Yardımcı, Hakan Ozadam, Michael E. G. Sauria, Oana Ursu, Koon-Kiu Yan, Tao Yang, Abhijit Chakraborty, Arya Kaul, Bryan R. Lajoie, Fan Song, Ye Zhan, Ferhat Ay, Mark Gerstein, Anshul Kundaje, Qunhua Li, James Taylor, Feng Yue, Job Dekker, William S. Noble
Format: Article
Language: English
Published: BMC 2019-03-01
Series: Genome Biology
Online Access: http://link.springer.com/article/10.1186/s13059-019-1658-7
id doaj-c5c3304bc21c457993be8f6384642f01
record_format Article
spelling doaj-c5c3304bc21c457993be8f6384642f01 2020-11-25T03:29:27Z eng BMC Genome Biology 1474-760X 2019-03-01 201119 10.1186/s13059-019-1658-7
Measuring the reproducibility and quality of Hi-C data
Galip Gürkan Yardımcı, Department of Genome Sciences, University of Washington
Hakan Ozadam, Program in Systems Biology, University of Massachusetts Medical School
Michael E. G. Sauria, Biology Department, Johns Hopkins University
Oana Ursu, Department of Genetics, Stanford University
Koon-Kiu Yan, Department of Computational Biology, St. Jude Children’s Research Hospital
Tao Yang, Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, Penn State University
Abhijit Chakraborty, Computational Biology Division, La Jolla Institute for Allergy and Immunology
Arya Kaul, Computational Biology Division, La Jolla Institute for Allergy and Immunology
Bryan R. Lajoie, Program in Systems Biology, University of Massachusetts Medical School
Fan Song, Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, Penn State University
Ye Zhan, University of Massachusetts Medical School
Ferhat Ay, Computational Biology Division, La Jolla Institute for Allergy and Immunology
Mark Gerstein, Program in Computational Biology and Bioinformatics, Yale University
Anshul Kundaje, Department of Genetics, Stanford University
Qunhua Li, Department of Statistics, Penn State University
James Taylor, Biology Department, Johns Hopkins University
Feng Yue, Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, Penn State University
Job Dekker, Program in Systems Biology, University of Massachusetts Medical School
William S. Noble, Department of Genome Sciences, University of Washington
collection DOAJ
language English
format Article
sources DOAJ
author Galip Gürkan Yardımcı
Hakan Ozadam
Michael E. G. Sauria
Oana Ursu
Koon-Kiu Yan
Tao Yang
Abhijit Chakraborty
Arya Kaul
Bryan R. Lajoie
Fan Song
Ye Zhan
Ferhat Ay
Mark Gerstein
Anshul Kundaje
Qunhua Li
James Taylor
Feng Yue
Job Dekker
William S. Noble
title Measuring the reproducibility and quality of Hi-C data
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2019-03-01
description Abstract
Background: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study.
Results: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments.
Conclusions: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
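Illustrative note: the abstract names two simple baselines, the ratio of intra- to interchromosomal interactions and plain correlation between a pair of contact matrices. The Python sketch below shows one way these two quantities might be computed. It is not taken from the 3DChromatin_ReplicateQC software; the function names (`cis_trans_ratio`, `naive_pearson`) and the input layouts (a list of `(chrom1, chrom2, count)` tuples and dense symmetric NumPy matrices) are assumptions made only for this example.

```python
import numpy as np


def cis_trans_ratio(contacts):
    """Ratio of intra- (cis) to interchromosomal (trans) contact counts.

    `contacts` is an iterable of (chrom1, chrom2, count) tuples; this
    layout is a hypothetical choice for the example, not a standard format.
    """
    cis = sum(count for c1, c2, count in contacts if c1 == c2)
    trans = sum(count for c1, c2, count in contacts if c1 != c2)
    # Avoid division by zero when no interchromosomal contacts are present.
    return cis / trans if trans else float("inf")


def naive_pearson(m1, m2):
    """Pearson correlation over the upper triangles of two symmetric matrices.

    This is the kind of simple correlation the abstract describes as
    deficient; it is shown as a baseline, not as a recommended measure.
    """
    rows, cols = np.triu_indices_from(m1)
    return float(np.corrcoef(m1[rows, cols], m2[rows, cols])[0, 1])


# Toy inputs (made-up numbers, for illustration only).
toy_contacts = [("chr1", "chr1", 8), ("chr1", "chr2", 2)]
toy_matrix = np.array([[5.0, 2.0, 1.0],
                       [2.0, 6.0, 3.0],
                       [1.0, 3.0, 7.0]])

ratio = cis_trans_ratio(toy_contacts)
self_corr = naive_pearson(toy_matrix, toy_matrix)
scaled_corr = naive_pearson(toy_matrix, 2 * toy_matrix)
```

In this toy case the correlation is unchanged when one matrix is uniformly rescaled, so plain Pearson correlation says nothing about differences in overall depth. A commonly cited further weakness is that the strong decay of contact frequency with genomic distance can make even unrelated samples correlate well, which is one motivation for the Hi-C-specific methods listed in the abstract.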
url http://link.springer.com/article/10.1186/s13059-019-1658-7