LEVERAGING BIOLOGICAL REPLICATES TO IMPROVE ANALYSIS IN CHIP-SEQ EXPERIMENTS

Bibliographic Details
Main Authors: Yajie Yang, Justin Fear, Jianhong Hu, Irina Haecker, Lei Zhou, Rolf Renne, David Bloom, Lauren M McIntyre
Format: Article
Language: English
Published: Elsevier, 2014-01-01
Series: Computational and Structural Biotechnology Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037014600106
ISSN: 2001-0370
DOI: 10.5936/csbj.201401002
Volume: 9
Issue: 13
Affiliations: Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida, USA (Yajie Yang, Justin Fear, Lei Zhou, Rolf Renne, David Bloom, Lauren M McIntyre); Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA (Jianhong Hu); Department of Applied Entomology, University of Giessen, Giessen, Germany (Irina Haecker)

Abstract

ChIP-seq experiments identify genome-wide profiles of DNA-binding molecules, including transcription factors, enzymes and epigenetic marks. Biological replicates are critical for reliable site discovery and are required for the deposition of data in the ENCODE and modENCODE projects. While early reports suggested that two replicates were sufficient, the widespread application of the technique has led to an emerging consensus that ChIP-seq is noisy and that increasing replication may be worthwhile. Additional biological replicates also allow for quantitative assessment of differences between conditions. To date, how to confirm peak identification and determine signal strength across biological replicates has remained controversial, particularly when there are more than two replicates.

Using objective metrics, we evaluate the consistency of biological replicates in ChIP-seq experiments with more than two replicates. We compare several approaches for binding site determination, including two popular but disparate peak callers, CisGenome and MACS2. Here we propose read coverage as a quantitative measurement of signal strength for estimating sample concordance. Determining binding based on genomic features, such as promoters, is also examined. We find that increasing the number of biological replicates increases the reliability of peak identification. Critically, binding sites with strong biological evidence may be missed if researchers rely on only two biological replicates. When more than two replicates are performed, a simple majority rule (>50% of samples identify a peak) identifies peaks more reliably in all biological replicates than the absolute concordance of peak identification between any two replicates, further demonstrating the utility of increasing replicate numbers in ChIP-seq experiments.

Keywords: ChIP-seq; peak identification; biological replicates
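The abstract also examines determining binding on the basis of genomic features such as promoters. A minimal sketch of that idea follows, under assumptions that are ours and not the article's: peaks and transcription start sites (TSS) are coordinates on a single chromosome, a promoter is a hypothetical fixed window of 1 kb on either side of the TSS with strand ignored, and a gene counts as bound when any peak overlaps its window. The window size, function names and gene IDs are illustrative only.

```python
def promoter_windows(
    tss: dict[str, int], upstream: int = 1000, downstream: int = 1000
) -> dict[str, tuple[int, int]]:
    """Build half-open [start, end) windows around each TSS.

    Strand is ignored for simplicity, so "upstream" just means lower
    coordinates; starts are clipped at 0. The 1 kb defaults are assumptions.
    """
    return {
        gene: (max(0, pos - upstream), pos + downstream) for gene, pos in tss.items()
    }


def bound_promoters(
    peaks: list[tuple[int, int]], windows: dict[str, tuple[int, int]]
) -> set[str]:
    """Return genes whose promoter window overlaps at least one peak.

    Two half-open intervals overlap when each starts before the other ends.
    """
    return {
        gene
        for gene, (w_start, w_end) in windows.items()
        if any(p_start < w_end and w_start < p_end for p_start, p_end in peaks)
    }


if __name__ == "__main__":
    windows = promoter_windows({"geneA": 5000, "geneB": 20000, "geneC": 50000})
    peaks = [(5800, 6200), (30000, 30400), (50900, 51100)]
    print(sorted(bound_promoters(peaks, windows)))  # ['geneA', 'geneC']
```

Passing each replicate's peaks through the same windows turns interval calls into per-gene yes/no calls, which can be compared directly across replicates even when two peak callers draw slightly different peak boundaries.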
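The majority rule in the abstract keeps a peak when more than 50% of replicates identify it. The sketch below assumes each replicate's calls have already been reduced to a set of feature IDs (for example, promoters called bound), so that "the same peak" is well defined across replicates; matching raw peak intervals directly would need an extra overlap step not shown here. The function name and gene IDs are hypothetical.

```python
from collections import Counter


def majority_peaks(replicate_calls: list[set[str]]) -> set[str]:
    """Return features called in strictly more than half of the replicates.

    Each element of replicate_calls is the set of feature IDs that one
    replicate's peak caller flagged as bound.
    """
    n = len(replicate_calls)
    if n == 0:
        return set()
    # Each replicate is a set, so a feature counts at most once per replicate.
    counts = Counter(f for calls in replicate_calls for f in calls)
    # Strict majority (count / n > 0.5), written in integers to avoid floats.
    return {f for f, c in counts.items() if 2 * c > n}


if __name__ == "__main__":
    reps = [
        {"geneA", "geneB", "geneC"},
        {"geneA", "geneB"},
        {"geneA", "geneD"},
        {"geneB", "geneC", "geneA"},
    ]
    print(sorted(majority_peaks(reps)))  # ['geneA', 'geneB']
```

With four replicates a feature needs three calls; two of four is a tie and is not kept, which is why geneC is dropped. With two replicates the rule reduces to requiring both calls, i.e. the pairwise concordance criterion the abstract compares it against.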
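For contrast, the absolute concordance between two replicates keeps only the peaks identified in both. The sketch below computes, for every pair of replicates, the number of shared calls and the Jaccard index (shared calls divided by calls made by either replicate). Representing calls as sets of feature IDs and summarizing overlap with the Jaccard index are our assumptions; the record does not state the authors' exact concordance metric, and all names are hypothetical.

```python
from itertools import combinations


def pairwise_concordance(
    replicate_calls: dict[str, set[str]],
) -> dict[tuple[str, str], tuple[int, float]]:
    """For every pair of replicates, return (shared calls, Jaccard index).

    Jaccard = |A & B| / |A | B|: 1.0 when both replicates call exactly the
    same features, 0.0 when they share none. Two empty call sets are
    treated as identical (1.0).
    """
    result = {}
    for a, b in combinations(sorted(replicate_calls), 2):
        calls_a, calls_b = replicate_calls[a], replicate_calls[b]
        shared = len(calls_a & calls_b)
        union = len(calls_a | calls_b)
        result[(a, b)] = (shared, shared / union if union else 1.0)
    return result


if __name__ == "__main__":
    reps = {
        "rep1": {"geneA", "geneB", "geneC"},
        "rep2": {"geneA", "geneB"},
        "rep3": {"geneA", "geneD"},
    }
    for (a, b), (shared, jaccard) in pairwise_concordance(reps).items():
        print(a, b, shared, f"{jaccard:.2f}")
```

Here geneB is shared by rep1 and rep2 but by neither pair involving rep3, so which features look reliable depends on which two replicates happen to be compared. A rule that pools all replicates avoids that dependence on the choice of pair.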
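The abstract proposes read coverage as a quantitative measure of signal strength for estimating sample concordance. One way such a comparison could look is sketched below, under assumptions that are ours rather than the article's: reads and candidate regions are toy half-open intervals on a single chromosome, coverage is the number of reads overlapping a region, counts are scaled to counts per million and log2(x + 1) transformed, and concordance is the Pearson correlation of those profiles. All names are hypothetical.

```python
import math
from itertools import combinations

Interval = tuple[int, int]  # half-open [start, end) on one chromosome


def region_coverage(reads: list[Interval], regions: list[Interval]) -> list[int]:
    """Count reads overlapping each region (brute force; fine for a toy example)."""
    return [
        sum(1 for r_start, r_end in reads if r_start < g_end and g_start < r_end)
        for g_start, g_end in regions
    ]


def log_cpm(counts: list[int], library_size: int) -> list[float]:
    """Scale to counts per million reads, then log2(x + 1) to damp large values."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [math.log2(c * 1e6 / library_size + 1) for c in counts]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles; NaN if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def coverage_concordance(
    replicates: dict[str, list[Interval]], regions: list[Interval]
) -> dict[tuple[str, str], float]:
    """Correlate normalized region coverage for every pair of replicates.

    Library size is taken as the number of reads supplied for each replicate.
    """
    profiles = {
        name: log_cpm(region_coverage(reads, regions), len(reads))
        for name, reads in replicates.items()
    }
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    regions = [(0, 100), (100, 200), (200, 300), (300, 400)]
    reps = {
        # rep2 is rep1 sequenced twice as deeply: same shape, double counts.
        "rep1": [(10, 50)] * 8 + [(110, 150)] * 2 + [(210, 250)] * 4,
        "rep2": [(15, 55)] * 16 + [(115, 155)] * 4 + [(215, 255)] * 8,
        # rep3 puts its signal in different regions.
        "rep3": [(10, 50)] * 1 + [(110, 150)] * 9 + [(310, 350)] * 5,
    }
    for pair, r in coverage_concordance(reps, regions).items():
        print(pair, round(r, 3))
```

In the toy data rep2 has exactly twice rep1's reads in every region, so after library-size scaling the two profiles coincide and correlate at 1.0, while rep3, with its signal in different regions, correlates negatively with both. A genome-scale analysis would replace the brute-force overlap count with an indexed interval search.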