Integrating diverse datasets improves developmental enhancer prediction.

Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing devel...

Bibliographic Details
Main Authors: Genevieve D Erwin, Nir Oksenberg, Rebecca M Truty, Dennis Kostka, Karl K Murphy, Nadav Ahituv, Katherine S Pollard, John A Capra
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-06-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC4072507?pdf=render
id doaj-53cdaba1552e4997beced7a40fbb30e1
record_format Article
spelling doaj-53cdaba1552e4997beced7a40fbb30e1 2020-11-25T01:44:11Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2014-06-01 10 6 e1003677 10.1371/journal.pcbi.1003677
collection DOAJ
language English
format Article
sources DOAJ
author Genevieve D Erwin
Nir Oksenberg
Rebecca M Truty
Dennis Kostka
Karl K Murphy
Nadav Ahituv
Katherine S Pollard
John A Capra
author_sort Genevieve D Erwin
title Integrating diverse datasets improves developmental enhancer prediction.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2014-06-01
description Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology.
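Illustrative note: the description says EnhancerFinder integrates several data types through multiple kernel learning, but this record does not give the kernels, weights, or solver. The Python sketch below shows only the general idea of combining one kernel per data type into a single similarity measure. Everything in it is an assumption made for illustration: the feature values are invented toy numbers, the kernel weights are fixed by hand (true multiple kernel learning learns them), and a kernel perceptron stands in for whatever classifier the authors used.

```python
import math

def linear_kernel(x, y):
    """Dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_kernel(a, b, weights):
    """Weighted sum of one kernel per data type.

    `a` and `b` are dicts holding a hypothetical "motif" vector and a
    hypothetical "conservation" vector for one genomic region each.
    """
    return (weights["motif"] * linear_kernel(a["motif"], b["motif"])
            + weights["conservation"]
            * rbf_kernel(a["conservation"], b["conservation"]))

def decision_score(x, samples, labels, alphas, weights):
    """Signed score of region x under the current perceptron state."""
    return sum(alpha * label * combined_kernel(s, x, weights)
               for alpha, label, s in zip(alphas, labels, samples))

def train_kernel_perceptron(samples, labels, weights, epochs=20):
    """Kernel perceptron: count the mistakes made on each training sample."""
    alphas = [0] * len(samples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(samples, labels)):
            if y * decision_score(x, samples, labels, alphas, weights) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return alphas

def predict(x, samples, labels, alphas, weights):
    """Return +1 (enhancer-like) or -1 (background-like)."""
    score = decision_score(x, samples, labels, alphas, weights)
    return 1 if score > 0 else -1

# Toy, invented data: two enhancer-like regions (+1), two background (-1).
samples = [
    {"motif": [1.0, 0.0], "conservation": [0.9]},
    {"motif": [0.8, 0.1], "conservation": [0.8]},
    {"motif": [0.0, 1.0], "conservation": [0.1]},
    {"motif": [0.1, 0.9], "conservation": [0.2]},
]
labels = [1, 1, -1, -1]

# Hand-picked kernel weights; a real MKL method would learn these.
weights = {"motif": 0.5, "conservation": 0.5}

alphas = train_kernel_perceptron(samples, labels, weights)

new_enhancer_like = {"motif": [0.9, 0.05], "conservation": [0.85]}
new_background_like = {"motif": [0.05, 0.95], "conservation": [0.15]}
print(predict(new_enhancer_like, samples, labels, alphas, weights))
print(predict(new_background_like, samples, labels, alphas, weights))
```

The point of the weighted sum is that each data type keeps its own notion of similarity while the classifier sees a single combined kernel; setting one weight to zero reduces the sketch to a single-data-type predictor of the kind the description compares against.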
url http://europepmc.org/articles/PMC4072507?pdf=render