Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
Abstract. Background: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments allow us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility pro...
Main Authors: | Sheng Liu, Cristina Zibetti, Jun Wan, Guohua Wang, Seth Blackshaw, Jiang Qian |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-07-01 |
Series: | BMC Bioinformatics |
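Subjects: | Transcription factor binding prediction; Chromatin accessibility; Machine learning; Feature selection |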
Online Access: | http://link.springer.com/article/10.1186/s12859-017-1769-7 |
id |
doaj-ffc24bb70e1640fb98cf035bb7b000b1 |
---|---|
record_format |
Article |
spelling |
doaj-ffc24bb70e1640fb98cf035bb7b000b1; 2020-11-24T21:25:19Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-07-01; 181111; 10.1186/s12859-017-1769-7; Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility; Sheng Liu (0), Cristina Zibetti (1), Jun Wan (2), Guohua Wang (3), Seth Blackshaw (4), Jiang Qian (5); Department of Ophthalmology, Johns Hopkins University School of Medicine; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine; Department of Ophthalmology, Johns Hopkins University School of Medicine; Department of Ophthalmology, Johns Hopkins University School of Medicine; Department of Ophthalmology, Johns Hopkins University School of Medicine; Department of Ophthalmology, Johns Hopkins University School of Medicine; Abstract. Background: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments allow us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information for predicting TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis was used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events. Results: We assessed to what extent a model built on other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. Conclusion: Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”.; http://link.springer.com/article/10.1186/s12859-017-1769-7; Transcription factor binding prediction; Chromatin accessibility; Machine learning; Feature selection |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sheng Liu Cristina Zibetti Jun Wan Guohua Wang Seth Blackshaw Jiang Qian |
author_facet |
Sheng Liu Cristina Zibetti Jun Wan Guohua Wang Seth Blackshaw Jiang Qian |
author_sort |
Sheng Liu |
title |
Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2017-07-01 |
description |
Abstract. Background: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments allow us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information for predicting TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis was used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events. Results: We assessed to what extent a model built on other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. Conclusion: Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”. |
topic |
Transcription factor binding prediction Chromatin accessibility Machine learning Feature selection |
url |
http://link.springer.com/article/10.1186/s12859-017-1769-7 |
_version_ |
1725983463225425920 |
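
The abstract in this record describes training a model on one TF or cell type and testing it on another. The sketch below illustrates only that evaluation pattern and is not the BPAC implementation. It assumes synthetic data, three hypothetical feature names (`motif_score`, `conservation`, `accessibility`), and a bagged ensemble of one-feature decision stumps as a deliberately simplified stand-in for a random forest. Every function name and weight is an assumption made for illustration.

```python
import random

# Hypothetical feature names; the article's actual feature set is not reproduced here.
FEATURES = ("motif_score", "conservation", "accessibility")
THRESHOLDS = [i / 10 for i in range(1, 10)]  # candidate split points 0.1 .. 0.9


def make_sites(rng, n, motif_weight, access_weight):
    """Generate n synthetic candidate sites as ((features), bound_label) pairs."""
    cutoff = (motif_weight + 0.5 + access_weight) / 2  # keeps classes roughly balanced
    sites = []
    for _ in range(n):
        x = (rng.random(), rng.random(), rng.random())
        score = motif_weight * x[0] + 0.5 * x[1] + access_weight * x[2]
        sites.append((x, 1 if score + rng.gauss(0, 0.2) > cutoff else 0))
    return sites


def fit_stump(sample, feature):
    """Choose the threshold on one feature with the fewest training errors.

    Assumes higher feature values favour binding, which holds for the
    synthetic data above.
    """
    def errors(t):
        return sum((x[feature] > t) != (y == 1) for x, y in sample)
    return feature, min(THRESHOLDS, key=errors)


def fit_ensemble(sites, n_stumps, rng):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    ensemble = []
    for _ in range(n_stumps):
        bootstrap = [rng.choice(sites) for _ in sites]
        ensemble.append(fit_stump(bootstrap, rng.randrange(len(FEATURES))))
    return ensemble


def predict(ensemble, x):
    """Majority vote over all stumps; returns 1 (bound) or 0 (unbound)."""
    votes = sum(x[f] > t for f, t in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


def accuracy(ensemble, sites):
    return sum(predict(ensemble, x) == y for x, y in sites) / len(sites)


def run_transfer_demo(seed):
    """Compare a model trained on the target TF with one trained on another TF."""
    rng = random.Random(seed)
    source_tf = make_sites(rng, 200, 1.0, 1.0)      # "other" TF
    target_train = make_sites(rng, 200, 0.8, 1.2)   # TF of interest, training sites
    target_test = make_sites(rng, 200, 0.8, 1.2)    # TF of interest, held-out sites
    within = accuracy(fit_ensemble(target_train, 15, rng), target_test)
    transferred = accuracy(fit_ensemble(source_tf, 15, rng), target_test)
    return within, transferred


within, transferred = run_transfer_demo(0)
print(f"trained on target TF: {within:.2f}")
print(f"trained on other TF:  {transferred:.2f}")
```

Comparing the two printed accuracies mirrors the question the abstract poses: how much performance is lost when the training data come from a different TF. With real data, the synthetic generator would be replaced by features computed from motif scans, conservation tracks, and chromatin accessibility profiles, with ChIP-Seq peaks as labels.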