Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility

Abstract Background Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility pro...

Bibliographic Details
Main Authors: Sheng Liu, Cristina Zibetti, Jun Wan, Guohua Wang, Seth Blackshaw, Jiang Qian
Format: Article
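Language: English
Published: BMC 2017-07-01
Series: BMC Bioinformatics
Subjects: Transcription factor binding prediction; Chromatin accessibility; Machine learning; Feature selection
Online Access: http://link.springer.com/article/10.1186/s12859-017-1769-7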
id doaj-ffc24bb70e1640fb98cf035bb7b000b1
record_format Article
spelling doaj-ffc24bb70e1640fb98cf035bb7b000b1 2020-11-24T21:25:19Z eng BMC BMC Bioinformatics 1471-2105 2017-07-01 18 1 1 11 10.1186/s12859-017-1769-7
Sheng Liu, Department of Ophthalmology, Johns Hopkins University School of Medicine
Cristina Zibetti, Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine
Jun Wan, Department of Ophthalmology, Johns Hopkins University School of Medicine
Guohua Wang, Department of Ophthalmology, Johns Hopkins University School of Medicine
Seth Blackshaw, Department of Ophthalmology, Johns Hopkins University School of Medicine
Jiang Qian, Department of Ophthalmology, Johns Hopkins University School of Medicine
collection DOAJ
language English
format Article
sources DOAJ
author Sheng Liu
Cristina Zibetti
Jun Wan
Guohua Wang
Seth Blackshaw
Jiang Qian
title Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-07-01
description Abstract. Background: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technological developments allow us to determine genome-wide chromatin accessibility in various cellular and developmental contexts. Chromatin accessibility profiles provide useful information for predicting TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis has been used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integrating these two types of genomic information can improve the prediction of TF binding events. Results: We assessed to what extent a model built on other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features, such as the specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. Conclusion: Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on these universal “rules”.
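The transferability test described in the abstract (train on one TF or cell type, evaluate on another) can be illustrated with a small sketch. This is not BPAC and not the authors' code: it assumes synthetic toy sites, three hypothetical feature names, and a bagged ensemble of one-split decision stumps standing in for a full random forest, so that it runs with the Python standard library alone.

```python
import random

# Hypothetical feature names; the record does not list the actual feature set.
FEATURES = ("motif_score", "conservation", "accessibility")

def make_sites(n, seed, shift=0.0):
    """Synthetic toy sites: binding is likelier when all three features are high."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n):
        x = [rng.random() for _ in FEATURES]
        x[2] = min(1.0, x[2] + shift * rng.random())  # context-specific accessibility
        p_bound = x[0] * x[2] * (0.5 + 0.5 * x[1])
        sites.append((x, 1 if rng.random() < p_bound else 0))
    return sites

def fit_stump(sample, rng):
    """One-split tree on a random feature, threshold chosen by weighted Gini impurity."""
    f = rng.randrange(len(FEATURES))
    best = None
    for t in (i / 10 for i in range(1, 10)):
        lo = [y for x, y in sample if x[f] <= t]
        hi = [y for x, y in sample if x[f] > t]
        if not lo or not hi:
            continue
        p_lo, p_hi = sum(lo) / len(lo), sum(hi) / len(hi)
        gini = len(lo) * p_lo * (1 - p_lo) + len(hi) * p_hi * (1 - p_hi)
        if best is None or gini < best[0]:
            best = (gini, f, t, p_lo, p_hi)
    if best is None:  # degenerate sample: fall back to the overall bound fraction
        p = sum(y for _, y in sample) / len(sample)
        return (f, 0.5, p, p)
    return best[1:]

def fit_forest(sites, n_trees=50, seed=0):
    """Bagging: each stump is fitted on a bootstrap resample of the training sites."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(sites) for _ in sites], rng) for _ in range(n_trees)]

def predict(forest, x):
    """Average the leaf bound-fractions over all stumps."""
    return sum(p_hi if x[f] > t else p_lo for f, t, p_lo, p_hi in forest) / len(forest)

def auc(scores, labels):
    """Probability that a random bound site outscores a random unbound one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate(forest, sites):
    return auc([predict(forest, x) for x, _ in sites], [y for _, y in sites])

source_train = make_sites(400, seed=1)             # stands in for another TF / cell type
target_train = make_sites(400, seed=2, shift=0.3)  # stands in for the TF of interest
target_test = make_sites(400, seed=3, shift=0.3)

transferred = evaluate(fit_forest(source_train), target_test)
native = evaluate(fit_forest(target_train), target_test)
print(f"transferred AUC={transferred:.3f}, native AUC={native:.3f}")
```

Comparing the two AUC values mirrors the question the study asks: how much is lost when the model never sees the target context. The numbers printed here reflect only the toy data and say nothing about the published results.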
topic Transcription factor binding prediction
Chromatin accessibility
Machine learning
Feature selection
url http://link.springer.com/article/10.1186/s12859-017-1769-7