Deciphering eukaryotic gene-regulatory logic with 100 million random promoters

Bibliographic Details
Main Authors: de Boer, Carl G (Author), Vaishnav, Eeshit Dhaval (Author), Sadeh, Ronen (Author), Abeyta, Esteban Luis (Author), Friedman, Nir (Author), Regev, Aviv (Author)
Other Authors: Broad Institute of MIT and Harvard (Contributor), Massachusetts Institute of Technology. Department of Biology (Contributor), Koch Institute for Integrative Cancer Research at MIT (Contributor)
Format: Article
Language: English
Published: Springer Science and Business Media LLC, 2020-06-25T17:07:04Z.
Online Access: Get fulltext (https://hdl.handle.net/1721.1/125982)
LEADER 02300 am a22002773u 4500
001 125982
042 |a dc 
100 1 0 |a de Boer, Carl G.  |e author 
100 1 0 |a Broad Institute of MIT and Harvard  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Biology  |e contributor 
100 1 0 |a Koch Institute for Integrative Cancer Research at MIT  |e contributor 
700 1 0 |a Vaishnav, Eeshit Dhaval  |e author 
700 1 0 |a Sadeh, Ronen  |e author 
700 1 0 |a Abeyta, Esteban Luis  |e author 
700 1 0 |a Friedman, Nir  |e author 
700 1 0 |a Regev, Aviv  |e author 
245 0 0 |a Deciphering eukaryotic gene-regulatory logic with 100 million random promoters 
260 |b Springer Science and Business Media LLC,   |c 2020-06-25T17:07:04Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/125982 
520 |a How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF's specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation. ©2019, The Author(s), under exclusive licence to Springer Nature America, Inc. 
520 |a NIH (grant no. K99-HG009920-01) 
520 |a Fellowship from the Canadian Institutes for Health Research 
520 |a MIT Presidential Fellowship 
546 |a en 
655 7 |a Article 
773 |t Nature Biotechnology