Learning the Regulatory Code of Gene Expression

Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states, and mRNA and protein levels. Deep neural networks automatically learn informative sequence represe...

Bibliographic Details
Main Authors: Jan Zrimec, Filip Buric, Mariia Kokina, Victor Garcia, Aleksej Zelezniak
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-06-01
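Series: Frontiers in Molecular Biosciences
Subjects: gene expression prediction; cis-regulatory grammar; gene regulatory structure; mRNA & protein abundance; chromatin accessibility; regulatory genomics
Online Access: https://www.frontiersin.org/articles/10.3389/fmolb.2021.673363/full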
id doaj-c0e2ccb1487a456a858ac610a874838c
record_format Article
spelling doaj-c0e2ccb1487a456a858ac610a874838c
2021-06-10T08:54:40Z
eng
Frontiers Media S.A.
Frontiers in Molecular Biosciences
2296-889X
2021-06-01
8
10.3389/fmolb.2021.673363
673363
Learning the Regulatory Code of Gene Expression
Jan Zrimec, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
Filip Buric, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
Mariia Kokina, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
Mariia Kokina, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
Victor Garcia, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
Aleksej Zelezniak, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
Aleksej Zelezniak, Science for Life Laboratory, Stockholm, Sweden
collection DOAJ
language English
format Article
sources DOAJ
author Jan Zrimec
Filip Buric
Mariia Kokina
Victor Garcia
Aleksej Zelezniak
title Learning the Regulatory Code of Gene Expression
publisher Frontiers Media S.A.
series Frontiers in Molecular Biosciences
issn 2296-889X
publishDate 2021-06-01
description Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states, and mRNA and protein levels. Deep neural networks automatically learn informative sequence representations, and interpreting them improves our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, focusing first on the initiating protein-DNA interactions, then on specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
topic gene expression prediction
cis-regulatory grammar
gene regulatory structure
mRNA & protein abundance
chromatin accessibility
regulatory genomics
url https://www.frontiersin.org/articles/10.3389/fmolb.2021.673363/full