Feasibility of predicting allele specific expression from DNA sequencing using machine learning

Abstract Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA se...

Bibliographic Details
Main Authors: Zhenhua Zhang, Freerk van Dijk, Niek de Klein, Mariëlle E van Gijn, Lude H Franke, Richard J Sinke, Morris A Swertz, K Joeri van der Velde
Format: Article
Language: English
Published: Nature Publishing Group 2021-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-89904-y
id doaj-046bc28e1cc24a58a64d6f222d461d04
record_format Article
spelling Zhenhua Zhang (Genomics Coordination Center, University of Groningen and University Medical Center Groningen)
Freerk van Dijk (Genomics Coordination Center, University of Groningen and University Medical Center Groningen)
Niek de Klein (Department of Genetics, University of Groningen and University Medical Center Groningen)
Mariëlle E van Gijn (Department of Genetics, University of Groningen and University Medical Center Groningen)
Lude H Franke (Department of Genetics, University of Groningen and University Medical Center Groningen)
Richard J Sinke (Department of Genetics, University of Groningen and University Medical Center Groningen)
Morris A Swertz (Genomics Coordination Center, University of Groningen and University Medical Center Groningen)
K Joeri van der Velde (Genomics Coordination Center, University of Groningen and University Medical Center Groningen)
collection DOAJ
language English
format Article
sources DOAJ
author Zhenhua Zhang
Freerk van Dijk
Niek de Klein
Mariëlle E van Gijn
Lude H Franke
Richard J Sinke
Morris A Swertz
K Joeri van der Velde
title Feasibility of predicting allele specific expression from DNA sequencing using machine learning
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2021-05-01
description Abstract Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All used code and machine learning models are available at GitHub and Zenodo.
url https://doi.org/10.1038/s41598-021-89904-y