A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates

Autism spectrum disorder (ASD) is a complex neurodevelopmental condition with a strong genetic basis. The role of de novo mutations in ASD has been well established, but the set of genes implicated to date is still far from complete. The current study employs a machine learning-based approach to pre...

Bibliographic Details
Main Authors: Ying Lin, Shiva Afshar, Anjali M. Rajadhyaksha, James B. Potash, Shizhong Han
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-09-01
Series: Frontiers in Genetics
Subjects: autism, de novo mutation, gene expression, constraint, machine learning
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2020.500064/full
id doaj-668731df287a4d50a6197530ad1e1daf
record_format Article
spelling doaj-668731df287a4d50a6197530ad1e1daf
2020-11-25T03:48:06Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2020-09-01
11
10.3389/fgene.2020.500064
500064
A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
Ying Lin (0)
Shiva Afshar (1)
Anjali M. Rajadhyaksha (2, 3, 4)
James B. Potash (5)
Shizhong Han (6, 7)
(0) Department of Industrial Engineering, University of Houston, Houston, TX, United States
(1) Department of Industrial Engineering, University of Houston, Houston, TX, United States
(2) Division of Pediatric Neurology, Department of Pediatrics, Weill Cornell Medicine, New York, NY, United States
(3) Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, New York, NY, United States
(4) Weill Cornell Autism Research Program, Weill Cornell Medicine, New York, NY, United States
(5) Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, United States
(6) Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, United States
(7) Lieber Institute for Brain Development, Baltimore, MD, United States
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition with a strong genetic basis. The role of de novo mutations in ASD has been well established, but the set of genes implicated to date is still far from complete. The current study employs a machine learning-based approach to predict ASD risk genes using features from spatiotemporal gene expression patterns in human brain, gene-level constraint metrics, and other gene variation features. The genes identified through our prediction model were enriched for independent sets of ASD risk genes, and tended to be down-expressed in ASD brains, especially in frontal and parietal cortex. The highest-ranked genes not only included those with strong prior evidence for involvement in ASD (for example, NBEA, HERC1, and TCF20), but also indicated potentially novel candidates, such as MYCBP2 and CAND1, which are involved in protein ubiquitination. We also showed that our method outperformed state-of-the-art scoring systems for ranking curated ASD candidate genes. Gene ontology enrichment analysis of our predicted risk genes revealed biological processes clearly relevant to ASD, including neuronal signaling, neurogenesis, and chromatin remodeling, but also highlighted other potential mechanisms that might underlie ASD, such as regulation of RNA alternative splicing and the ubiquitination pathway related to protein degradation. Our study demonstrates that human brain spatiotemporal gene expression patterns and gene-level constraint metrics can help predict ASD risk genes. Our gene ranking system provides a useful resource for prioritizing ASD candidate genes.
https://www.frontiersin.org/article/10.3389/fgene.2020.500064/full
autism
de novo mutation
gene expression
constraint
machine learning
collection DOAJ
language English
format Article
sources DOAJ
author Ying Lin
Shiva Afshar
Anjali M. Rajadhyaksha
James B. Potash
Shizhong Han
spellingShingle Ying Lin
Shiva Afshar
Anjali M. Rajadhyaksha
James B. Potash
Shizhong Han
A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
Frontiers in Genetics
autism
de novo mutation
gene expression
constraint
machine learning
author_facet Ying Lin
Shiva Afshar
Anjali M. Rajadhyaksha
Anjali M. Rajadhyaksha
Shizhong Han
author_sort Ying Lin
title A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
title_short A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
title_full A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
title_fullStr A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
title_full_unstemmed A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
title_sort machine learning approach to predicting autism risk genes: validation of known genes and discovery of new candidates
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2020-09-01
description Autism spectrum disorder (ASD) is a complex neurodevelopmental condition with a strong genetic basis. The role of de novo mutations in ASD has been well established, but the set of genes implicated to date is still far from complete. The current study employs a machine learning-based approach to predict ASD risk genes using features from spatiotemporal gene expression patterns in human brain, gene-level constraint metrics, and other gene variation features. The genes identified through our prediction model were enriched for independent sets of ASD risk genes, and tended to be down-expressed in ASD brains, especially in frontal and parietal cortex. The highest-ranked genes not only included those with strong prior evidence for involvement in ASD (for example, NBEA, HERC1, and TCF20), but also indicated potentially novel candidates, such as MYCBP2 and CAND1, which are involved in protein ubiquitination. We also showed that our method outperformed state-of-the-art scoring systems for ranking curated ASD candidate genes. Gene ontology enrichment analysis of our predicted risk genes revealed biological processes clearly relevant to ASD, including neuronal signaling, neurogenesis, and chromatin remodeling, but also highlighted other potential mechanisms that might underlie ASD, such as regulation of RNA alternative splicing and the ubiquitination pathway related to protein degradation. Our study demonstrates that human brain spatiotemporal gene expression patterns and gene-level constraint metrics can help predict ASD risk genes. Our gene ranking system provides a useful resource for prioritizing ASD candidate genes.
topic autism
de novo mutation
gene expression
constraint
machine learning
url https://www.frontiersin.org/article/10.3389/fgene.2020.500064/full
work_keys_str_mv AT yinglin amachinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT shivaafshar amachinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT anjalimrajadhyaksha amachinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT jamesbpotash amachinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT shizhonghan amachinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT yinglin machinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT shivaafshar machinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT anjalimrajadhyaksha machinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT jamesbpotash machinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
AT shizhonghan machinelearningapproachtopredictingautismriskgenesvalidationofknowngenesanddiscoveryofnewcandidates
_version_ 1724500222231445504