Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder

Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models th...

Full description

Bibliographic Details
Main Authors: Mateusz Garbulowski, Karolina Smolinska, Klev Diamanti, Gang Pan, Khurram Maqbool, Lars Feuk, Jan Komorowski
Format: Article
Language:English
Published: Frontiers Media S.A. 2021-02-01
Series:Frontiers in Genetics
Subjects:
Online Access:https://www.frontiersin.org/articles/10.3389/fgene.2021.618277/full
Description
Summary:Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that allow explaining biological mechanisms and support analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements of ASD individuals. We constructed a rule-based learning model from three independent datasets that we further visualized as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized a topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features that were described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest a co-regulation between these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.
ISSN:1664-8021