Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling

Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a chall...

Bibliographic Details
Main Authors: Ramtin Hosseini, Neda Hassanpour, Li-Ping Liu, Soha Hassoun
Format: Article
Language: English
Published: MDPI AG 2020-05-01
Series: Metabolites
Subjects: machine learning; inference; untargeted metabolomics; biological network; metabolic model
Online Access: https://www.mdpi.com/2218-1989/10/5/183
id doaj-0955027d557d48c0a91cc9a032bf2280
record_format Article
spelling doaj-0955027d557d48c0a91cc9a032bf2280
2020-11-25T03:35:27Z
eng
MDPI AG
Metabolites
2218-1989
2020-05-01
10
183
183
10.3390/metabo10050183
Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling
Ramtin Hosseini (0)
Neda Hassanpour (1)
Li-Ping Liu (2)
Soha Hassoun (3)
(0)-(3) Department of Computer Science, Tufts University, Medford, MA 02155, USA
Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures metabolomics measurements and the biological network for the biological sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a particular (user-defined) threshold. Due to the lack of “ground truth” metabolomics datasets, where all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets that are designed to mimic cellular processes. PUMA, on average, outperforms pathway enrichment analysis by 8%. PUMA is applied to two case studies. PUMA suggests many biologically meaningful pathways as active. Annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously only identified as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
https://www.mdpi.com/2218-1989/10/5/183
machine learning
inference
untargeted metabolomics
biological network
metabolic model
collection DOAJ
language English
format Article
sources DOAJ
author Ramtin Hosseini
Neda Hassanpour
Li-Ping Liu
Soha Hassoun
spellingShingle Ramtin Hosseini
Neda Hassanpour
Li-Ping Liu
Soha Hassoun
Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling
Metabolites
machine learning
inference
untargeted metabolomics
biological network
metabolic model
author_facet Ramtin Hosseini
Neda Hassanpour
Li-Ping Liu
Soha Hassoun
author_sort Ramtin Hosseini
title Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling
title_short Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling
title_full Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling
title_fullStr Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling
title_full_unstemmed Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling
title_sort pathway-activity likelihood analysis and metabolite annotation for untargeted metabolomics using probabilistic modeling
publisher MDPI AG
series Metabolites
issn 2218-1989
publishDate 2020-05-01
description Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures metabolomics measurements and the biological network for the biological sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a particular (user-defined) threshold. Due to the lack of “ground truth” metabolomics datasets, where all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets that are designed to mimic cellular processes. PUMA, on average, outperforms pathway enrichment analysis by 8%. PUMA is applied to two case studies. PUMA suggests many biologically meaningful pathways as active. Annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously only identified as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
topic machine learning
inference
untargeted metabolomics
biological network
metabolic model
url https://www.mdpi.com/2218-1989/10/5/183
work_keys_str_mv AT ramtinhosseini pathwayactivitylikelihoodanalysisandmetaboliteannotationforuntargetedmetabolomicsusingprobabilisticmodeling
AT nedahassanpour pathwayactivitylikelihoodanalysisandmetaboliteannotationforuntargetedmetabolomicsusingprobabilisticmodeling
AT lipingliu pathwayactivitylikelihoodanalysisandmetaboliteannotationforuntargetedmetabolomicsusingprobabilisticmodeling
AT sohahassoun pathwayactivitylikelihoodanalysisandmetaboliteannotationforuntargetedmetabolomicsusingprobabilisticmodeling
_version_ 1724554351866806272
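
The abstract states that a pathway is called active when its likelihood, estimated by stochastic sampling, exceeds a user-defined threshold. The following is a minimal sketch of that thresholding idea only, not PUMA's actual code or API: the function names, the pathway names, the five sample draws, and the threshold values are all hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: NOT the PUMA implementation.
# Each "sample" is a hypothetical posterior draw mapping a pathway name
# to 1 (active in that draw) or 0 (inactive in that draw).

def posterior_activity(samples):
    """Estimate each pathway's posterior activity probability as the
    fraction of draws in which the pathway is switched on."""
    n = len(samples)
    return {p: sum(s[p] for s in samples) / n for p in samples[0]}

def active_pathways(posterior, threshold):
    """Return the pathways whose estimated probability exceeds the
    user-defined threshold, sorted by name."""
    return sorted(p for p, prob in posterior.items() if prob > threshold)

# Hypothetical draws (made-up values, not results from the article).
samples = [
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 0, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 1},
    {"glycolysis": 0, "tca_cycle": 0, "urea_cycle": 0},
]

posterior = posterior_activity(samples)
lenient = active_pathways(posterior, threshold=0.5)
strict = active_pathways(posterior, threshold=0.7)
print(posterior)
print(lenient)
print(strict)
```

Raising the threshold shrinks the set of pathways reported as active, which is why the abstract describes it as a user-defined choice rather than a fixed constant.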