Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge

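Abstract
Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulation and the mechanisms underlying altered phenotypes. Further, integrating proteome- and transcriptome-level data can validate gene signatures associated with a phenotype. However, proteomic data are not as abundant as genomic data, so it is beneficial to use genomic features to predict protein abundances when matching proteomic samples, or measurements within samples, are lacking.
Results: We evaluate and compare four data-driven models for predicting proteomic data from mRNA measured in breast and ovarian cancers, using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels with median correlations between ground-truth and predicted values ranging from 0.2 to 0.5. However, the most accurately predicted proteins differ considerably between approaches.
Conclusions: In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.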
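The abstract summarizes prediction accuracy as a median correlation between ground-truth and predicted protein abundances (0.2 to 0.5 across the four approaches). The following is a minimal Python sketch of how such a summary can be computed. It assumes Pearson correlation, computed per protein across samples, with a sample dropped whenever either the observed or the predicted value is missing. These are illustrative assumptions, not the authors' or the DREAM Challenge's actual scoring code, and the data layout and function names (pearson, median_protein_correlation) are hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: pairs where either value is None (a missing measurement)
    are dropped first. Returns NaN if fewer than two pairs remain or if
    either side is constant, since the correlation is then undefined.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def median_protein_correlation(truth, pred):
    """Correlate observed vs. predicted abundance per protein, then take the median.

    truth, pred: hypothetical layout, a dict mapping protein ID -> list of
    abundances, one value per sample, samples in the same order in both.
    Returns (median over proteins with a defined correlation, per-protein scores).
    """
    scores = {p: pearson(obs, pred[p]) for p, obs in truth.items() if p in pred}
    finite = [r for r in scores.values() if not math.isnan(r)]
    if not finite:
        raise ValueError("no protein has a defined correlation")
    return median(finite), scores


# Toy example with made-up numbers (three proteins, four samples).
truth = {"P1": [1, 2, 3, 4], "P2": [2, 1, 4, 3], "P3": [1, 2, 3, 4]}
pred = {"P1": [2, 4, 6, 8], "P2": [3, 4, 1, 2], "P3": [2, 1, 4, 3]}
med, per_protein = median_protein_correlation(truth, pred)
print(per_protein)  # {'P1': 1.0, 'P2': -1.0, 'P3': 0.6}
print(med)          # 0.6
```

Taking the median rather than the mean keeps a few very well or very poorly predicted proteins from dominating the summary. The per-protein scores are returned as well, because the abstract notes that the best-predicted proteins differ considerably between approaches, which a single summary number hides.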


Bibliographic Details
Main Authors:	Tara Eicher, Andrew Patt, Esko Kautto, Raghu Machiraju, Ewy Mathé, Yan Zhang
Format:	Article
Language:	English
Published:	BMC 2019-12-01
Series:	BMC Bioinformatics
Subjects:	Proteogenomics; mRNA; Random forests; Fuzzy logic; Bayesian networks
Online Access:	https://doi.org/10.1186/s12859-019-3253-z

id	doaj-08bc8c48c42543ad81b05381facb78c2
record_format	Article
spelling	doaj-08bc8c48c42543ad81b05381facb78c2 2020-12-20T12:42:24Z eng BMC BMC Bioinformatics 1471-2105 2019-12-01 20 S24 116 10.1186/s12859-019-3253-z Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge; Tara Eicher, Andrew Patt, Esko Kautto, Raghu Machiraju, Ewy Mathé, Yan Zhang; Department of Computer Science and Engineering, The Ohio State University; Department of Biomedical Informatics, College of Medicine, The Ohio State University; Department of Biomedical Informatics, College of Medicine, The Ohio State University; Department of Computer Science and Engineering, The Ohio State University; Department of Biomedical Informatics, College of Medicine, The Ohio State University; Department of Biomedical Informatics, College of Medicine, The Ohio State University; Abstract Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulation and the mechanisms underlying altered phenotypes. Further, integrating proteome- and transcriptome-level data can validate gene signatures associated with a phenotype. However, proteomic data are not as abundant as genomic data, so it is beneficial to use genomic features to predict protein abundances when matching proteomic samples, or measurements within samples, are lacking. Results: We evaluate and compare four data-driven models for predicting proteomic data from mRNA measured in breast and ovarian cancers, using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels with median correlations between ground-truth and predicted values ranging from 0.2 to 0.5. However, the most accurately predicted proteins differ considerably between approaches. Conclusions: In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses. https://doi.org/10.1186/s12859-019-3253-z Proteogenomics; mRNA; Random forests; Fuzzy logic; Bayesian networks
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Tara Eicher, Andrew Patt, Esko Kautto, Raghu Machiraju, Ewy Mathé, Yan Zhang
title	Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2019-12-01
description	Abstract Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulation and the mechanisms underlying altered phenotypes. Further, integrating proteome- and transcriptome-level data can validate gene signatures associated with a phenotype. However, proteomic data are not as abundant as genomic data, so it is beneficial to use genomic features to predict protein abundances when matching proteomic samples, or measurements within samples, are lacking. Results: We evaluate and compare four data-driven models for predicting proteomic data from mRNA measured in breast and ovarian cancers, using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels with median correlations between ground-truth and predicted values ranging from 0.2 to 0.5. However, the most accurately predicted proteins differ considerably between approaches. Conclusions: In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
topic	Proteogenomics; mRNA; Random forests; Fuzzy logic; Bayesian networks
url	https://doi.org/10.1186/s12859-019-3253-z

Similar Items