Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge

Abstract. Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulations and mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. Ho...

Bibliographic Details
Main Authors: Tara Eicher, Andrew Patt, Esko Kautto, Raghu Machiraju, Ewy Mathé, Yan Zhang
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
Subjects: Proteogenomics; mRNA; Random forests; Fuzzy logic; Bayesian networks
Online Access: https://doi.org/10.1186/s12859-019-3253-z
id doaj-08bc8c48c42543ad81b05381facb78c2
record_format Article
spelling doaj-08bc8c48c42543ad81b05381facb78c2
2020-12-20T12:42:24Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-12-01
20S24116
10.1186/s12859-019-3253-z
Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge
Tara Eicher (0), Andrew Patt (1), Esko Kautto (2), Raghu Machiraju (3), Ewy Mathé (4), Yan Zhang (5)
0: Department of Computer Science and Engineering, The Ohio State University
1: Department of Biomedical Informatics, College of Medicine, The Ohio State University
2: Department of Biomedical Informatics, College of Medicine, The Ohio State University
3: Department of Computer Science and Engineering, The Ohio State University
4: Department of Biomedical Informatics, College of Medicine, The Ohio State University
5: Department of Biomedical Informatics, College of Medicine, The Ohio State University
Abstract. Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulations and mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. However, proteomic data is not as abundant as genomic data, and it is thus beneficial to use genomic features to predict protein abundances when matching proteomic samples or measurements within samples are lacking. Results: We evaluate and compare four data-driven models for prediction of proteomic data from mRNA measured in breast and ovarian cancers using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forests, LASSO, and fuzzy logic approaches can predict protein abundance levels with median ground truth-predicted correlation values between 0.2 and 0.5. However, the most accurately predicted proteins differ considerably between approaches. Conclusions: In addition to benchmarking aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
https://doi.org/10.1186/s12859-019-3253-z
Proteogenomics
mRNA
Random forests
Fuzzy logic
Bayesian networks
collection DOAJ
language English
format Article
sources DOAJ
author Tara Eicher
Andrew Patt
Esko Kautto
Raghu Machiraju
Ewy Mathé
Yan Zhang
author_sort Tara Eicher
title Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-12-01
description Abstract. Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulations and mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. However, proteomic data is not as abundant as genomic data, and it is thus beneficial to use genomic features to predict protein abundances when matching proteomic samples or measurements within samples are lacking. Results: We evaluate and compare four data-driven models for prediction of proteomic data from mRNA measured in breast and ovarian cancers using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forests, LASSO, and fuzzy logic approaches can predict protein abundance levels with median ground truth-predicted correlation values between 0.2 and 0.5. However, the most accurately predicted proteins differ considerably between approaches. Conclusions: In addition to benchmarking aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
topic Proteogenomics
mRNA
Random forests
Fuzzy logic
Bayesian networks
url https://doi.org/10.1186/s12859-019-3253-z
_version_ 1724376105930981376