Improving stability of prediction models based on correlated omics data by using network approaches.

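Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model, and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results, we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.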
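
The three steps named in the article's description (network construction, clustering into modules, and a prediction model that uses the modules) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a weighted correlation network with adjacency |cor|^beta, a simple average-linkage agglomerative clustering, and a ridge regression with one penalty per group fitted by coordinate descent as a stand-in for group-specific penalization. All function names, the simulated data, and the parameter values (`beta_power=6`, the penalties) are hypothetical choices made for illustration.

```python
import math
import random

def center(v):
    """Subtract the mean so that no intercept is needed."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def pearson(x, y):
    x, y = center(x), center(y)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def weighted_network(columns, beta_power=6):
    """Step 1: adjacency a_ij = |cor(x_i, x_j)| ** beta_power."""
    p = len(columns)
    return [[1.0 if i == j else abs(pearson(columns[i], columns[j])) ** beta_power
             for j in range(p)] for i in range(p)]

def cluster_features(adj, n_groups):
    """Step 2: average-linkage clustering on the dissimilarity 1 - a_ij."""
    clusters = [[i] for i in range(len(adj))]

    def dist(a, b):
        return sum(1.0 - adj[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        # Merge the closest pair of clusters; ties go to the lowest indices.
        _, a, b = min((dist(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

def group_ridge(columns, y, groups, penalties, n_iter=200):
    """Step 3 (assumed variant): ridge with one penalty per group.

    Minimizes 0.5 * ||y - X b||^2 + 0.5 * sum_j lambda_j * b_j^2 by
    coordinate descent, where lambda_j is the penalty of feature j's group.
    """
    p = len(columns)
    lam = [0.0] * p
    for g, members in enumerate(groups):
        for j in members:
            lam[j] = penalties[g]
    coef = [0.0] * p
    resid = list(y)  # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            xj = columns[j]
            # Inner product of x_j with the partial residual (feature j removed).
            rho = sum(x * (r + x * coef[j]) for x, r in zip(xj, resid))
            new = rho / (sum(x * x for x in xj) + lam[j])
            delta = new - coef[j]
            resid = [r - x * delta for x, r in zip(xj, resid)]
            coef[j] = new
    return coef

# Simulated example: two blocks of three correlated features each;
# only the first block drives the outcome.
random.seed(0)
n = 60
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
columns = [center([z + 0.3 * random.gauss(0, 1) for z in z1]) for _ in range(3)]
columns += [center([z + 0.3 * random.gauss(0, 1) for z in z2]) for _ in range(3)]
y = center([z + 0.5 * random.gauss(0, 1) for z in z1])

adjacency = weighted_network(columns, beta_power=6)
groups = cluster_features(adjacency, n_groups=2)
coef = group_ridge(columns, y, groups, penalties=[1.0, 1000.0])
print(groups)
print([round(b, 3) for b in coef])
```

In this sketch the penalties are set by hand, with the second group penalized heavily; in practice they would be tuned, for example by cross-validation. Gaussian graphical modelling and group-based variable selection, which the description also mentions, are not covered here.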

Bibliographic Details
Main Authors:	Renaud Tissier, Jeanine Houwing-Duistermaat, Mar Rodríguez-Girondo
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2018-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC5819809?pdf=render

id	doaj-42f50a69fe984d24b08304eab5fec140
record_format	Article
spelling	doaj-42f50a69fe984d24b08304eab5fec140 2020-11-25T02:08:49Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2018-01-01 13 2 e0192853 10.1371/journal.pone.0192853 Improving stability of prediction models based on correlated omics data by using network approaches. Renaud Tissier Jeanine Houwing-Duistermaat Mar Rodríguez-Girondo Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model, and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results, we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset. http://europepmc.org/articles/PMC5819809?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Renaud Tissier Jeanine Houwing-Duistermaat Mar Rodríguez-Girondo
spellingShingle	Renaud Tissier Jeanine Houwing-Duistermaat Mar Rodríguez-Girondo Improving stability of prediction models based on correlated omics data by using network approaches. PLoS ONE
author_facet	Renaud Tissier Jeanine Houwing-Duistermaat Mar Rodríguez-Girondo
author_sort	Renaud Tissier
title	Improving stability of prediction models based on correlated omics data by using network approaches.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2018-01-01
description	Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model, and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results, we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.
url	http://europepmc.org/articles/PMC5819809?pdf=render

Similar Items