Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling.

Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments consider...


Bibliographic Details
Main Authors: Sangkyun Lee, Jörg Rahnenführer, Michel Lang, Katleen De Preter, Pieter Mestdagh, Jan Koster, Rogier Versteeg, Raymond L Stallings, Luigi Varesio, Shahab Asgharzadeh, Johannes H Schulte, Kathrin Fielitz, Melanie Schwermer, Katharina Morik, Alexander Schramm
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0108818
id doaj-7a99a75d80854110a1d4c93ea3186528
record_format Article
spelling doaj-7a99a75d80854110a1d4c93ea3186528; 2021-03-04T08:54:44Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2014-01-01; 9(10): e108818; 10.1371/journal.pone.0108818; Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling.; Sangkyun Lee, Jörg Rahnenführer, Michel Lang, Katleen De Preter, Pieter Mestdagh, Jan Koster, Rogier Versteeg, Raymond L Stallings, Luigi Varesio, Shahab Asgharzadeh, Johannes H Schulte, Kathrin Fielitz, Melanie Schwermer, Katharina Morik, Alexander Schramm; https://doi.org/10.1371/journal.pone.0108818
collection DOAJ
language English
format Article
sources DOAJ
author Sangkyun Lee
Jörg Rahnenführer
Michel Lang
Katleen De Preter
Pieter Mestdagh
Jan Koster
Rogier Versteeg
Raymond L Stallings
Luigi Varesio
Shahab Asgharzadeh
Johannes H Schulte
Kathrin Fielitz
Melanie Schwermer
Katharina Morik
Alexander Schramm
title Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2014-01-01
description Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments considering samples with the same type of disease. The lack of a consensus is mostly due to the fact that sample sizes are far smaller than the numbers of candidate features to be considered, and therefore signature selection suffers from large variation. We propose a robust signature selection method that enhances the selection stability of penalized regression algorithms for predicting survival risk. Our method is based on an aggregation of multiple, possibly unstable, signatures obtained with the preconditioned lasso algorithm applied to random (internal) subsamples of a given cohort data, where the aggregated signature is shrunken by a simple thresholding strategy. The resulting method, RS-PL, is conceptually simple and easy to apply, relying on parameters automatically tuned by cross validation. Robust signature selection using RS-PL operates within an (external) subsampling framework to estimate the selection probabilities of features in multiple trials of RS-PL. These probabilities are used for identifying reliable features to be included in a signature. Our method was evaluated on microarray data sets from neuroblastoma, lung adenocarcinoma, and breast cancer patients, extracting robust and relevant signatures for predicting survival risk. Signatures obtained by our method achieved high prediction performance and robustness, consistently over the three data sets. Genes with high selection probability in our robust signatures have been reported as cancer-relevant. The ordering of predictor coefficients associated with signatures was well-preserved across multiple trials of RS-PL, demonstrating the capability of our method for identifying a transferable consensus signature. The software is available as an R package rsig at CRAN (http://cran.r-project.org).
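The two-level ("two-fold") subsampling idea in the description can be illustrated with a minimal Python sketch. It is not the rsig package and not the authors' implementation. Several things are assumptions made only for illustration: the base selector is a simple correlation ranking standing in for the preconditioned lasso, the outcome is a plain numeric vector rather than censored survival data, the aggregated signature is thresholded on selection frequency, and all function names, subsample fractions, and defaults are hypothetical. The inner loop aggregates unstable signatures from internal subsamples; the outer loop repeats that procedure on external subsamples and reports how often each feature ends up in the aggregated signature.

```python
import numpy as np


def base_selector(X, y, k):
    """Stand-in selector (assumption): the k features most correlated with y.

    The article uses the preconditioned lasso on survival data; this simple
    ranking only plays the role of "some possibly unstable selector".
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[-k:]


def aggregated_signature(X, y, rng, n_internal=20, frac=0.5, k=10, threshold=0.5):
    """Internal subsampling: aggregate signatures, then shrink by thresholding.

    A feature is kept if it was selected in at least `threshold` of the
    internal runs (frequency thresholding is an assumption of this sketch).
    """
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_internal):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        counts[base_selector(X[idx], y[idx], k)] += 1
    return np.flatnonzero(counts / n_internal >= threshold)


def selection_probabilities(X, y, n_external=30, frac=0.632, seed=0, **kwargs):
    """External subsampling: fraction of trials in which each feature is kept."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_external):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        counts[aggregated_signature(X[idx], y[idx], rng, **kwargs)] += 1
    return counts / n_external


# Synthetic demo: 120 samples, 200 features, only features 0 and 1 matter.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.standard_normal((120, 200))
y_demo = 3 * X_demo[:, 0] - 3 * X_demo[:, 1] + demo_rng.standard_normal(120)
probs = selection_probabilities(X_demo, y_demo)
print("estimated selection probability of features 0 and 1:", probs[:2])
print("features with probability >= 0.8:", np.flatnonzero(probs >= 0.8))
```

Features with a high estimated selection probability would be the candidates for a consensus signature. In this sketch the cutoff on that probability (0.8 in the demo) is an arbitrary choice, not a value taken from the article.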
url https://doi.org/10.1371/journal.pone.0108818