Merging Microarray Data, Robust Feature Selection, and Predicting Prognosis in Prostate Cancer

Motivation: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies. Method: We present a novel approach for combining microarray data across institutions...


Bibliographic Details
Main Authors: Jing Wang, Kim Anh Do, Sijin Wen, Spyros Tsavachidis, Timothy J. Mcdonnell, Christopher J. Logothetis, Kevin R. Coombes
Format: Article
Language: English
Published: SAGE Publishing 2006-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.1177/117693510600200009
id doaj-393b58dd3f1b401796059fd291749a80
record_format Article
spelling doaj-393b58dd3f1b401796059fd291749a80 2020-11-25T03:45:05Z eng SAGE Publishing Cancer Informatics 1176-9351 2006-01-01 2 10.1177/117693510600200009
Merging Microarray Data, Robust Feature Selection, and Predicting Prognosis in Prostate Cancer
Jing Wang (0), Kim Anh Do (1), Sijin Wen (2), Spyros Tsavachidis (3), Timothy J. Mcdonnell (4), Christopher J. Logothetis (5), Kevin R. Coombes (6)
0, 1, 2, 3, 6: Department of Biostatistics and Applied Mathematics, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA.
4: Department of Molecular Pathology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA.
5: Department of Genitourinary Medical Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA.
collection DOAJ
language English
format Article
sources DOAJ
author Jing Wang
Kim Anh Do
Sijin Wen
Spyros Tsavachidis
Timothy J. Mcdonnell
Christopher J. Logothetis
Kevin R. Coombes
title Merging Microarray Data, Robust Feature Selection, and Predicting Prognosis in Prostate Cancer
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2006-01-01
description Motivation: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies. Method: We present a novel approach for combining microarray data across institutions and platforms. We introduce a new algorithm, robust greedy feature selection (RGFS), to select predictive genes. Results: We combined two prostate cancer microarray data sets, confirmed the appropriateness of the approach with the Kolmogorov-Smirnov goodness-of-fit test, and built several predictive models. The best logistic regression model with stepwise forward selection used 7 genes and had a misclassification rate of 31%. Models that combined LDA with different feature selection algorithms had misclassification rates between 19% and 33%, and the sets of genes in the models varied substantially during cross-validation. When we combined RGFS with LDA, the best model used two genes and had a misclassification rate of 15%. Availability: Affymetrix U95Av2 array data are available at http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi . The cDNA microarray data are available through the Stanford Microarray Database ( http://cmgm.stanford.edu/pbrown/ ). GeneLink software is freely available at http://bioinformatics.mdanderson.org/GeneLink/ . DNA-Chip Analyzer software is publicly available at http://biosun1.harvard.edu/complab/dchip/ .
url https://doi.org/10.1177/117693510600200009
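The description in this record says the merged data sets were checked with a Kolmogorov-Smirnov test, but it does not say how the test was applied. The sketch below is only an illustration of the general idea, assuming a two-sample comparison of one gene's rescaled expression values across two studies. The function name and the sample values are hypothetical and are not taken from the article.

```python
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """Return the two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical distribution functions of the two samples.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the difference at every observed value.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # fraction of sample a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of sample b that is <= x
        d = max(d, abs(fa - fb))
    return d


# Hypothetical standardized expression values for one gene in two studies.
study_a = [-1.2, -0.4, 0.0, 0.3, 1.1]
study_b = [-0.9, -0.5, 0.1, 0.4, 1.3]
print(ks_two_sample_statistic(study_a, study_b))
```

A small statistic means the two empirical distributions are close. Turning the statistic into a p-value requires the Kolmogorov distribution, which a statistics library would normally supply.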