Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes

Background The popularity of a large number of microarray applications in cancer research has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made the full validation of such profiles and their related gene lists across...


Bibliographic Details
Main Authors: Mark Burton, Mads Thomassen, Qihua Tan, Torben A. Kruse
Format: Article
Language: English
Published: SAGE Publishing 2012-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S10375
id doaj-735ea136898149f0884671669e264039
spelling Mark Burton (Department of Clinical Genetics, Odense University Hospital, Odense, Denmark); Mads Thomassen (Department of Clinical Genetics, Odense University Hospital, Odense, Denmark); Qihua Tan (Institute of Public Health, University of Southern Denmark, Odense, Denmark); Torben A. Kruse (Department of Clinical Genetics, Odense University Hospital, Odense, Denmark). Cancer Informatics, volume 11, SAGE Publishing, 2012-01-01. ISSN 1176-9351. https://doi.org/10.4137/CIN.S10375
collection DOAJ
issn 1176-9351
description Background: The popularity of a large number of microarray applications in cancer research has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made full validation of such profiles and their related gene lists across studies difficult, and their classification accuracies have rarely been validated in multiple independent datasets. Frequently, although the individual genes in such lists do not match, genes with the same function are included across the lists. Development of such lists does not take into account that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. There is, therefore, a need to systematically compare the independent validation of gene lists or classifiers based on metagene (MG) or single gene (SG) features.
Methods: We compared the performance of metagene- and single gene-based feature sets and classifiers, using random forest and two support vector machines for classifier building. Performance within the same dataset, validation performance of feature sets, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross-validation, leave-one-out cross-validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach.
Results: MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent datasets, both between different and within similar microarray platforms, whereas entire classifiers performed more poorly when validated in strictly independent datasets. MG- and SG-feature sets performed equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms.
Conclusion: In predicting metastasis outcome in lymph node–negative patients, SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, MG- and SG-classifiers had similar performance when validated in independent data based on a different microarray platform. The same was true when only sets of MG- and SG-features were validated in independent datasets, both within similar platforms and between different platforms.
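The two ideas the abstract relies on, collapsing single genes into metagene features and testing a performance difference with a binomial test, can be illustrated with a minimal sketch. The abstract does not say how metagene values were computed or how the down-sampled binomial test was set up, so everything here is an assumption for illustration only: a metagene is taken as the mean expression of its member genes, the test is a plain exact two-sided binomial test, and the names `metagene_features`, `binomial_two_sided_p`, the gene symbols, and the gene sets are hypothetical.

```python
from math import comb
from statistics import mean


def metagene_features(expression, gene_sets):
    """Collapse single-gene values into one value per metagene.

    expression: dict mapping gene -> list of expression values (one per sample).
    gene_sets:  dict mapping metagene name -> list of member genes.
    Assumption: a metagene value is the mean over its measured member genes.
    """
    features = {}
    for name, genes in gene_sets.items():
        present = [expression[g] for g in genes if g in expression]
        if not present:
            continue  # no member gene measured on this platform
        # zip(*present) walks sample by sample across the member genes
        features[name] = [mean(values) for values in zip(*present)]
    return features


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed number of successes.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical data: three genes measured in three samples.
expression = {
    "G1": [1.0, 2.0, 3.0],
    "G2": [3.0, 4.0, 5.0],
    "G3": [0.0, 0.0, 6.0],
}
gene_sets = {"MG_a": ["G1", "G2"], "MG_b": ["G3", "G_missing"]}
mg = metagene_features(expression, gene_sets)

# Hypothetical comparison: of 10 samples where exactly one classifier was
# correct, the SG-classifier was the correct one 8 times.
p_value = binomial_two_sided_p(8, 10)
```

Skipping member genes that are absent from `expression` is one way such a feature could remain computable on a platform that does not measure every gene in a set; whether the authors handled missing genes this way is not stated in the record.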