Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes

Background: In cancer research, the popularity of a large number of microarray applications has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made the full validation of such profiles and their related gene lists across...


Bibliographic Details
Main Authors:	Mark Burton, Mads Thomassen, Qihua Tan, Torben A. Kruse
Format:	Article
Language:	English
Published:	SAGE Publishing 2012-01-01
Series:	Cancer Informatics
Online Access:	https://doi.org/10.4137/CIN.S10375

id	doaj-735ea136898149f0884671669e264039
record_format	Article
spelling	doaj-735ea136898149f0884671669e264039; 2020-11-25T03:45:05Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2012-01-01; 11; 10.4137/CIN.S10375; Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes; Mark Burton (Department of Clinical Genetics, Odense University Hospital, Odense, Denmark); Mads Thomassen (Department of Clinical Genetics, Odense University Hospital, Odense, Denmark); Qihua Tan (Institute of Public Health, University of Southern Denmark, Odense, Denmark); Torben A. Kruse (Department of Clinical Genetics, Odense University Hospital, Odense, Denmark); https://doi.org/10.4137/CIN.S10375
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Mark Burton Mads Thomassen Qihua Tan Torben A. Kruse
author_sort	Mark Burton
title	Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes
publisher	SAGE Publishing
series	Cancer Informatics
issn	1176-9351
publishDate	2012-01-01
description	Background: In cancer research, the popularity of a large number of microarray applications has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made full validation of such profiles and their related gene lists across studies difficult, and classification accuracies are rarely validated in multiple independent datasets. Frequently, although the individual genes in such lists may not match, genes with the same function are included across the lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. There is, therefore, a need to systematically compare independent validation of gene lists or classifiers based on metagene (MG) or single gene (SG) features. Methods: In this study we compared the performance of metagene- and single gene-based feature sets and classifiers, using random forest and two support vector machines for classifier building. The performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach. Results: MG- and SG-feature sets were transferable and performed well when used to train and test prediction of metastasis outcome in strictly independent datasets, both between different and within similar microarray platforms, whereas entire classifiers performed more poorly when validated in strictly independent datasets. The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms. Conclusion: Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when classifier validation was conducted in independent data based on a different microarray platform. The latter was also true when only sets of MG- and SG-features were validated in independent datasets, both within similar platforms and between different platforms.
url	https://doi.org/10.4137/CIN.S10375
_version_	1724511500779913216
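
The abstract in this record contrasts metagene (MG) and single gene (SG) features evaluated by 10 times repeated 10-fold cross validation. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes that a metagene is the mean expression of its member genes, and it swaps in a simple nearest-centroid classifier in place of the random forest and support vector machines named in the abstract. The gene names, group names, and toy data are hypothetical and generated only for illustration.

```python
import random
from statistics import mean


def metagene_features(sample, groups):
    """Collapse one sample's single-gene values into one value per gene group.

    Assumption: a metagene is the mean expression of its member genes.
    """
    return [mean(sample[g] for g in genes) for genes in groups.values()]


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in (idx[i::k] for i in range(k)):
            held_out = set(fold)
            yield [i for i in idx if i not in held_out], fold


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class whose mean vector is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, k=10, repeats=10, seed=0):
    """Pooled accuracy over all test folds of repeated k-fold cross validation."""
    correct = total = 0
    for train, test in repeated_kfold(len(X), k, repeats, seed):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
            total += 1
    return correct / total


if __name__ == "__main__":
    # Toy data, fabricated for illustration: 40 samples, 4 hypothetical genes in 2 groups.
    rng = random.Random(1)
    groups = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3", "g4"]}
    genes = [g for members in groups.values() for g in members]
    y = [i % 2 for i in range(40)]
    # Class 1 samples get shifted expression in the pathway_A genes only.
    samples = [
        {g: rng.gauss(1.0 if (label and g in groups["pathway_A"]) else 0.0, 1.0)
         for g in genes}
        for label in y
    ]
    X_sg = [[s[g] for g in genes] for s in samples]          # single-gene features
    X_mg = [metagene_features(s, groups) for s in samples]   # metagene features
    print("SG accuracy:", cv_accuracy(X_sg, y))
    print("MG accuracy:", cv_accuracy(X_mg, y))
```

Running both feature sets through the same splits (same `seed`) keeps the comparison paired, which is the kind of setup a significance test on the MG versus SG difference would need. The down-sampled binomial test mentioned in the abstract is not reproduced here because the record does not describe its details.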
