Comparison of Two Output-Coding Strategies for Multi-Class Tumor Classification Using Gene Expression Data and Latent Variable Model as Binary Classifier

Multi-class cancer classification based on microarray data is described. A generalized output-coding scheme based on One Versus One (OVO) combined with a Latent Variable Model (LVM) is used. Results from the proposed OVO output-coding strategy are compared with the results obtained fro...


Bibliographic Details
Main Authors: Sandeep J. Joseph, Kelly R. Robbins, Wensheng Zhang, Romdhane Rekaya
Format: Article
Language: English
Published: SAGE Publishing 2010-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S3827
id doaj-cf98b27c8db04706b4f09ace5472dd9d
record_format Article
spelling doaj-cf98b27c8db04706b4f09ace5472dd9d; 2020-11-25T03:45:05Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2010-01-01; 9; 10.4137/CIN.S3827; Comparison of Two Output-Coding Strategies for Multi-Class Tumor Classification Using Gene Expression Data and Latent Variable Model as Binary Classifier; Sandeep J. Joseph [0]; Kelly R. Robbins [1]; Wensheng Zhang [2]; Romdhane Rekaya [3]; [0] Rhodes Centre for Animal and Dairy Science, University of Georgia, Athens, GA-30605, USA; [1] Rhodes Centre for Animal and Dairy Science, University of Georgia, Athens, GA-30605, USA; [2] Rhodes Centre for Animal and Dairy Science, University of Georgia, Athens, GA-30605, USA; [3] Department of Statistics, University of Georgia, Athens, GA-30605, USA; Multi-class cancer classification based on microarray data is described. A generalized output-coding scheme based on One Versus One (OVO) combined with a Latent Variable Model (LVM) is used. Results from the proposed OVO output-coding strategy are compared with those obtained from the generalized One Versus All (OVA) method, and the efficiency of both strategies for multi-class tumor classification is studied. The comparison used two microarray gene expression datasets: the Global Cancer Map (GCM) dataset and the brain cancer (BC) dataset. Primary feature selection was based on fold change and penalized t-statistics. Evaluation was conducted with varying numbers of features. The OVO coding strategy worked quite well with the BC data, while OVO and OVA results appeared similar for the GCM data. The choice of output-coding method for combining binary classifiers in multi-class tumor classification depends on the number of tumor types considered, the discrepancies between the tumor samples used for training, and the heterogeneity of expression within the cancer subtypes used as training data.; https://doi.org/10.4137/CIN.S3827
collection DOAJ
language English
format Article
sources DOAJ
author Sandeep J. Joseph
Kelly R. Robbins
Wensheng Zhang
Romdhane Rekaya
spellingShingle Sandeep J. Joseph
Kelly R. Robbins
Wensheng Zhang
Romdhane Rekaya
Comparison of Two Output-Coding Strategies for Multi-Class Tumor Classification Using Gene Expression Data and Latent Variable Model as Binary Classifier
Cancer Informatics
author_facet Sandeep J. Joseph
Kelly R. Robbins
Wensheng Zhang
Romdhane Rekaya
author_sort Sandeep J. Joseph
title Comparison of Two Output-Coding Strategies for Multi-Class Tumor Classification Using Gene Expression Data and Latent Variable Model as Binary Classifier
title_sort comparison of two output-coding strategies for multi-class tumor classification using gene expression data and latent variable model as binary classifier
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2010-01-01
description Multi-class cancer classification based on microarray data is described. A generalized output-coding scheme based on One Versus One (OVO) combined with a Latent Variable Model (LVM) is used. Results from the proposed OVO output-coding strategy are compared with those obtained from the generalized One Versus All (OVA) method, and the efficiency of both strategies for multi-class tumor classification is studied. The comparison used two microarray gene expression datasets: the Global Cancer Map (GCM) dataset and the brain cancer (BC) dataset. Primary feature selection was based on fold change and penalized t-statistics. Evaluation was conducted with varying numbers of features. The OVO coding strategy worked quite well with the BC data, while OVO and OVA results appeared similar for the GCM data. The choice of output-coding method for combining binary classifiers in multi-class tumor classification depends on the number of tumor types considered, the discrepancies between the tumor samples used for training, and the heterogeneity of expression within the cancer subtypes used as training data.
url https://doi.org/10.4137/CIN.S3827
work_keys_str_mv AT sandeepjjoseph comparisonoftwooutputcodingstrategiesformulticlasstumorclassificationusinggeneexpressiondataandlatentvariablemodelasbinaryclassifier
AT kellyrrobbins comparisonoftwooutputcodingstrategiesformulticlasstumorclassificationusinggeneexpressiondataandlatentvariablemodelasbinaryclassifier
AT wenshengzhang comparisonoftwooutputcodingstrategiesformulticlasstumorclassificationusinggeneexpressiondataandlatentvariablemodelasbinaryclassifier
AT romdhanerekaya comparisonoftwooutputcodingstrategiesformulticlasstumorclassificationusinggeneexpressiondataandlatentvariablemodelasbinaryclassifier
_version_ 1724511576278433792
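
Illustrative note on the two output-coding strategies named in the description: the sketch below shows, under stated assumptions, how One Versus All (OVA) and One Versus One (OVO) combine binary classifiers into a multi-class prediction. It is not the authors' implementation. The binary classifier here is a simple nearest-centroid rule swapped in for the paper's Latent Variable Model, and all function names, class labels, and data values are hypothetical.

```python
from itertools import combinations


def centroid(rows):
    """Mean expression vector of a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_binary(pos, neg):
    """Stand-in binary classifier (NOT the paper's LVM): nearest centroid.

    Returns a scoring function; a positive score favours the `pos` group.
    """
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def by_class(X, y):
    """Group samples by class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def fit_ova(X, y):
    """One Versus All: K classifiers, each class against all the others."""
    groups = by_class(X, y)
    models = {}
    for k in groups:
        rest = [r for j, rows in groups.items() if j != k for r in rows]
        models[k] = fit_binary(groups[k], rest)
    return models


def predict_ova(models, x):
    """Pick the class whose one-versus-all score is largest."""
    return max(models, key=lambda k: models[k](x))


def fit_ovo(X, y):
    """One Versus One: K(K-1)/2 classifiers, one per pair of classes."""
    groups = by_class(X, y)
    return {(a, b): fit_binary(groups[a], groups[b])
            for a, b in combinations(sorted(groups), 2)}


def predict_ovo(models, x):
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = {}
    for (a, b), score in models.items():
        winner = a if score(x) > 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)


# Toy data: 3 hypothetical tumor classes, 2 "genes" each (made-up values).
X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 5.0], [0.0, 5.2]]
y = ["A", "A", "B", "B", "C", "C"]

ova, ovo = fit_ova(X, y), fit_ovo(X, y)
sample = [4.8, 5.0]
pred_ova, pred_ovo = predict_ova(ova, sample), predict_ovo(ovo, sample)
```

With K classes, OVA trains K classifiers on all samples each, while OVO trains K(K-1)/2 classifiers on two classes each; which works better on a given dataset is the empirical question the article examines.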