SWISS MADE: Standardized WithIn Class Sum of Squares to evaluate methodologies and dataset elements.

Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusion...

Bibliographic Details
Main Authors: Christopher R Cabanski, Yuan Qi, Xiaoying Yin, Eric Bair, Michele C Hayward, Cheng Fan, Jianying Li, Matthew D Wilkerson, J S Marron, Charles M Perou, D Neil Hayes
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-03-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2845619?pdf=render
id doaj-c1dafb5d3318402ba6e98c786a018ace
record_format Article
spelling doaj-c1dafb5d3318402ba6e98c786a018ace; 2020-11-24T22:16:17Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2010-03-01; 5; 3; e9905; 10.1371/journal.pone.0009905
collection DOAJ
language English
format Article
sources DOAJ
author Christopher R Cabanski
Yuan Qi
Xiaoying Yin
Eric Bair
Michele C Hayward
Cheng Fan
Jianying Li
Matthew D Wilkerson
J S Marron
Charles M Perou
D Neil Hayes
title SWISS MADE: Standardized WithIn Class Sum of Squares to evaluate methodologies and dataset elements.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2010-03-01
description Contemporary high-dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e., gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods, and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
url http://europepmc.org/articles/PMC2845619?pdf=render
_version_ 1725790896565256192
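
The description in this record names the statistic but does not give its formula. As a rough illustration only, the sketch below assumes that a standardized within-class sum of squares is the within-class sum of squared Euclidean distances (each sample to its own class mean) divided by the total sum of squared distances (each sample to the overall mean). That ratio, the function name `swiss_score`, and the toy data are assumptions made for this example and are not taken from the record; the published article should be consulted for the authors' exact definition.

```python
def swiss_score(samples, labels):
    """Hypothetical sketch: within-class sum of squares / total sum of squares.

    samples: list of equal-length numeric lists (one row per sample).
    labels:  list of a priori class labels, one per sample.
    Under the assumed definition, lower values mean tighter classes.
    """
    def sum_sq_about_mean(rows):
        # Sum of squared Euclidean distances from each row to the row mean.
        n = len(rows)
        dims = len(rows[0])
        mean = [sum(r[j] for r in rows) / n for j in range(dims)]
        return sum((r[j] - mean[j]) ** 2 for r in rows for j in range(dims))

    total_ss = sum_sq_about_mean(samples)
    if total_ss == 0:
        raise ValueError("all samples are identical; the ratio is undefined")

    within_ss = 0.0
    for cls in set(labels):
        members = [row for row, lab in zip(samples, labels) if lab == cls]
        within_ss += sum_sq_about_mean(members)

    return within_ss / total_ss


# Toy data (made up): two well-separated groups along the first coordinate.
toy = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
matching = swiss_score(toy, ["a", "a", "b", "b"])     # labels follow the groups
mismatched = swiss_score(toy, ["a", "b", "a", "b"])   # labels cut across the groups
print(matching < mismatched)
```

Under this assumed definition, the score lies between 0 and 1, and comparing two processing methods on the same samples and labels amounts to computing the score for each processed dataset and preferring the lower one.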