Identifying and Assessing Interesting Subgroups in a Heterogeneous Population

Bibliographic Details
Main Authors: Woojoo Lee, Andrey Alexeyenko, Maria Pernemalm, Justine Guegan, Philippe Dessen, Vladimir Lazar, Janne Lehtiö, Yudi Pawitan
Format: Article
Language: English
Published: Hindawi Limited 2015-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2015/462549
id doaj-9b3ba3043ecf4b7b90207e0fb1214a43
record_format Article
spelling doaj-9b3ba3043ecf4b7b90207e0fb1214a43; 2020-11-24T22:51:20Z; eng; Hindawi Limited; BioMed Research International; 2314-6133; 2314-6141; 2015-01-01; 2015; 10.1155/2015/462549; 462549
Woojoo Lee (0): Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden
Andrey Alexeyenko (1): Department of Microbiology, Tumour and Cell Biology, Bioinformatics Infrastructure for Life Sciences, Science for Life Laboratory, Karolinska Institutet, 17177 Stockholm, Sweden
Maria Pernemalm (2): Department of Oncology and Pathology, Science for Life Laboratory, Karolinska Institutet, 17121 Solna, Sweden
Justine Guegan (3): Genomics, Institut Gustave Roussy, F-94805 Villejuif, France
Philippe Dessen (4): Genomics, Institut Gustave Roussy, F-94805 Villejuif, France
Vladimir Lazar (5): Genomics, Institut Gustave Roussy, F-94805 Villejuif, France
Janne Lehtiö (6): Department of Oncology and Pathology, Science for Life Laboratory, Karolinska Institutet, 17121 Solna, Sweden
Yudi Pawitan (7): Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden
http://dx.doi.org/10.1155/2015/462549
collection DOAJ
language English
format Article
sources DOAJ
author Woojoo Lee
Andrey Alexeyenko
Maria Pernemalm
Justine Guegan
Philippe Dessen
Vladimir Lazar
Janne Lehtiö
Yudi Pawitan
author_sort Woojoo Lee
title Identifying and Assessing Interesting Subgroups in a Heterogeneous Population
publisher Hindawi Limited
series BioMed Research International
issn 2314-6133
2314-6141
publishDate 2015-01-01
description Biological heterogeneity is common in many diseases and is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods for uncovering unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability, which is the basis of cluster generation, is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal it. To address these problems, we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm, and significance is assessed with a false discovery rate (FDR)-based measure. Under the heterogeneity condition, the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided.
url http://dx.doi.org/10.1155/2015/462549