Identifying and Assessing Interesting Subgroups in a Heterogeneous Population
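Main Authors: | Woojoo Lee, Andrey Alexeyenko, Maria Pernemalm, Justine Guegan, Philippe Dessen, Vladimir Lazar, Janne Lehtiö, Yudi Pawitan |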
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2015-01-01 |
Series: | BioMed Research International |
Online Access: | http://dx.doi.org/10.1155/2015/462549 |
id | doaj-9b3ba3043ecf4b7b90207e0fb1214a43 |
---|---|
affiliations | Woojoo Lee, Yudi Pawitan: Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden; Andrey Alexeyenko: Department of Microbiology, Tumour and Cell Biology, Bioinformatics Infrastructure for Life Sciences, Science for Life Laboratory, Karolinska Institutet, 17177 Stockholm, Sweden; Maria Pernemalm, Janne Lehtiö: Department of Oncology and Pathology, Science for Life Laboratory, Karolinska Institutet, 17121 Solna, Sweden; Justine Guegan, Philippe Dessen, Vladimir Lazar: Genomics, Institut Gustave Roussy, F-94805 Villejuif, France |
collection | DOAJ |
author | Woojoo Lee, Andrey Alexeyenko, Maria Pernemalm, Justine Guegan, Philippe Dessen, Vladimir Lazar, Janne Lehtiö, Yudi Pawitan |
issn | 2314-6133, 2314-6141 |
description | Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability—the basis of cluster generation—is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. |
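The description states that, under heterogeneity, the standard FDR estimate overestimates the true FDR. The record does not say which estimator is meant, so the sketch below assumes a common one: the Storey-type plug-in estimate FDR(t) = π̂0 · m · t / R(t), where m is the number of tests and R(t) is the number of p-values at or below the threshold t. It only illustrates how such an estimate is computed and how an upward-biased π0 inflates it. It is not the authors' improved procedure, and the function names and p-values are hypothetical.

```python
from typing import Sequence


def estimate_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey-style estimate of the proportion of true null hypotheses.

    Null p-values are assumed uniform on [0, 1], so about pi0 * m * (1 - lam)
    of the m p-values should exceed the tuning value lam.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def plugin_fdr(pvalues: Sequence[float], t: float, pi0: float = 1.0) -> float:
    """Plug-in FDR estimate at p-value threshold t: pi0 * m * t / R(t).

    pi0 = 1 gives the conservative, Benjamini-Hochberg-type estimate; any
    upward bias in pi0 inflates the estimated FDR proportionally.
    """
    m = len(pvalues)
    discoveries = sum(1 for p in pvalues if p <= t)  # R(t)
    if discoveries == 0:
        return 0.0  # no discoveries, so nothing to estimate
    return min(1.0, pi0 * m * t / discoveries)


# Hypothetical p-values, for illustration only (not data from the article).
pvals = [0.001, 0.004, 0.01, 0.03, 0.2, 0.45, 0.6, 0.8, 0.9, 0.95]
pi0_hat = estimate_pi0(pvals)                           # 4 / (10 * 0.5) = 0.8
fdr_conservative = plugin_fdr(pvals, t=0.01)            # 1.0 * 10 * 0.01 / 3
fdr_adjusted = plugin_fdr(pvals, t=0.01, pi0=pi0_hat)   # 0.8 * 10 * 0.01 / 3
print(f"pi0 = {pi0_hat:.2f}, FDR(pi0=1) = {fdr_conservative:.3f}, "
      f"FDR(pi0_hat) = {fdr_adjusted:.3f}")
```

Setting π0 = 1 never underestimates the null proportion, which is why it is the conservative default; the price is an FDR estimate that is larger than necessary whenever many hypotheses are truly non-null.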