The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data
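Main Authors: | Laurence de Torrenté, Samuel Zimmerman, Masako Suzuki, Maximilian Christopeit, John M. Greally, Jessica C. Mar |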
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2020-12-01 |
Series: | BMC Bioinformatics |
Online Access: | https://doi.org/10.1186/s12859-020-03892-w |
id | doaj-2ae0bac7450f46ae973414c89a644fd8 |
---|---|
affiliations | Laurence de Torrenté: Department of Systems and Computational Biology, Albert Einstein College of Medicine; Samuel Zimmerman: Department of Systems and Computational Biology, Albert Einstein College of Medicine; Masako Suzuki: Center for Epigenomics and Department of Genetics, Albert Einstein College of Medicine; Maximilian Christopeit: Internal Medicine II, Hematology, Oncology, Clinical Immunology and Rheumatology, University Hospital Tuebingen; John M. Greally: Center for Epigenomics and Department of Genetics, Albert Einstein College of Medicine; Jessica C. Mar: Department of Systems and Computational Biology, Albert Einstein College of Medicine |
collection | DOAJ |
author | Laurence de Torrenté, Samuel Zimmerman, Masako Suzuki, Maximilian Christopeit, John M. Greally, Jessica C. Mar |
issn | 1471-2105 |
description | Background: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However, we rarely stop to question the validity of this assumption, or consider how broadly it applies to all genes in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from The Cancer Genome Atlas (TCGA). Results: Surprisingly, the expression of fewer than 50% of all genes was Normally distributed; other distributions, including Gamma, Bimodal, Cauchy, and Lognormal, were also represented. Most of the distribution categories contained genes that were significantly enriched for unique biological processes. Different assumptions based on the shape of the expression profile were used to identify genes that could discriminate between patients with good versus poor survival. The prognostic marker genes identified when the shape of the distribution was accounted for reflected functional insights into cancer biology that were not observed when standard assumptions were applied. We showed that when multiple types of distributions were permitted, i.e., when the shape of the expression profile was used, the statistical classifiers had greater predictive accuracy for determining a patient’s prognosis than those that assumed only one type of gene expression distribution. Conclusions: Our results highlight the value of studying a gene’s distribution shape to model the heterogeneity of transcriptomic data, and the impact of using analyses that permit more than one type of gene expression distribution. These insights would have been overlooked by standard approaches that assume all genes in a patient cohort follow the same type of distribution. |
topic | Gene expression, Multi-modality, Non-normal distribution, Survival analysis, Cancer genomics |
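The record's abstract states that fewer than half of all genes were Normally distributed and that shape-aware analyses changed the results. As a hedged illustration of what labeling one gene's expression profile by its distribution shape can look like, the Python sketch below fits Normal, Lognormal and Gamma models by maximum likelihood and keeps the family with the lowest AIC. It is not the authors' procedure: the AIC criterion, the closed-form approximation for the Gamma shape parameter, and the helper name `classify_shape` are assumptions for illustration only, and the Bimodal and Cauchy categories from the abstract are not modeled.

```python
import math
import random
from statistics import fmean

# Hypothetical sketch: label one gene's expression profile with the
# best-fitting family among Normal, Lognormal and Gamma, using AIC.
# This is an illustration, not the method used in the article.


def _normal_loglik(xs):
    """Maximized log-likelihood of a Normal fit (closed-form MLE)."""
    n = len(xs)
    mu = fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance divides by n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def _lognormal_loglik(xs):
    """Maximized log-likelihood of a Lognormal fit; requires all xs > 0."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = fmean(logs)
    var = sum((v - mu) ** 2 for v in logs) / n
    # Normal log-likelihood of log(x) plus the Jacobian term -sum(log x).
    return -sum(logs) - 0.5 * n * (math.log(2 * math.pi * var) + 1)


def _gamma_loglik(xs):
    """Log-likelihood of a Gamma fit with approximate-MLE parameters; xs > 0."""
    n = len(xs)
    mean = fmean(xs)
    s = math.log(mean) - fmean(math.log(x) for x in xs)
    # Closed-form approximation to the shape MLE (assumption: accurate
    # enough for ranking models; an exact fit would iterate with digamma).
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    theta = mean / k  # scale MLE given the shape
    return (
        (k - 1) * sum(math.log(x) for x in xs)
        - sum(xs) / theta
        - n * (k * math.log(theta) + math.lgamma(k))
    )


def classify_shape(xs):
    """Return the name of the candidate family with the lowest AIC."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct expression values")
    fits = {"Normal": _normal_loglik}
    if min(xs) > 0:  # Lognormal and Gamma are only fitted to positive values
        fits["Lognormal"] = _lognormal_loglik
        fits["Gamma"] = _gamma_loglik
    # Each family has two free parameters, so AIC = 2 * 2 - 2 * logLik.
    aic = {name: 4 - 2 * loglik(xs) for name, loglik in fits.items()}
    return min(aic, key=aic.get)


if __name__ == "__main__":
    rng = random.Random(0)
    skewed = [rng.lognormvariate(0.0, 1.5) for _ in range(5000)]
    print(classify_shape(skewed))
```

Because each candidate here has two free parameters, ranking by AIC reduces to ranking by maximized log-likelihood; the penalty only starts to matter once families with more parameters, such as a two-component mixture for Bimodal genes, are added to the candidate set.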