ProbCD: enrichment analysis accounting for categorization uncertainty
Main Authors: | Shmulevich Ilya, Vêncio Ricardo ZN |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2007-10-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/8/383 |
id |
doaj-d02ce1db5a9c43a6884556b30d07a103 |
---|---|
record_format |
Article |
doi |
10.1186/1471-2105-8-383 |
collection |
DOAJ |
author |
Shmulevich Ilya, Vêncio Ricardo ZN |
issn |
1471-2105 |
description |
Abstract. Background: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. Despite efforts to create probabilistic annotations, especially in the Gene Ontology context, and to deal with uncertainty in high-throughput datasets, current enrichment methods largely ignore this probabilistic information because they are mainly based on variants of the Fisher exact test. Results: We developed ProbCD, open-source R-based software for probabilistic categorical data analysis that does not require a static contingency table. The contingency table for the enrichment problem is built from the expectation of a Bernoulli scheme stochastic process, given the categorization probabilities. An online interface allows use by non-programmers and is available at http://xerad.systemsbiology.net/ProbCD/. Conclusion: We present an analysis framework and software tools that address uncertainty in categorical data analysis. In enrichment analysis in particular, ProbCD can accommodate (i) the stochastic nature of high-throughput experimental techniques and (ii) probabilistic gene annotation.
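The abstract's central construction, an expected contingency table obtained from a Bernoulli scheme over categorization probabilities, can be illustrated with a minimal Python sketch. This is not ProbCD's R code: the function names (`expected_table`, `sample_table`, `monte_carlo_mean`) are hypothetical, and the sketch assumes that each gene's membership in the selected list (probability p, e.g. from a high-throughput experiment) and its annotation to the category (probability q, e.g. a probabilistic GO annotation) are independent Bernoulli trials. Under that assumption the expected cell counts are sums over genes of p·q, p·(1-q), (1-p)·q and (1-p)·(1-q); with 0/1 probabilities the table reduces to the ordinary crisp table used by Fisher-type tests.

```python
import random
from typing import Sequence, Tuple

# 2x2 table as ((n11, n12), (n21, n22)):
#   rows    = gene in the selected list? (yes, no)
#   columns = gene annotated to the category? (yes, no)
Table = Tuple[Tuple[float, float], Tuple[float, float]]


def expected_table(p_selected: Sequence[float], p_annotated: Sequence[float]) -> Table:
    """Expected table, ASSUMING each gene's two memberships are independent
    Bernoulli trials with probabilities p (selected) and q (annotated)."""
    if len(p_selected) != len(p_annotated):
        raise ValueError("probability vectors must have the same length")
    n11 = n12 = n21 = n22 = 0.0
    for p, q in zip(p_selected, p_annotated):
        n11 += p * q              # selected and annotated
        n12 += p * (1 - q)        # selected, not annotated
        n21 += (1 - p) * q        # not selected, annotated
        n22 += (1 - p) * (1 - q)  # neither
    return ((n11, n12), (n21, n22))


def sample_table(p_selected: Sequence[float], p_annotated: Sequence[float],
                 rng: random.Random) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Draw one integer table by flipping each gene's two Bernoulli coins."""
    counts = [[0, 0], [0, 0]]
    for p, q in zip(p_selected, p_annotated):
        row = 0 if rng.random() < p else 1  # 0 = selected
        col = 0 if rng.random() < q else 1  # 0 = annotated
        counts[row][col] += 1
    return ((counts[0][0], counts[0][1]), (counts[1][0], counts[1][1]))


def monte_carlo_mean(p_selected: Sequence[float], p_annotated: Sequence[float],
                     n_draws: int = 10000, seed: int = 0) -> Table:
    """Average of sampled tables; should approach expected_table()."""
    rng = random.Random(seed)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_draws):
        t = sample_table(p_selected, p_annotated, rng)
        for i in range(2):
            for j in range(2):
                acc[i][j] += t[i][j]
    return ((acc[0][0] / n_draws, acc[0][1] / n_draws),
            (acc[1][0] / n_draws, acc[1][1] / n_draws))


if __name__ == "__main__":
    # Hypothetical probabilities for four genes.
    p_sel = [0.9, 0.8, 0.1, 0.05]
    p_ann = [1.0, 0.5, 0.5, 0.0]
    print("expected:", expected_table(p_sel, p_ann))
    print("sampled mean:", monte_carlo_mean(p_sel, p_ann, n_draws=20000, seed=1))
```

The Monte Carlo average is only a sanity check that the closed-form sums match the mean of sampled tables. Because the expected counts are generally fractional, they cannot be fed directly into a Fisher exact test; the abstract does not describe how ProbCD assesses significance from this table, so the sketch stops at the table itself.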