An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors

Bibliographic Details
Main Authors: Azuaje Francisco, Jafari Peyman
Format: Article
Language: English
Published: BMC 2006-06-01
Series: BMC Medical Informatics and Decision Making
Online Access:http://www.biomedcentral.com/1472-6947/6/27
Description
Summary:
Abstract

Background
The analysis of large-scale gene expression data is a fundamental approach to functional genomics and to the identification of potential drug targets. Results derived from such studies cannot be trusted unless the studies are adequately designed and reported. The purpose of this study is to assess current practices in reporting experimental design and statistical analyses in gene expression-based studies.

Methods
We reviewed hundreds of MEDLINE-indexed papers involving gene expression data analysis that were published between 2003 and 2005, and examined how they reported several factors, such as sample size, statistical power and software availability.

Results
From the examined papers, we concentrated on 293 papers presenting either applications or new methodologies. None of these papers reported an approach to sample size or statistical power estimation. Among the methodology papers, 57 (37.5%) gave no explicit statement on data transformation and 104 (68.4%) did not describe the normalisation technique applied prior to data analysis (e.g. classification). Among the papers presenting biomedically relevant applications, 41 (29.1%) did not report on data normalisation and 83 (58.9%) did not describe the normalisation technique applied. Clustering-based analysis, the t-test and ANOVA were the most widely applied techniques in microarray data analysis. Remarkably, however, only 5 (3.5%) of the application papers included statements about, or references to, the assumption of variance homogeneity underlying the t-test and ANOVA. The reporting of the software packages applied, and of their availability, still needs to be promoted.

Conclusion
Recently published gene expression data analysis studies may lack key information required to properly assess their design quality and potential impact. More rigorous reporting is needed for important experimental factors such as statistical power and sample size, and the statistical methods applied should be correctly described and justified. This paper highlights the importance of defining a minimum set of information required for reporting the statistical design and analysis of expression data. By improving its practices of statistical analysis reporting, the scientific community can facilitate quality assurance and peer review, as well as the reproducibility of results.
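The abstract notes that none of the 293 papers reported how sample size or statistical power was estimated. As an illustration of the kind of calculation such a statement could rest on, here is a minimal sketch (not the authors' method) of a normal-approximation sample-size and power calculation for comparing one gene's mean expression between two groups, with an optional Bonferroni adjustment across many genes. The function names, the defaults (alpha = 0.05, power = 0.8) and the assumption of equal, known variances in both groups are illustrative choices, not taken from the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Hypothetical helpers for illustration only. Assumes a two-sided, two-sample
# comparison of means with equal, known variance in both groups.

def per_group_sample_size(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Approximate number of samples per group.

    effect_size: standardized difference in means (delta / sigma).
    n_tests: number of genes tested; alpha is Bonferroni-adjusted when > 1.
    The normal approximation slightly underestimates n compared with a
    t-based calculation when samples are small.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not 0 < power < 1:
        raise ValueError("power must lie strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    z = NormalDist()  # standard normal distribution
    alpha_per_test = alpha / n_tests             # Bonferroni adjustment
    z_alpha = z.inv_cdf(1 - alpha_per_test / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)                    # quantile for the target power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return ceil(n)  # round up to a whole number of samples


def achieved_power(n_per_group, effect_size, alpha=0.05, n_tests=1):
    """Approximate power of the same test for a given per-group sample size."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (alpha / n_tests) / 2)
    shift = effect_size * sqrt(n_per_group / 2)  # mean of the test statistic
    # Ignores the negligible chance of rejecting in the opposite tail.
    return z.cdf(shift - z_alpha)


if __name__ == "__main__":
    print(per_group_sample_size(1.0))                 # single gene
    print(per_group_sample_size(1.0, n_tests=10000))  # Bonferroni over 10,000 genes
    print(round(achieved_power(16, 1.0), 3))
```

Under these assumptions, per_group_sample_size(1.0) returns 16 samples per group for a one-standard-deviation difference, while demanding the same power at a Bonferroni-adjusted threshold across 10,000 genes requires considerably more. Reporting the effect size, significance threshold, multiple-testing correction and target power used in such a calculation is what allows readers to judge whether a study was adequately powered; a t-based or false-discovery-rate-based calculation would give somewhat different numbers.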
ISSN: 1472-6947