Correlating gene and protein expression data using Correlated Factor Analysis



Bibliographic Details
Main Authors: Lehtiö Janne, Ploner Alexander, Salim Agus, Tan Chuen, Chia Kee, Pawitan Yudi
Format: Article
Language: English
Published: BMC 2009-09-01
Series: BMC Bioinformatics
Online Access:http://www.biomedcentral.com/1471-2105/10/272
Description
Summary:
Background. Joint analysis of transcriptomic and proteomic data taken from the same samples has the potential to elucidate complex biological mechanisms. Most current methods that integrate these datasets compute the correlation between a gene and a protein only after genes and proteins have been matched one-to-one. However, genes and proteins are connected via biological pathways, and their relationship is not necessarily one-to-one. In this paper, we investigate the use of Correlated Factor Analysis (CFA) for modeling the correlation of genome-scale gene and protein data. Unlike existing approaches, CFA considers all possible gene-protein pairs and uses all gene and protein information in its modeling framework. The Generalized Singular Value Decomposition (gSVD) is another method that takes all available transcriptomic and proteomic data into account, and we compare CFA with gSVD.
Results. Our simulation study indicates that the CFA estimates can consistently capture the dominant patterns of correlation between two sets of measurements; in contrast, the gSVD estimates cannot. When CFA is applied to real cancer data, the list of co-regulated genes and proteins it identifies has a biologically meaningful interpretation: both the gene and the protein expression point to the same processes. Among the GO terms for which genes and proteins are most correlated, we observed blood vessel morphogenesis and development.
Conclusion. We demonstrate that CFA is a useful tool for gene-protein data integration and modeling when the main question is finding which patterns of gene expression are most correlated with protein expression.
ISSN: 1471-2105
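The summary contrasts one-to-one gene-protein matching with approaches that consider all possible gene-protein pairs. The sketch below illustrates only that "all pairs" idea. It is not the authors' CFA model and not gSVD. It computes the full gene-by-protein Pearson correlation matrix from samples measured on both platforms, then summarizes its dominant pattern with the leading singular vectors of that matrix, a plain SVD swapped in here purely for illustration. The function names `cross_correlation` and `dominant_pattern` are hypothetical, and the example assumes that rows are the same samples in the same order in both matrices.

```python
import numpy as np


def cross_correlation(genes: np.ndarray, proteins: np.ndarray) -> np.ndarray:
    """Pearson correlation for every gene-protein pair (hypothetical helper).

    genes:    (n_samples, n_genes) expression matrix
    proteins: (n_samples, n_proteins) expression matrix, same samples, same row order
    returns:  (n_genes, n_proteins) matrix; entry [i, j] is corr(gene i, protein j)
    """
    if genes.shape[0] != proteins.shape[0]:
        raise ValueError("genes and proteins must be measured on the same samples")
    n = genes.shape[0]
    # Standardize each column with the population std (ddof=0);
    # a column with zero variance would cause a division by zero here.
    zg = (genes - genes.mean(axis=0)) / genes.std(axis=0)
    zp = (proteins - proteins.mean(axis=0)) / proteins.std(axis=0)
    # The mean of products of z-scores is Pearson's r, computed for all pairs at once.
    return zg.T @ zp / n


def dominant_pattern(corr: np.ndarray):
    """Leading singular value and vectors of the cross-correlation matrix.

    An illustrative SVD summary only; this is not CFA or gSVD.
    """
    u, s, vt = np.linalg.svd(corr, full_matrices=False)
    return s[0], u[:, 0], vt[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: one shared latent factor drives both platforms, plus noise.
    latent = rng.normal(size=(50, 1))
    genes = latent @ rng.normal(size=(1, 100)) + rng.normal(size=(50, 100))
    proteins = latent @ rng.normal(size=(1, 30)) + rng.normal(size=(50, 30))
    corr = cross_correlation(genes, proteins)
    strength, gene_weights, protein_weights = dominant_pattern(corr)
    print(corr.shape, round(float(strength), 3))
```

Using every pair avoids deciding up front which protein "belongs" to which gene. The trade-off is a matrix of size genes × proteins, which the singular-value summary then reduces to one gene pattern and one protein pattern.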