Correlating gene and protein expression data using Correlated Factor Analysis

Abstract

Background: Joint analysis of transcriptomic and proteomic data taken from the same samples has the potential to elucidate complex biological mechanisms. Most current methods that integrate these datasets allow for the computation of the correla...


Bibliographic Details
Main Authors: Lehtiö, Janne; Ploner, Alexander; Salim, Agus; Tan, Chuen; Chia, Kee; Pawitan, Yudi
Format: Article
Language: English
Published: BMC 2009-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/10/272
Record ID: doaj-e9c2cec8bfae416dbac7cff30dcfee47
DOI: 10.1186/1471-2105-10-272
ISSN: 1471-2105
Volume: 10, Issue 1, Article 272
Collection: DOAJ
Description: Abstract

Background: Joint analysis of transcriptomic and proteomic data taken from the same samples has the potential to elucidate complex biological mechanisms. Most current methods that integrate these datasets allow for the computation of the correlation between a gene and protein but only after a one-to-one matching of genes and proteins is done. However, genes and proteins are connected via biological pathways and their relationship is not necessarily one-to-one. In this paper, we investigate the use of Correlated Factor Analysis (CFA) for modeling the correlation of genome-scale gene and protein data. Unlike existing approaches, CFA considers all possible gene-protein pairs and utilizes all gene and protein information in its modeling framework. The Generalized Singular Value Decomposition (gSVD) is another method which takes into account all available transcriptomic and proteomic data. Comparison is made between CFA and gSVD.

Results: Our simulation study indicates that the CFA estimates can consistently capture the dominant patterns of correlation between two sets of measurements; in contrast, the gSVD estimates cannot do that. Applied to real cancer data, the list of co-regulated genes and proteins identified by CFA has biologically meaningful interpretation, where both the gene and protein expressions are pointing to the same processes. Among the GO terms for which the genes and proteins are most correlated, we observed blood vessel morphogenesis and development.

Conclusion: We demonstrate that CFA is a useful tool for gene-protein data integration and modeling, where the main question is in finding which patterns of gene expression are most correlated with protein expression.
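Illustrative note: the abstract contrasts methods that correlate only one-to-one matched gene-protein pairs with approaches that consider all possible gene-protein pairs. The sketch below shows that contrast on toy data only. It is not an implementation of CFA or gSVD, and it does not come from the paper: the names `pearson`, `cross_correlation`, `genes`, `proteins`, and `matching`, as well as all numbers, are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_correlation(genes, proteins):
    """All-pairs correlations: one row per gene, one column per protein."""
    return {
        g: {p: pearson(g_values, p_values) for p, p_values in proteins.items()}
        for g, g_values in genes.items()
    }


# Toy expression values over the same five samples (made-up numbers).
genes = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [5.0, 3.0, 4.0, 1.0, 2.0],
}
proteins = {
    "protA": [2.0, 4.0, 6.0, 8.0, 10.0],
    "protX": [2.0, 1.0, 4.0, 3.0, 5.0],
    "protY": [9.0, 7.0, 8.0, 4.0, 6.0],
}

# One-to-one view: only pairs with a known gene-to-protein match are scored,
# so geneB (no matched protein here) is left out entirely.
matching = {"geneA": "protA"}  # hypothetical matching table
matched = {(g, p): pearson(genes[g], proteins[p]) for g, p in matching.items()}

# All-pairs view: every gene is scored against every protein.
all_pairs = cross_correlation(genes, proteins)

for gene, row in all_pairs.items():
    print(gene, {protein: round(r, 3) for protein, r in row.items()})
```

In the one-to-one view only the matched pair is examined, while the all-pairs matrix also exposes relationships between unmatched genes and proteins. This is the kind of information the abstract says CFA uses in its modeling framework.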