Correlating gene and protein expression data using Correlated Factor Analysis
Abstract. Background: Joint analysis of transcriptomic and proteomic data taken from the same samples has the potential to elucidate complex biological mechanisms. Most current methods that integrate these datasets allow for the computation of the correla...
Main Authors: | Lehtiö Janne, Ploner Alexander, Salim Agus, Tan Chuen, Chia Kee, Pawitan Yudi |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2009-09-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/10/272 |
id |
doaj-e9c2cec8bfae416dbac7cff30dcfee47 |
---|---|
record_format |
Article |
spelling |
doaj-e9c2cec8bfae416dbac7cff30dcfee47 2020-11-25T01:01:11Z eng BMC BMC Bioinformatics 1471-2105 2009-09-01 10 1 272 10.1186/1471-2105-10-272 Correlating gene and protein expression data using Correlated Factor Analysis Lehtiö Janne; Ploner Alexander; Salim Agus; Tan Chuen; Chia Kee; Pawitan Yudi Abstract. Background: Joint analysis of transcriptomic and proteomic data taken from the same samples has the potential to elucidate complex biological mechanisms. Most current methods that integrate these datasets allow for the computation of the correlation between a gene and protein but only after a one-to-one matching of genes and proteins is done. However, genes and proteins are connected via biological pathways and their relationship is not necessarily one-to-one. In this paper, we investigate the use of Correlated Factor Analysis (CFA) for modeling the correlation of genome-scale gene and protein data. Unlike existing approaches, CFA considers all possible gene-protein pairs and utilizes all gene and protein information in its modeling framework. The Generalized Singular Value Decomposition (gSVD) is another method which takes into account all available transcriptomic and proteomic data. Comparison is made between CFA and gSVD. Results: Our simulation study indicates that the CFA estimates can consistently capture the dominant patterns of correlation between two sets of measurements; in contrast, the gSVD estimates cannot do that. Applied to real cancer data, the list of co-regulated genes and proteins identified by CFA has biologically meaningful interpretation, where both the gene and protein expressions are pointing to the same processes. Among the GO terms for which the genes and proteins are most correlated, we observed blood vessel morphogenesis and development. Conclusion: We demonstrate that CFA is a useful tool for gene-protein data integration and modeling, where the main question is in finding which patterns of gene expression are most correlated with protein expression. http://www.biomedcentral.com/1471-2105/10/272 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Lehtiö Janne; Ploner Alexander; Salim Agus; Tan Chuen; Chia Kee; Pawitan Yudi |
author_sort |
Lehtiö Janne |
title |
Correlating gene and protein expression data using Correlated Factor Analysis |
title_sort |
correlating gene and protein expression data using correlated factor analysis |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2009-09-01 |
description |
Abstract. Background: Joint analysis of transcriptomic and proteomic data taken from the same samples has the potential to elucidate complex biological mechanisms. Most current methods that integrate these datasets allow for the computation of the correlation between a gene and protein but only after a one-to-one matching of genes and proteins is done. However, genes and proteins are connected via biological pathways and their relationship is not necessarily one-to-one. In this paper, we investigate the use of Correlated Factor Analysis (CFA) for modeling the correlation of genome-scale gene and protein data. Unlike existing approaches, CFA considers all possible gene-protein pairs and utilizes all gene and protein information in its modeling framework. The Generalized Singular Value Decomposition (gSVD) is another method which takes into account all available transcriptomic and proteomic data. Comparison is made between CFA and gSVD. Results: Our simulation study indicates that the CFA estimates can consistently capture the dominant patterns of correlation between two sets of measurements; in contrast, the gSVD estimates cannot do that. Applied to real cancer data, the list of co-regulated genes and proteins identified by CFA has biologically meaningful interpretation, where both the gene and protein expressions are pointing to the same processes. Among the GO terms for which the genes and proteins are most correlated, we observed blood vessel morphogenesis and development. Conclusion: We demonstrate that CFA is a useful tool for gene-protein data integration and modeling, where the main question is in finding which patterns of gene expression are most correlated with protein expression. |
url |
http://www.biomedcentral.com/1471-2105/10/272 |
_version_ |
1725210308251746304 |