Candidate gene prioritization by network analysis of differential expression using machine learning approaches

Abstract. Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of...


Bibliographic Details
Main Authors: Nitsch Daniela, Gonçalves Joana P, Ojeda Fabian, de Moor Bart, Moreau Yves
Format: Article
Language: English
Published: BMC 2010-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/460
id doaj-8bae378826e043bb92ee94dd756be58f
record_format Article
spelling doaj-8bae378826e043bb92ee94dd756be58f 2020-11-24T23:40:56Z eng BMC BMC Bioinformatics 1471-2105 2010-09-01 11 1 460 10.1186/1471-2105-11-460
Candidate gene prioritization by network analysis of differential expression using machine learning approaches
Nitsch Daniela, Gonçalves Joana P, Ojeda Fabian, de Moor Bart, Moreau Yves
Abstract
Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes, of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data of differential gene expression between affected and healthy individuals.
To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network.
Results: We have proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking).
Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.
Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
http://www.biomedcentral.com/1471-2105/11/460
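The idea in the abstract, scoring a candidate gene by whether it is surrounded by highly differentially expressed genes, can be illustrated with a minimal heat-diffusion sketch. This is not the authors' implementation: the six-gene toy network, the expression values, the diffusion parameter `beta`, the use of the unnormalized graph Laplacian, and all function names are assumptions made for illustration. The last helper shows one reading of "error reduction" (relative decrease in 1 - AUC) that happens to reproduce the reported 52.8% from the two AUC values; the abstract itself does not state the formula.

```python
def graph_laplacian(n_genes, edges):
    """Unnormalized Laplacian L = D - A of an undirected gene network."""
    lap = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def heat_diffuse(lap, signal, beta=0.5, terms=30):
    """Approximate exp(-beta * L) @ signal with a truncated Taylor series."""
    n = len(signal)
    total = list(signal)
    term = list(signal)
    for k in range(1, terms):
        # Next series term: (-beta / k) * L @ previous term.
        term = [(-beta / k) * sum(lap[i][j] * term[j] for j in range(n))
                for i in range(n)]
        total = [t + s for t, s in zip(total, term)]
    return total


def rank_candidates(scores, candidates):
    """Order candidate gene indices from highest to lowest score."""
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


def relative_error_reduction(auc_baseline, auc_method):
    """Relative decrease of the error (1 - AUC); an assumed definition."""
    return (auc_method - auc_baseline) / (1.0 - auc_baseline)


# Hypothetical toy data: gene 0 is weakly expressed itself but has three
# strongly differentially expressed neighbors; gene 4 has one quiet neighbor.
edges = [(0, 1), (0, 2), (0, 3), (4, 5)]
diff_expr = [0.1, 2.0, 1.8, 2.2, 0.3, 0.1]  # e.g. absolute log fold changes
candidates = [0, 4]

scores = heat_diffuse(graph_laplacian(6, edges), diff_expr)
simple_ranking = rank_candidates(diff_expr, candidates)   # expression only
network_ranking = rank_candidates(scores, candidates)     # after diffusion

print("simple expression ranking:", simple_ranking)
print("heat diffusion ranking:", network_ranking)
print("error reduction: %.1f%%" % (100 * relative_error_reduction(0.837, 0.923)))
```

In this toy case the expression-only ranking prefers gene 4, while the diffused scores put gene 0 first because its neighborhood carries most of the differential expression. A real analysis would use a full functional association or protein-protein interaction network and a matrix-exponential routine from a numerical library rather than a hand-written series.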
collection DOAJ
language English
format Article
sources DOAJ
author Nitsch Daniela
Gonçalves Joana P
Ojeda Fabian
de Moor Bart
Moreau Yves
title Candidate gene prioritization by network analysis of differential expression using machine learning approaches
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2010-09-01
url http://www.biomedcentral.com/1471-2105/11/460