Multiple-input multiple-output causal strategies for gene selection

Abstract. Background: Traditional strategies for selecting variables in high-dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in generalization...

Bibliographic Details
Main Authors: Bontempi Gianluca, Haibe-Kains Benjamin, Desmedt Christine, Sotiriou Christos, Quackenbush John
Format: Article
Language: English
Published: BMC 2011-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/458
id doaj-a285955455a9486090b8ed8bcc33b529
record_format Article
spelling doaj-a285955455a9486090b8ed8bcc33b529 2020-11-25T02:28:17Z eng BMC BMC Bioinformatics 1471-2105 2011-11-01 12 1 458 10.1186/1471-2105-12-458 Multiple-input multiple-output causal strategies for gene selection Bontempi Gianluca Haibe-Kains Benjamin Desmedt Christine Sotiriou Christos Quackenbush John <p>Abstract</p> <p>Background</p> <p>Traditional strategies for selecting variables in high-dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes. This is essentially because high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting.</p> <p>Results</p> <p>We show in a synthetic case study that a better prioritization of causal variables can be obtained by considering a relevance score which incorporates a causal term. In addition, we show, in a meta-analysis study of six publicly available breast cancer microarray datasets, that the improvement also holds in terms of accuracy. The biological interpretation of the results confirms the potential of a causal approach to gene selection.</p> <p>Conclusions</p> <p>Integrating causal information into gene selection algorithms is effective both in terms of prediction accuracy and biological interpretation.</p> http://www.biomedcentral.com/1471-2105/12/458
collection DOAJ
language English
format Article
sources DOAJ
author Bontempi Gianluca
Haibe-Kains Benjamin
Desmedt Christine
Sotiriou Christos
Quackenbush John
spellingShingle Bontempi Gianluca
Haibe-Kains Benjamin
Desmedt Christine
Sotiriou Christos
Quackenbush John
Multiple-input multiple-output causal strategies for gene selection
BMC Bioinformatics
author_facet Bontempi Gianluca
Haibe-Kains Benjamin
Desmedt Christine
Sotiriou Christos
Quackenbush John
author_sort Bontempi Gianluca
title Multiple-input multiple-output causal strategies for gene selection
title_short Multiple-input multiple-output causal strategies for gene selection
title_full Multiple-input multiple-output causal strategies for gene selection
title_fullStr Multiple-input multiple-output causal strategies for gene selection
title_full_unstemmed Multiple-input multiple-output causal strategies for gene selection
title_sort multiple-input multiple-output causal strategies for gene selection
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2011-11-01
description <p>Abstract</p> <p>Background</p> <p>Traditional strategies for selecting variables in high-dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes. This is essentially because high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting.</p> <p>Results</p> <p>We show in a synthetic case study that a better prioritization of causal variables can be obtained by considering a relevance score which incorporates a causal term. In addition, we show, in a meta-analysis study of six publicly available breast cancer microarray datasets, that the improvement also holds in terms of accuracy. The biological interpretation of the results confirms the potential of a causal approach to gene selection.</p> <p>Conclusions</p> <p>Integrating causal information into gene selection algorithms is effective both in terms of prediction accuracy and biological interpretation.</p>
url http://www.biomedcentral.com/1471-2105/12/458
work_keys_str_mv AT bontempigianluca multipleinputmultipleoutputcausalstrategiesforgeneselection
AT haibekainsbenjamin multipleinputmultipleoutputcausalstrategiesforgeneselection
AT desmedtchristine multipleinputmultipleoutputcausalstrategiesforgeneselection
AT sotiriouchristos multipleinputmultipleoutputcausalstrategiesforgeneselection
AT quackenbushjohn multipleinputmultipleoutputcausalstrategiesforgeneselection
_version_ 1724839203426009088