A feature selection approach for identification of signature genes from SAGE data

Abstract Background One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray technique...

Bibliographic Details
Main Authors:	Silva Paulo JS, Patrão Diogo FC, Martins David C, Humes Carlos, Cesar Roberto M, Barrera Junior, Brentani Helena
Format:	Article
Language:	English
Published:	BMC 2007-05-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/8/169

id	doaj-a4f61ba8dc3048e882f40d19962f86bc
record_format	Article
spelling	doaj-a4f61ba8dc3048e882f40d19962f86bc | 2020-11-24T21:39:30Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2007-05-01 | 8 | 1 | 169 | 10.1186/1471-2105-8-169 | A feature selection approach for identification of signature genes from SAGE data | Silva Paulo JS; Patrão Diogo FC; Martins David C; Humes Carlos; Cesar Roberto M; Barrera Junior; Brentani Helena | http://www.biomedcentral.com/1471-2105/8/169
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Silva Paulo JS, Patrão Diogo FC, Martins David C, Humes Carlos, Cesar Roberto M, Barrera Junior, Brentani Helena
author_sort	Silva Paulo JS
title	A feature selection approach for identification of signature genes from SAGE data
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2007-05-01
description	Abstract. Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray techniques. Because the probabilistic models of microarray and SAGE technologies differ substantially, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results: A new framework is proposed to select specific genes that distinguish different biological states based on the analysis of SAGE data. The framework applies the bolstered error to identify strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify, among all gene groups selected by the strong genes method, the genes that distinguish the different states most reliably. A score that combines the credibility and bolstered error values is proposed to rank the candidate gene groups. Results obtained using SAGE data from gliomas are presented and corroborate the introduced methodology. Conclusion: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis, and this information is incorporated in the methodology described in the paper. The introduced method is suitable for identifying signature genes that lead to a good separation of the biological states using SAGE, and it may be adapted to other counting methods such as Massively Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful to generate classifiers.
url	http://www.biomedcentral.com/1471-2105/8/169

Similar Items