Mining gene expression data of multiple sclerosis.

Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed a...

Bibliographic Details
Main Authors:	Pi Guo, Qin Zhang, Zhenli Zhu, Zhengliang Huang, Ke Li
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2014-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC4059716?pdf=render

id	doaj-808c8603a9e84326b9be3298f3ac1ca9
record_format	Article
spelling	doaj-808c8603a9e84326b9be3298f3ac1ca9 2020-11-25T00:12:40Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2014-01-01 9 6 e100052 10.1371/journal.pone.0100052 Mining gene expression data of multiple sclerosis. Pi Guo, Qin Zhang, Zhenli Zhu, Zhengliang Huang, Ke Li http://europepmc.org/articles/PMC4059716?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Pi Guo Qin Zhang Zhenli Zhu Zhengliang Huang Ke Li
author_sort	Pi Guo
title	Mining gene expression data of multiple sclerosis.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2014-01-01
description	Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example. Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells from a total of 44 samples from 26 multiple sclerosis patients and 18 individuals with other neurological diseases (control) were analyzed. Feature selection algorithms including Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic Curve, and Boruta algorithms were jointly performed to select candidate genes associating with multiple sclerosis. Multiple classification models categorized samples into two different groups based on the identified genes. Models' performance was evaluated using cross-validation methods, and an optimal classifier for gene selection was determined. An overlapping feature set was identified consisting of 8 genes that were differentially expressed between the two phenotype groups. The genes were significantly associated with the pathways of apoptosis and cytokine-cytokine receptor interaction. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model was established based on the featured genes and gave a practical accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score. The combined analytical framework integrating feature ranking algorithms and Support Vector Machine model could be used for selecting genes for other diseases.
url	http://europepmc.org/articles/PMC4059716?pdf=render
_version_	1725398309348048896
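
The description above outlines a workflow: several feature selectors each nominate genes, the overlapping set is kept, and a binary classifier is scored by accuracy, Sensitivity, Specificity and F1. The sketch below illustrates only those generic steps in plain Python and is not the authors' code. The function names (`auc`, `consensus`, `binary_metrics`), the placeholder genes `GENE_A` to `GENE_C`, and all numbers are assumptions made for illustration; SVM-RFE and Boruta are not implemented here, and their outputs are stood in for by hand-written sets.

```python
from typing import Dict, Sequence, Set


def auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of one gene's expression values (pairwise form, ties count 0.5).

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus(selected: Dict[str, Set[str]]) -> Set[str]:
    """Genes picked by every selector, i.e. the overlapping feature set."""
    sets = list(selected.values())
    return set.intersection(*sets) if sets else set()


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from 0/1 labels.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical selector outputs; only TNFSF10 is a gene named in the record.
selected = {
    "svm_rfe": {"TNFSF10", "GENE_A", "GENE_B"},
    "roc": {"TNFSF10", "GENE_A", "GENE_C"},
    "boruta": {"TNFSF10", "GENE_A"},
}
overlap = consensus(selected)

# Made-up true and predicted labels (1 = multiple sclerosis, 0 = control).
metrics = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

In a cross-validation setting, `binary_metrics` would be applied to the held-out predictions of each fold and the results averaged. Taking a strict intersection is one possible way to combine selectors; a majority vote would be a looser alternative.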

Similar Items