Mining gene expression data of multiple sclerosis.

Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed a...

Bibliographic Details
Main Authors: Pi Guo, Qin Zhang, Zhenli Zhu, Zhengliang Huang, Ke Li
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4059716?pdf=render
id doaj-808c8603a9e84326b9be3298f3ac1ca9
record_format Article
spelling doaj-808c8603a9e84326b9be3298f3ac1ca9 | 2020-11-25T00:12:40Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2014-01-01 | 9(6): e100052 | 10.1371/journal.pone.0100052
Mining gene expression data of multiple sclerosis.
Pi Guo; Qin Zhang; Zhenli Zhu; Zhengliang Huang; Ke Li

Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example.

Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells from a total of 44 samples from 26 multiple sclerosis patients and 18 individuals with other neurological diseases (control) were analyzed. Feature selection algorithms including Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic Curve, and Boruta algorithms were jointly performed to select candidate genes associating with multiple sclerosis. Multiple classification models categorized samples into two different groups based on the identified genes. Models' performance was evaluated using cross-validation methods, and an optimal classifier for gene selection was determined.

An overlapping feature set was identified consisting of 8 genes that were differentially expressed between the two phenotype groups. The genes were significantly associated with the pathways of apoptosis and cytokine-cytokine receptor interaction. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model was established based on the featured genes and gave a practical accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score.

The combined analytical framework integrating feature ranking algorithms and Support Vector Machine model could be used for selecting genes for other diseases.

http://europepmc.org/articles/PMC4059716?pdf=render
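
The abstract describes two steps that a short sketch may make concrete: taking the genes that several feature selectors agree on, and scoring a binary classifier by accuracy, sensitivity, specificity and F1. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the placeholder gene labels (`g1` to `g6`) and the label vectors are all hypothetical and do not reproduce any data or result from the study.

```python
from typing import Dict, Iterable, List, Set


def overlapping_features(selections: Iterable[Iterable[str]]) -> Set[str]:
    """Return the features that every selector picked (set intersection)."""
    sets = [set(s) for s in selections]
    if not sets:
        return set()
    return set.intersection(*sets)


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and F1 (positive class = 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # recall on the positive class
        "specificity": ratio(tn, tn + fp),   # recall on the negative class
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical outputs of three selectors (placeholder gene labels).
svm_rfe_genes = ["g1", "g2", "g3", "g4"]
roc_genes = ["g2", "g3", "g5"]
boruta_genes = ["g3", "g2", "g6"]
shared = overlapping_features([svm_rfe_genes, roc_genes, boruta_genes])

# Hypothetical labels: 10 positives then 10 negatives, with a few errors.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 7 + [1] * 3
scores = binary_metrics(y_true, y_pred)

print(sorted(shared))
print(scores)
```

In a cross-validated setting these metrics would be computed on each held-out fold and then averaged. The intersection step is deliberately strict: a gene is kept only if all selectors agree on it.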
collection DOAJ
language English
format Article
sources DOAJ
author Pi Guo
Qin Zhang
Zhenli Zhu
Zhengliang Huang
Ke Li
title Mining gene expression data of multiple sclerosis.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2014-01-01
url http://europepmc.org/articles/PMC4059716?pdf=render