Suite of decision tree-based classification algorithms on cancer gene expression data

One of the major challenges in microarray analysis, especially in cancer gene expression profiles, is to determine genes or groups of genes that are highly expressed in cancer cells but not in normal cells. Supervised machine learning techniques are used with microarray datasets to build classificat...

Bibliographic Details
Main Authors: Mohmad Badr Al Snousy, Hesham Mohamed El-Deeb, Khaled Badran, Ibrahim Ali Al Khlil
Format: Article
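Language: English
Published: Elsevier 2011-07-01
Series: Egyptian Informatics Journal
Subjects: DNA microarray; Cancer; Classification; Decision trees; Ensemble decision tree; Attribute selection
Online Access: http://www.sciencedirect.com/science/article/pii/S1110866511000223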
id doaj-f3451b629bec4106ba5d53e51e94f204
record_format Article
spelling doaj-f3451b629bec4106ba5d53e51e94f204 2021-07-02T01:20:31Z eng
Elsevier. Egyptian Informatics Journal, ISSN 1110-8665. 2011-07-01, vol. 12, no. 2, pp. 73–82. doi:10.1016/j.eij.2011.04.003
Suite of decision tree-based classification algorithms on cancer gene expression data
Mohmad Badr Al Snousy: Department of Computer Science, Sadat Academy for Management Science (SAMS), Egypt
Hesham Mohamed El-Deeb: Department of Computer Science, Modern University for Technology and Information (M.T.I.), Egypt
Khaled Badran: Department of Computer Science, Military Technical College (M.T.C.), Egypt
Ibrahim Ali Al Khlil: Department of Computer Science, Military Technical College (M.T.C.), Egypt
collection DOAJ
language English
format Article
sources DOAJ
author Mohmad Badr Al Snousy
Hesham Mohamed El-Deeb
Khaled Badran
Ibrahim Ali Al Khlil
title Suite of decision tree-based classification algorithms on cancer gene expression data
publisher Elsevier
series Egyptian Informatics Journal
issn 1110-8665
publishDate 2011-07-01
description One of the major challenges in microarray analysis, especially of cancer gene expression profiles, is to identify genes or groups of genes that are highly expressed in cancer cells but not in normal cells. Supervised machine learning techniques are applied to microarray datasets to build classification models that improve the diagnosis of different diseases. In this study, we compare the classification accuracy of nine decision tree methods, divided into two main categories. The first category comprises single decision trees: C4.5, CART, Decision Stump, Random Tree, and REPTree. The second category comprises ensemble decision trees: Bagging (C4.5 and REPTree), AdaBoost (C4.5 and REPTree), ADTree, and Random Forests. In addition to this comparison, we evaluate the behavior of these methods with and without attribute selection (A.S.) techniques, namely Chi-square attribute selection and Gain Ratio attribute selection. In most cases, the ensemble learning methods (bagging, boosting, and Random Forest) improved on the classification accuracy of single decision trees, because they generate several classifiers from one dataset and combine their classification decisions by voting. The improvement ranged from 4.99% to 6.19%. For the majority of datasets and classification methods, Gain Ratio attribute selection slightly improved classification accuracy (∼1.05%) by concentrating on the most promising genes, those whose information gain best discriminates between the classes. In contrast, Chi-square attribute evaluation slightly decreased the classification accuracy of ensemble classifiers because it eliminated some informative genes.
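The voting mechanism that the description credits for the ensemble gains can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain bagging over Decision Stumps (one-level trees), and the function names, the synthetic three-gene dataset, and the parameter values are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-level tree: choose the (feature, threshold) split with the fewest training errors."""
    overall = Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            # Each side predicts its majority label; an empty side falls back to the overall majority.
            left_label = Counter(left).most_common(1)[0][0] if left else overall
            right_label = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement) of the same dataset."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Combine the individual classifiers by majority vote."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 40 samples, 3 "genes"; label 1 shifts genes 0 and 2 upward.
data_rng = random.Random(42)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([data_rng.gauss(2.0 * label, 1.0), data_rng.gauss(0.0, 1.0), data_rng.gauss(1.5 * label, 1.0)])
    y.append(label)

ensemble = fit_bagging(X, y)
preds = [predict_bagging(ensemble, row) for row in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy of the bagged stumps: {accuracy:.2f}")
```

The accuracy printed here is measured on the training samples of a synthetic dataset, so it says nothing about the figures reported in the study; a fair comparison would use held-out data or cross-validation.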
topic DNA microarray
Cancer
Classification
Decision trees
Ensemble decision tree
Attribute selection
url http://www.sciencedirect.com/science/article/pii/S1110866511000223
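The Gain Ratio attribute selection named in the description ranks each gene by its information gain divided by the split information of the gene itself. The sketch below shows that calculation under one stated assumption: expression values have already been discretized into categories such as "high" and "low". The gene names and values are hypothetical, and the record does not say which software the authors used.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of an attribute divided by its split information."""
    n = len(labels)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    # Expected class entropy after splitting the samples on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    # An attribute with a single value cannot split the data; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical discretized expression levels for four samples (1 = tumor, 0 = normal).
labels = [1, 1, 0, 0]
genes = {
    "gene_A": ["high", "high", "low", "low"],   # separates the classes perfectly
    "gene_B": ["high", "low", "high", "low"],   # carries no class information
}

ranking = sorted(genes, key=lambda g: gain_ratio(genes[g], labels), reverse=True)
for name in ranking:
    print(name, round(gain_ratio(genes[name], labels), 3))
```

Keeping only the top-ranked genes before training is what allows a selection step of this kind to help or hurt accuracy, depending on whether the discarded genes were informative.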