Strategies for Combining Tree-Based Ensemble Models

Ensemble models have proven effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than any of the base models alone. Base models are typically trained using different subsets of the training examples and input features, and ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models in a higher-level ensemble. The goal was to develop a strategy for combining ensemble classifiers that yields higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods – random forest, extremely randomized trees, and eXtreme gradient boosting (XGBoost) – were used to generate a set of base models, and the outputs of the classifiers generated by these methods were then combined to create an ensemble classifier. The dissertation systematically investigated methods for (1) selecting a set of diverse base models and (2) combining the selected base models. The methods were evaluated on public-domain data sets that have been used extensively for benchmarking classification models. The research established that the best-performing approach applied random forest as the final ensemble method, integrating the selected base models with factor scores from multiple correspondence analysis.
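The strategy described in the abstract is a form of stacking: tree-based ensembles serve as base models, and a random forest acts as the final (meta) ensemble over their outputs. A minimal sketch in Python, assuming scikit-learn is available; `GradientBoostingClassifier` stands in for XGBoost, the breast-cancer benchmark data set stands in for the dissertation's evaluation data, and the multiple-correspondence-analysis factor scores are omitted for brevity:

```python
# Sketch: stack three tree-based ensembles under a random-forest meta-learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    RandomForestClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Base models: the three tree-based ensemble methods named in the abstract
# (gradient boosting here stands in for XGBoost).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Random forest as the final ensemble, trained on the base models'
# cross-validated predicted probabilities.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    stack_method="predict_proba",
    cv=5,
)

stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.3f}")
```

Using `predict_proba` outputs rather than hard labels gives the meta-learner richer information about each base model's confidence, which is one common way to exploit diversity among base models.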

Bibliographic Details
Main Author: Zhang, Yi
Format: Others
Published: NSUWorks, 2017
Subjects: ensemble models; model selection; multiple correspondence analysis; predictive models; random forest; extremely randomized trees; eXtreme gradient boosting; tree-based ensemble models; Computer Sciences
Online Access: http://nsuworks.nova.edu/gscis_etd/1021
http://nsuworks.nova.edu/cgi/viewcontent.cgi?article=2019&context=gscis_etd