Strategies for Combining Tree-Based Ensemble Models
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal was to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods (random forest, extremely randomized trees, and eXtreme gradient boosting) were used to generate a set of base models, and the outputs of the classifiers they produced were combined to create an ensemble classifier. The dissertation systematically investigated methods for (1) selecting a set of diverse base models and (2) combining the selected base models. The methods were evaluated on public-domain data sets that have been used extensively for benchmarking classification models. The research established that the best-performing approach applied random forest as the final ensemble method, integrating the selected base models together with factor scores from multiple correspondence analysis.
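The stacking strategy the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the dissertation's actual code: scikit-learn's `GradientBoostingClassifier` stands in for eXtreme gradient boosting, a bundled benchmark data set stands in for the public-domain evaluation data, and the multiple correspondence analysis factor-score features are omitted.

```python
# Sketch: tree-based base models combined by a random-forest meta-learner,
# following the ensemble approach reported in the abstract. The XGBoost base
# model is approximated here by sklearn's GradientBoostingClassifier.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Diverse tree-based base models, as in the dissertation's design.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("ert", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ("gbm", GradientBoostingClassifier(random_state=0)),  # XGBoost stand-in
]

# Random forest as the final ensemble method, per the reported finding.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.3f}")
```

In the dissertation's full pipeline, factor scores from multiple correspondence analysis would be appended to the base-model outputs before the final random forest is fit; that step is left out here for brevity.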
Main Author: Zhang, Yi
Format: Others
Published: NSUWorks, 2017
Subjects: ensemble models; model selection; multiple correspondence analysis; predictive models; random forest; extremely randomized trees; eXtreme gradient boosting; tree-based ensemble models; Computer Sciences
Online Access: http://nsuworks.nova.edu/gscis_etd/1021 ; http://nsuworks.nova.edu/cgi/viewcontent.cgi?article=2019&context=gscis_etd
Series: CEC Theses and Dissertations