Strategies for Combining Tree-Based Ensemble Models

Ensemble models have proven effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than any of the base models alone. Base models are typically trained using different subsets of the training examples and input features, and ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models in a higher-level ensemble. The goal was to develop a strategy for combining ensemble classifiers that yields higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods – random forest, extremely randomized trees, and eXtreme gradient boosting (XGBoost) – were used to generate a set of base models, and the outputs of the classifiers generated by these methods were then combined to create an ensemble classifier. The dissertation systematically investigated methods for (1) selecting a set of diverse base models and (2) combining the selected base models. The methods were evaluated on public-domain data sets that have been used extensively for benchmarking classification models. The research established that the best-performing approach applied random forest as the final ensemble method, integrating the selected base models with factor scores from multiple correspondence analysis.
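The strategy described in the abstract is a form of stacking: tree-based ensembles serve as base models, and a random forest acts as the final (meta) ensemble over their outputs. A minimal sketch in Python, assuming scikit-learn is available; `GradientBoostingClassifier` stands in for XGBoost, the breast-cancer benchmark data set stands in for the dissertation's evaluation data, and the multiple-correspondence-analysis factor scores are omitted for brevity:

```python
# Sketch: stack three tree-based ensembles under a random-forest meta-learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    RandomForestClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Base models: the three tree-based ensemble methods named in the abstract
# (gradient boosting here stands in for XGBoost).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Random forest as the final ensemble, trained on the base models'
# cross-validated predicted probabilities.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    stack_method="predict_proba",
    cv=5,
)

stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.3f}")
```

Using `predict_proba` outputs rather than hard labels gives the meta-learner richer information about each base model's confidence, which is one common way to exploit diversity among base models.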

Bibliographic Details
Main Author: Zhang, Yi
Format: Others
Published: NSUWorks, 2017
Subjects: ensemble models; model selection; multiple correspondence analysis; predictive models; random forest; extremely randomized trees; eXtreme gradient boosting; tree-based ensemble models; Computer Sciences
Online Access: http://nsuworks.nova.edu/gscis_etd/1021
http://nsuworks.nova.edu/cgi/viewcontent.cgi?article=2019&context=gscis_etd