Machine learning ensemble method for discovering knowledge from big data

Big data, generated by various business, internet and social media activities, challenges researchers in machine learning and data mining to develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods represent an attractiv...

Bibliographic Details
Main Author:	Farrash, Majed
Published:	University of East Anglia 2016
Subjects:	006.3
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687927

id	ndltd-bl.uk-oai-ethos.bl.uk-687927
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-687927 2017-11-03T03:18:32Z Electronic Thesis or Dissertation https://ueaeprints.uea.ac.uk/59367/
collection	NDLTD
sources	NDLTD
topic	006.3
description	Big data, generated by various business, internet and social media activities, challenges researchers in machine learning and data mining to develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods are an attractive approach to mining large datasets because of their accuracy and their ability to exploit the divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high-performance computing environment. It begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy for very large training datasets. An algorithm is then developed to ascertain the patterns in the relationship between ensemble accuracy and the size of the partitioned data subsets. The research concludes with the development of a selective modelling algorithm, an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of the partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers trained on large datasets. Identifying the patterns in the relationship between ensemble accuracy and partitioned data subset size helps determine the best subset size for partitioning huge training datasets. Finally, traditional model selection is inefficient when large datasets are involved.
author	Farrash, Majed
title	Machine learning ensemble method for discovering knowledge from big data
publisher	University of East Anglia
publishDate	2016
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687927
