Machine learning ensemble method for discovering knowledge from big data

Bibliographic Details
Main Author: Farrash, Majed
Published: University of East Anglia 2016
Subjects: 006.3
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687927
https://ueaeprints.uea.ac.uk/59367/
Format: Electronic Thesis or Dissertation
id ndltd-bl.uk-oai-ethos.bl.uk-687927
record_format oai_dc
collection NDLTD
sources NDLTD
topic 006.3
description Big data, generated by various business, internet and social media activities, has challenged researchers in machine learning and data mining to develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods are an attractive approach to mining large datasets because of their accuracy and their ability to exploit a divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high-performance computing environment. The research begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy when dealing with very large training datasets. An algorithm is then developed to identify patterns in the relationship between ensemble accuracy and the size of the partitioned data subsets. The research concludes with the development of a selective modelling algorithm, which is an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of the partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers that deals with large datasets. Identifying the patterns in the relationship between ensemble accuracy and partitioned data subset size helps determine the best subset size for partitioning very large training datasets. Finally, traditional model selection is inefficient when large datasets are involved.
author Farrash, Majed
title Machine learning ensemble method for discovering knowledge from big data
publisher University of East Anglia
publishDate 2016
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687927
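The abstract describes a divide-and-conquer ensemble. A very large training set is partitioned into subsets, one classifier is trained per subset (potentially in parallel), and the classifiers are combined. Accuracy is then studied as a function of subset size. The following minimal sketch illustrates that general idea only and is not the framework from the thesis. It assumes a nearest-centroid base classifier, disjoint partitions of a shuffled training set, plain majority voting, and synthetic two-class data. The function names (`make_data`, `train_centroids`, `predict_one`, `partition`, `ensemble_accuracy`) are hypothetical.

```python
import random
from collections import Counter

# Hypothetical illustration only: the thesis's base learner, combination rule
# and datasets are not described in this record. A nearest-centroid classifier,
# majority voting and synthetic 2-D data are assumed here for demonstration.


def make_data(n, seed=0):
    """Synthetic two-class data: class 0 centred at (0, 0), class 1 at (2, 2)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 2.0 * label
        x = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((x, label))
    rng.shuffle(data)
    return data


def train_centroids(subset):
    """Nearest-centroid model: the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for (x1, x2), y in subset:
        s = sums.setdefault(y, [0.0, 0.0])
        s[0] += x1
        s[1] += x2
        counts[y] += 1
    return {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}


def predict_one(model, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda y: (x[0] - model[y][0]) ** 2 + (x[1] - model[y][1]) ** 2)


def partition(train, subset_size):
    """Split the training set into disjoint subsets of at most subset_size items."""
    return [train[i:i + subset_size] for i in range(0, len(train), subset_size)]


def ensemble_accuracy(train, test, subset_size):
    """Train one model per partition, combine them by majority vote, score on test."""
    # Each partition is independent, so in a parallel setting each model
    # could be trained on a separate worker.
    models = [train_centroids(p) for p in partition(train, subset_size)]
    correct = 0
    for x, y in test:
        votes = Counter(predict_one(m, x) for m in models)
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(test)


if __name__ == "__main__":
    train, test = make_data(2000, seed=1), make_data(500, seed=2)
    # Sweep the subset size to see how ensemble accuracy responds to it.
    for size in (50, 200, 1000, 2000):
        print(size, round(ensemble_accuracy(train, test, size), 3))
```

Sweeping `subset_size` like this is one simple way to observe the accuracy-versus-subset-size relationship that the abstract discusses. The thesis's own pattern-identification and selective modelling algorithms are not reproduced here, and toy data like this says nothing about the results reported for real big datasets.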