Machine learning ensemble method for discovering knowledge from big data
Big data, generated from various business, internet and social media activities, challenges researchers in machine learning and data mining to develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods represent an attractiv...
Main Author: | Farrash, Majed |
---|---|
Published: | University of East Anglia, 2016 |
Subjects: | 006.3 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687927 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-687927 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-bl.uk-oai-ethos.bl.uk-687927; 2017-11-03T03:18:32Z; Machine learning ensemble method for discovering knowledge from big data; Farrash, Majed; 2016; Big data, generated from various business, internet and social media activities, challenges researchers in machine learning and data mining to develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods are an attractive approach to mining large datasets because of their accuracy and their ability to use the divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high performance computing environment. It begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy when dealing with very large training datasets. An algorithm is then developed to ascertain the patterns in the relationship between ensemble accuracy and the size of partitioned data subsets. The research concludes with the development of a selective modelling algorithm, which is an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers on large datasets. Identifying the patterns exhibited by the relationship between ensemble accuracy and partitioned data subset size helps determine the best subset size for partitioning huge training datasets. Finally, traditional model selection is inefficient when large datasets are involved.; 006.3; University of East Anglia; http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687927; https://ueaeprints.uea.ac.uk/59367/; Electronic Thesis or Dissertation |
collection |
NDLTD |
sources |
NDLTD |
topic |
006.3 |
spellingShingle |
006.3 Farrash, Majed Machine learning ensemble method for discovering knowledge from big data |
description |
Big data, generated from various business, internet and social media activities, challenges researchers in machine learning and data mining to develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods are an attractive approach to mining large datasets because of their accuracy and their ability to use the divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high performance computing environment. It begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy when dealing with very large training datasets. An algorithm is then developed to ascertain the patterns in the relationship between ensemble accuracy and the size of partitioned data subsets. The research concludes with the development of a selective modelling algorithm, which is an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers on large datasets. Identifying the patterns exhibited by the relationship between ensemble accuracy and partitioned data subset size helps determine the best subset size for partitioning huge training datasets. Finally, traditional model selection is inefficient when large datasets are involved. |
author |
Farrash, Majed |
author_facet |
Farrash, Majed |
author_sort |
Farrash, Majed |
title |
Machine learning ensemble method for discovering knowledge from big data |
title_short |
Machine learning ensemble method for discovering knowledge from big data |
title_full |
Machine learning ensemble method for discovering knowledge from big data |
title_fullStr |
Machine learning ensemble method for discovering knowledge from big data |
title_full_unstemmed |
Machine learning ensemble method for discovering knowledge from big data |
title_sort |
machine learning ensemble method for discovering knowledge from big data |
publisher |
University of East Anglia |
publishDate |
2016 |
url |
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687927 |
work_keys_str_mv |
AT farrashmajed machinelearningensemblemethodfordiscoveringknowledgefrombigdata |
_version_ |
1718559905779023872 |
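
The description above refers to partitioning a large training set into subsets, training one classifier per subset, and examining how ensemble accuracy varies with subset size. The following is a minimal sketch of that general idea only, not the thesis's framework or its selective modelling algorithm. Everything in it is an assumption made for illustration: the synthetic two-class data, the nearest-centroid base learner, majority voting, and all function names (`make_data`, `partition`, `train_centroids`, `ensemble_accuracy`) are hypothetical.

```python
import random
from collections import Counter


def make_data(n, seed):
    """Synthetic two-class data: 2-D Gaussian blobs centred at (0, 0) and (2, 2)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 * label
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows


def partition(data, subset_size):
    """Split data into disjoint subsets of exactly subset_size rows (remainder dropped)."""
    last_start = len(data) - subset_size
    return [data[i:i + subset_size] for i in range(0, last_start + 1, subset_size)]


def train_centroids(subset):
    """Assumed base learner: one mean point per class label seen in the subset."""
    sums, counts = {}, Counter()
    for (x0, x1), label in subset:
        total = sums.setdefault(label, [0.0, 0.0])
        total[0] += x0
        total[1] += x1
        counts[label] += 1
    return {lab: (t[0] / counts[lab], t[1] / counts[lab]) for lab, t in sums.items()}


def predict_one(model, point):
    """Return the label whose centroid is closest to point."""
    return min(
        model,
        key=lambda lab: (point[0] - model[lab][0]) ** 2 + (point[1] - model[lab][1]) ** 2,
    )


def ensemble_predict(models, point):
    """Majority vote over all base models."""
    votes = Counter(predict_one(m, point) for m in models)
    return votes.most_common(1)[0][0]


def ensemble_accuracy(train, test, subset_size):
    """Train one base model per partition and score the voted ensemble on test."""
    models = [train_centroids(s) for s in partition(train, subset_size)]
    hits = sum(ensemble_predict(models, x) == y for x, y in test)
    return hits / len(test)


if __name__ == "__main__":
    train, test = make_data(2000, seed=1), make_data(500, seed=2)
    # Sweep the subset size: larger subsets mean fewer, better-informed members.
    for size in (20, 100, 500, 2000):
        print(size, len(partition(train, size)), ensemble_accuracy(train, test, size))
```

Each base model is trained independently of the others, so the per-subset training step could in principle be distributed across workers; this sketch runs it sequentially and makes no claim about the results reported in the thesis.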