Parallel Genetic-Fuzzy Mining with MapReduce Architecture

碩士 === 國立中山大學 === 資訊工程學系研究所 === 103 === Fuzzy data mining can successfully find out hidden linguistic association rules by transforming quantity information into fuzzy membership values. In the derivation process, good membership functions play a key role in achieving the quality of finial results....

Full description

Bibliographic Details
Main Authors: Yu-Yang Liu, 劉育瑒
Other Authors: Tzung-Pei Hong
Format: Others
Language:en_US
Published: 2015
Online Access:http://ndltd.ncl.edu.tw/handle/eq783m
Description
Summary:碩士 === 國立中山大學 === 資訊工程學系研究所 === 103 === Fuzzy data mining can successfully find out hidden linguistic association rules by transforming quantity information into fuzzy membership values. In the derivation process, good membership functions play a key role in achieving the quality of finial results. In the past, some researches were proposed to train membership functions by genetic algorithms and could indeed improve the quality of found rules. Those kinds of methods were, however, suffered from the long execution time in the training phase. Besides, after appropriate fuzzy membership functions are found, mining out the frequent itemsets from them is also a very time-consuming process as traditional data mining. In this thesis, we thus propose a series of approaches based on the MapReduce architecture to speed up the GA-fuzzy mining process. The contributions can be divided into three parts, including data preprocessing, membership-function training by GA, and fuzzy association-rule derivation. All are performed by MapReduce. For data preprocessing, the proposed approach can not only transform the original data into key-value format to fit the requirement of MapReduce, but also efficiently reduce the redundant database scan by joining the quantities into lists. For membership-function training by GA, the fitness evaluation, which is the most time-costly process, is distributed to shorten the execution time. At last, a distributed fuzzy rule mining approach based on FP-growth is designed to improve the time efficiency of finding fuzzy association rules. The performance between using a single processor and using MapReduce will be compared and discussed from experiments and the results show that our approaches can efficiently reduce the execution time of the whole process.