Improving the Query Efficiency of Log Data on Hadoop through the Bloom Filter

Bibliographic Details
Main Authors: TSAI, PEI-FENG, 蔡沛峰
Other Authors: YEH, TSO-ZEN
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/a9nvqc
Description
Summary: Master's thesis, Fu Jen Catholic University, Department of Computer Science and Information Engineering (Master's Program), academic year 107 (ROC calendar). With the rapid development of the Internet and the rapid growth of data in many electronic forms, storing and processing Big Data has become an important issue. Hadoop is an open-source cloud platform that includes HDFS (Hadoop Distributed File System) and the MapReduce computing framework, providing a viable solution for Big Data storage and computation. A common Big Data application is to store plain-text log files in HDFS and use MapReduce to query them by a feature field. This workload follows a WORM (Write Once, Read Many) access pattern. Exploiting this property, we apply a Bloom filter to the feature field of the log data so that HDFS performs fewer file reads when retrieving the log records we need.
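The core idea in the summary is to skip any log file whose Bloom filter rules out the queried feature value. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes one filter per log file, a whitespace-separated log format with the feature field at a fixed column, SHA-256 double hashing, and an in-memory dict standing in for HDFS files. The names `BloomFilter`, `feature`, `build_index`, and `query` are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over string keys (illustrative sketch)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        n = max(1, expected_items)
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.m = max(8, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False: key was definitely never added. True: key is possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def feature(line: str, field: int) -> str:
    # Hypothetical log layout: whitespace-separated columns, feature at index `field`.
    return line.split()[field]


def build_index(files: dict, field: int, fp_rate: float = 0.01) -> dict:
    """Build one Bloom filter per write-once log file, keyed by file name."""
    index = {}
    for name, lines in files.items():
        bf = BloomFilter(len(lines), fp_rate)
        for line in lines:
            bf.add(feature(line, field))
        index[name] = bf
    return index


def query(files: dict, index: dict, field: int, value: str):
    """Return (matching lines, number of files actually read)."""
    matches, files_read = [], 0
    for name, lines in files.items():
        if not index[name].might_contain(value):
            continue  # the filter rules this file out, so it is never read
        files_read += 1
        matches.extend(line for line in lines if feature(line, field) == value)
    return matches, files_read


if __name__ == "__main__":
    # Two in-memory "files" standing in for HDFS part files (hypothetical data).
    logs = {
        "part-00000": ["10.0.0.1 GET /index.html", "10.0.0.2 GET /about.html"],
        "part-00001": ["10.0.0.3 POST /login", "10.0.0.4 GET /index.html"],
    }
    idx = build_index(logs, field=0)
    hits, read = query(logs, idx, field=0, value="10.0.0.3")
    print(hits)  # ['10.0.0.3 POST /login']
    print(f"files read: {read} of {len(logs)}")
```

A Bloom filter has no false negatives, so results stay exact. A false positive only costs one unnecessary file read. The WORM pattern suits this design: each filter can be built once when its file is written, and the filter never has to support deletions, which a standard Bloom filter cannot do.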