A Cross-Platform Environment for Big Data Analytics Using RHadoop

Master's === Chung Hua University === Department of Information Management === 104 === The quantity of data has increased dramatically with advances in information technology. Traditional data analysis methods are no longer sufficient to deal with such quantities of data. That is why big data analysis has become one of the hottest fields. However, most of the s...

Bibliographic Details
Main Authors: Ho, Li-Fung, 何立峰
Other Authors: Wang, Su-Hua
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/85246435709374365839
id ndltd-TW-104CHPI0396003
record_format oai_dc
spelling ndltd-TW-104CHPI0396003 2017-10-29T04:34:38Z http://ndltd.ncl.edu.tw/handle/85246435709374365839 A Cross-Platform Environment for Big Data Analytics Using RHadoop 跨平台之RHadoop巨量資料分析環境 Ho, Li-Fung 何立峰 碩士 中華大學 資訊管理學系 104 Wang, Su-Hua 王素華 2016 學位論文 ; thesis 76 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Chung Hua University === Department of Information Management === 104 === The quantity of data has increased dramatically with advances in information technology, and traditional data analysis methods are no longer sufficient to deal with such quantities of data. That is why big data analysis has become one of the hottest fields. However, most of the software available on the market is expensive, so people turn to open-source software for big data analysis. The available open-source software, in turn, is skill-demanding and does not offer enough functions to meet users' needs, so users must have professional skills to use it effectively. R is one of the most commonly used open-source languages for data analysis. It satisfies the analysis needs of most fields of study, but users face two major limitations: first, programming in R requires an advanced level of programming ability, which makes it difficult for users who are inexperienced in programming; second, R computes only as fast as the machine on which it is installed, which is not enough to meet the real-time demands of big data. Apache Hadoop, a distributed computing platform, provides an ideal operating environment, which makes it the top choice among open-source software for big data. However, users still have two obstacles to overcome: first, Hadoop provides only limited machine learning and analysis capability, which is not enough for analysis needs; second, it works only in a Linux environment, which often deters those who are not familiar with Linux.
Facing these inherent restrictions of R and Hadoop in big data, this study combined R and Hadoop on a Linux system using RHadoop so that the two complement each other. SSH, a cross-platform communication technique, was introduced on Windows to develop a cross-platform environment framework for big data analysis. This removed the difficulty of working with Linux while allowing users to build R-based analysis scripts simply by choosing certain options. The framework then used SSH to automatically run R, combined with Hadoop, at the Linux end and showed the analysis results to users, thus eliminating the requirement that users also be programmers. Finally, analysis method management functions were provided to make the analysis methods of the framework more extensible. To validate the feasibility of the framework, virtual machines were set up to test it. The test results suggested that the framework was capable of creating analysis scripts at the Windows end and performing the analysis at the Linux end through SSH, while the method management functions allowed the analysis methods under the framework to be managed and extended. This cross-platform big data analysis environment can provide a complete environment for big data analysis and lower the technical barriers for analysts, helping those who would like to conduct big data analysis but have difficulty programming or a limited budget to catch the big data wave.
author2 Wang, Su-Hua
author_facet Wang, Su-Hua
Ho, Li-Fung
何立峰
author Ho, Li-Fung
何立峰
title A Cross-Platform Environment for Big Data Analytics Using RHadoop
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/85246435709374365839