The benefit research of virtualization for Apache Spark.

Master's === Chung Yuan Christian University === Graduate Institute of Information Management === 105 === A Spark distributed system architecture can be deployed through virtualisation. In addition to being quick to deploy, this architecture enables efficient use of a computer’s hardware capacity and flexible allocation of hardware resources, which reduces hard...

Bibliographic Details
Main Authors:	SIH YIN HSU, 徐思縯
Other Authors:	Chi-Hli Hung
Format:	Others
Language:	zh-TW
Published:	2017
Online Access:	http://ndltd.ncl.edu.tw/handle/7w33cv

id	ndltd-TW-105CYCU5396029
record_format	oai_dc
spelling	ndltd-TW-105CYCU5396029 2019-05-15T23:39:16Z http://ndltd.ncl.edu.tw/handle/7w33cv The benefit research of virtualization for Apache Spark. Apache Spark 運用於虛擬化技術之效益研究 SIH YIN HSU 徐思縯 Chi-Hli Hung 洪智力 2017 學位論文 ; thesis 75 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Chung Yuan Christian University === Graduate Institute of Information Management === 105 === A Spark distributed system architecture can be deployed through virtualisation. In addition to being quick to deploy, this architecture enables efficient use of a computer’s hardware capacity and flexible allocation of hardware resources, which reduces hardware costs. This study used virtual machine software to deploy a Spark distributed system, with the Hadoop Distributed File System (HDFS) for data access. Performance was analysed on Spark's in-memory computing framework, resilient distributed datasets (RDD). Two workloads, secondary sorting and WordCount combined with Top-K, were used to test performance on a data volume of 300 GB. The two workloads were cross-validated, and the number of CPUs, the memory, and the number of computing nodes were adjusted across experimental phases to determine the optimal hardware configuration. The experimental results verified that using more nodes yields faster data analysis in a Spark distributed system. However, for a small data volume such as 30 GB, when the hardware resources of each node were already sufficient, data analysis performance did not improve further once it had reached a certain threshold.
author2	Chi-Hli Hung
author	SIH YIN HSU 徐思縯
title	The benefit research of virtualization for Apache Spark.
publishDate	2017
url	http://ndltd.ncl.edu.tw/handle/7w33cv

Similar Items