Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay

Master's === National Tsing Hua University === Department of Computer Science === 104 === In recent years, the volume of data has grown rapidly. With this trend, big data has become an important field, and the need for large-scale storage and computing clusters has grown as well. Because not every user can afford a large number of computers, more and...

Bibliographic Details
Main Authors:	Wei, Wei Che, 魏偉哲
Other Authors:	Chou, Chi Yuan
Format:	Others
Language:	en_US
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/25286565634431082800

id	ndltd-TW-104NTHU5392033
record_format	oai_dc
spelling	ndltd-TW-104NTHU5392033 2017-08-27T04:29:59Z http://ndltd.ncl.edu.tw/handle/25286565634431082800 Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay 利用維持資料局部性及減少資料傳輸延遲提升雲平台之資料處理效能 Wei, Wei Che 魏偉哲 Master's National Tsing Hua University Department of Computer Science 104 In recent years, the volume of data has grown rapidly. With this trend, big data has become an important field, and the need for large-scale storage and computing clusters has grown as well. Because not every user can afford a large number of computers, more and more companies, such as Amazon and Microsoft, have begun to build cloud platforms that offer many services on demand. However, these cloud providers usually keep the different kinds of services separate so that each can be priced individually. For example, they provide a storage service, a virtual machine service, or a simple cluster service, all of which are independent of one another. In a typical use case, users need to store their data in a highly reliable and scalable storage system and build a computing cluster on top of it to analyze the data. Using the storage service and the computing cluster service together in this situation is inconvenient. We therefore develop a service that integrates these two kinds of services, and we propose a data pipeline scheduling service for this scenario that handles multiple jobs on Amazon Web Services. Besides providing a simple way to use Elastic MapReduce, the computing cluster service provided by Amazon, this service also improves performance over the basic use case proposed by Amazon. Chou, Chi Yuan 周志遠 2016 學位論文 ; thesis 54 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Tsing Hua University === Department of Computer Science === 104 === In recent years, the volume of data has grown rapidly. With this trend, big data has become an important field, and the need for large-scale storage and computing clusters has grown as well. Because not every user can afford a large number of computers, more and more companies, such as Amazon and Microsoft, have begun to build cloud platforms that offer many services on demand. However, these cloud providers usually keep the different kinds of services separate so that each can be priced individually. For example, they provide a storage service, a virtual machine service, or a simple cluster service, all of which are independent of one another. In a typical use case, users need to store their data in a highly reliable and scalable storage system and build a computing cluster on top of it to analyze the data. Using the storage service and the computing cluster service together in this situation is inconvenient. We therefore develop a service that integrates these two kinds of services, and we propose a data pipeline scheduling service for this scenario that handles multiple jobs on Amazon Web Services. Besides providing a simple way to use Elastic MapReduce, the computing cluster service provided by Amazon, this service also improves performance over the basic use case proposed by Amazon.
author2	Chou, Chi Yuan
author_facet	Chou, Chi Yuan Wei, Wei Che 魏偉哲
author	Wei, Wei Che 魏偉哲
spellingShingle	Wei, Wei Che 魏偉哲 Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
author_sort	Wei, Wei Che
title	Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
title_short	Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
title_full	Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
title_fullStr	Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
title_full_unstemmed	Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
title_sort	maximize data processing throughput on cloud via exploiting data locality and minimizing data transfer delay
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/25286565634431082800
work_keys_str_mv	AT weiweiche maximizedataprocessingthroughputoncloudviaexploitingdatalocalityandminimizingdatatransferdelay AT wèiwěizhé maximizedataprocessingthroughputoncloudviaexploitingdatalocalityandminimizingdatatransferdelay AT weiweiche lìyòngwéichízīliàojúbùxìngjíjiǎnshǎozīliàochuánshūyánchítíshēngyúnpíngtáizhīzīliàochùlǐxiàonéng AT wèiwěizhé lìyòngwéichízīliàojúbùxìngjíjiǎnshǎozīliàochuánshūyánchítíshēngyúnpíngtáizhīzīliàochùlǐxiàonéng
_version_	1718519356300722176
