Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
Master's === National Tsing Hua University === Department of Computer Science === 104 === In recent years, data volumes have grown rapidly. With this trend, big data has become a significant source of knowledge, and the need for large-scale storage and computing clusters has grown as well. Because not every user has the funds to maintain a large number of computers, more and...
Main Authors: | Wei, Wei Che, 魏偉哲 |
---|---|
Other Authors: | Chou, Chi Yuan |
Format: | Others |
Language: | en_US |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/25286565634431082800 |
id |
ndltd-TW-104NTHU5392033 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-104NTHU5392033 2017-08-27T04:29:59Z http://ndltd.ncl.edu.tw/handle/25286565634431082800 Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay 利用維持資料局部性及減少資料傳輸延遲提升雲平台之資料處理效能 Wei, Wei Che 魏偉哲 Master's National Tsing Hua University Department of Computer Science 104 In recent years, data volumes have grown rapidly. With this trend, big data has become a significant source of knowledge, and the need for large-scale storage and computing clusters has grown as well. Because not every user has the funds to maintain a large number of computers, more and more companies, such as Amazon and Microsoft, have begun to build cloud platforms with many services and to offer them to users on demand. However, these cloud providers usually keep the different kinds of services separate so that each can be priced individually. For example, they provide a storage service, a virtual machine service, and a simple cluster service, all independent of one another. In a typical use case, users need to store their data in a highly reliable and scalable storage system and build a computing cluster on top of it to analyze the data. Using the storage service and the computing cluster service together is inconvenient in this situation. Therefore, we develop a service that integrates these two kinds of services and propose a data pipeline scheduling service for this scenario that handles multiple jobs on Amazon Web Services. Besides providing a simple way to use Elastic MapReduce, the computing cluster service provided by Amazon, this service also improves performance over the basic use case proposed by Amazon. Chou, Chi Yuan 周志遠 2016 thesis 54 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Tsing Hua University === Department of Computer Science === 104 === In recent years, data volumes have grown rapidly. With this trend, big data
has become a significant source of knowledge, and the need for large-scale storage and
computing clusters has grown as well. Because not every user has the funds to maintain
a large number of computers, more and more companies, such as Amazon and Microsoft,
have begun to build cloud platforms with many services and to offer them to users on
demand. However, these cloud providers usually keep the different kinds of services
separate so that each can be priced individually. For example, they provide a storage
service, a virtual machine service, and a simple cluster service, all independent of
one another. In a typical use case, users need to store their data in a highly
reliable and scalable storage system and build a computing cluster on top of it to
analyze the data. Using the storage service and the computing cluster service
together is inconvenient in this situation. Therefore, we develop a service that
integrates these two kinds of services and propose a data pipeline scheduling service
for this scenario that handles multiple jobs on Amazon Web Services. Besides providing
a simple way to use Elastic MapReduce, the computing cluster service provided by
Amazon, this service also improves performance over the basic use case proposed by
Amazon.
|
author2 |
Chou, Chi Yuan |
author_facet |
Chou, Chi Yuan Wei, Wei Che 魏偉哲 |
author |
Wei, Wei Che 魏偉哲 |
author_sort |
Wei, Wei Che |
title |
Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay |
publishDate |
2016 |
url |
http://ndltd.ncl.edu.tw/handle/25286565634431082800 |
_version_ |
1718519356300722176 |