Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay

Master's thesis === National Tsing Hua University === Department of Computer Science === academic year 104 === In recent years, data has been growing at a rapid pace. With this trend, Big Data has become a significant field of knowledge, and the demand for large-scale storage and computing clusters has grown as well. Because not every user can afford a large number of computers, more and...


Bibliographic Details
Main Authors: Wei, Wei Che, 魏偉哲
Other Authors: Chou, Chi Yuan
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/25286565634431082800
id ndltd-TW-104NTHU5392033
record_format oai_dc
spelling ndltd-TW-104NTHU5392033 2017-08-27T04:29:59Z http://ndltd.ncl.edu.tw/handle/25286565634431082800 Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay 利用維持資料局部性及減少資料傳輸延遲提升雲平台之資料處理效能 (Improving the Data Processing Performance of Cloud Platforms by Maintaining Data Locality and Reducing Data Transfer Delay) Wei, Wei Che 魏偉哲 Master's National Tsing Hua University Department of Computer Science 104 In recent years, data has been growing at a rapid pace. With this trend, Big Data has become a significant field of knowledge, and the demand for large-scale storage and computing clusters has grown as well. Because not every user can afford a large number of computers, more and more companies, such as Amazon and Microsoft, have begun to build cloud platforms that offer many services to users on demand. However, these cloud providers usually offer different kinds of services separately so that each can be priced individually. For example, they provide a storage service, a virtual machine service, and a simple cluster service, each independent of the others. In a typical use case, users need to store their data in a highly reliable and scalable storage system and build a computing cluster on top of it to analyze that data. In such a situation, using the separate storage and computing cluster services together is inconvenient. Therefore, we develop a service that integrates these two kinds of services and propose a data pipeline scheduling service that handles multiple jobs in this scenario on Amazon Web Services. Besides providing a simple way to use Elastic MapReduce, Amazon's computing cluster service, our service also improves performance over the basic use case proposed by Amazon. Chou, Chi Yuan 周志遠 2016 degree thesis ; thesis 54 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Tsing Hua University === Department of Computer Science === academic year 104 === In recent years, data has been growing at a rapid pace. With this trend, Big Data has become a significant field of knowledge, and the demand for large-scale storage and computing clusters has grown as well. Because not every user can afford a large number of computers, more and more companies, such as Amazon and Microsoft, have begun to build cloud platforms that offer many services to users on demand. However, these cloud providers usually offer different kinds of services separately so that each can be priced individually. For example, they provide a storage service, a virtual machine service, and a simple cluster service, each independent of the others. In a typical use case, users need to store their data in a highly reliable and scalable storage system and build a computing cluster on top of it to analyze that data. In such a situation, using the separate storage and computing cluster services together is inconvenient. Therefore, we develop a service that integrates these two kinds of services and propose a data pipeline scheduling service that handles multiple jobs in this scenario on Amazon Web Services. Besides providing a simple way to use Elastic MapReduce, Amazon's computing cluster service, our service also improves performance over the basic use case proposed by Amazon.
author2 Chou, Chi Yuan
author Wei, Wei Che
魏偉哲
title Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/25286565634431082800