Evaluating and improving the performances of job execution and data transmission in data grids, YouTube, and Hadoop YARN

Ph.D., National Chiao Tung University, Institute of Computer Science and Engineering, academic year 103. Recently, several different types of distributed computation and storage systems, such as data grids, YouTube, and Hadoop YARN, have been widely employed around the world to, respectively, solve complex scientific computation and storage problems, enab...

Bibliographic Details
Main Authors: Lee, Ming-Chang, 李明昌
Other Authors: Chen, Ying-ping
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/99808945472329041277
id ndltd-TW-103NCTU5394076
record_format oai_dc
spelling ndltd-TW-103NCTU5394076 2016-07-02T04:29:08Z
http://ndltd.ncl.edu.tw/handle/99808945472329041277
Evaluating and improving the performances of job execution and data transmission in data grids, YouTube, and Hadoop YARN
評估與改善資料網格、YouTube與Hadoop YARN工作執行與資料傳輸效能之研究 (A study on evaluating and improving job execution and data transmission performance in data grids, YouTube, and Hadoop YARN)
Lee, Ming-Chang 李明昌
Ph.D., National Chiao Tung University, Institute of Computer Science and Engineering, academic year 103
Several types of distributed computation and storage systems, such as data grids, YouTube, and Hadoop YARN, have recently been deployed widely around the world to, respectively, solve complex scientific computation and storage problems, let people share videos, and process large-scale data and applications. In all of these systems, bandwidth consumption and job execution performance are two key issues. In data grids, several data replication algorithms have been proposed to shorten file transmission time, improve data access performance, and reduce bandwidth consumption. However, none of them considers data access patterns, i.e., users' access behaviors, so data grids suffer longer data transmission delays and higher bandwidth consumption. YouTube uses a distributed memory caching system named Memcached to cache videos and employs the least-recently-used (LRU) cache replacement algorithm to evict videos when Memcached runs out of space. However, LRU might increase network overhead and video retrieval time. Hadoop YARN, on the other hand, provides several scheduling policies and supports a queue hierarchy, but how these policies affect the different types of applications that run on Hadoop YARN is unknown. To address these problems, in this dissertation we first propose a Popular File Replicate First (PFRF) algorithm that takes user access behavior into account to improve job turnaround time, data availability, and bandwidth cost in data grids. Next, we propose two Pareto-based algorithms for YouTube that reduce video faults and shorten video retrieval time, thereby lowering YouTube's network overhead. Finally, we study how the scheduling-policy combinations (SPCs) supported by Hadoop YARN, under several different queue structures, affect the performance of various types of applications.
Chen, Ying-ping 陳穎平; Leu, Fang-Yie 呂芳懌
2015 thesis 81 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
author2 Chen, Ying-ping
author Lee, Ming-Chang
李明昌
title Evaluating and improving the performances of job execution and data transmission in data grids, YouTube, and Hadoop YARN
publishDate 2015
url http://ndltd.ncl.edu.tw/handle/99808945472329041277