Preserving the Chronological Versions of HDFS Files in Hadoop Clusters

Master's thesis === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 104 === Cloud platforms have been widely used in many areas in recent years, and Hadoop is the most widely used among them. Hadoop initially provided a simple, scalable, and efficient architecture for cloud computing and cloud storage. With the rapid advan...

Bibliographic Details
Main Authors: CHIEN,TING-YU, 簡霆毓
Other Authors: YEH,TSOZEN
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/06634888613504148822
id ndltd-TW-104FJU00396010
record_format oai_dc
spelling ndltd-TW-104FJU003960102017-11-12T04:38:35Z http://ndltd.ncl.edu.tw/handle/06634888613504148822 Preserving the Chronological Versions of HDFS Files in Hadoop Clusters 實現Hadoop叢集HDFS檔案之歷史版本保存 CHIEN,TING-YU 簡霆毓 Master's thesis, Fu Jen Catholic University, Master's Program, Department of Computer Science and Information Engineering, 104 YEH,TSOZEN 葉佐任 2016 thesis 48 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 104 === Cloud platforms have been widely used in many areas in recent years, and Hadoop is the most widely used among them. Hadoop initially provided a simple, scalable, and efficient architecture for cloud computing and cloud storage. With rapid advances in data analysis and growing demand for large-scale computing, the Hadoop ecosystem has gained increasingly rich developments and applications, and its performance has improved as well. The Hadoop Distributed File System (HDFS) is Hadoop's default file system. Its architecture has two main roles, NameNode and DataNode. A basic HDFS cluster is configured with one NameNode host and multiple DataNodes. The NameNode keeps the metadata that manages the namespace of all files and the location information of storage blocks, while the DataNodes store file blocks and their replicas. During the early development of Hadoop, some basic features of the UNIX file system were left out in order to meet the performance needs of storing large files. As a result, Hadoop did not allow users to modify the content of HDFS files until the append and truncate features appeared later. Meanwhile, Hadoop 2.2 officially added snapshot capabilities, which make a complete backup of an entire directory of files at a specified time. Frequent snapshots produce many similar backups. However, changes made to the content between two snapshots are not recorded. In the past, a more complete file version history could only be obtained by increasing the number of snapshots, yet often only a small number of the files in the snapshots really require frequent backups. Even then, changes between two consecutive snapshots are still not recorded, as described above. We developed a new mechanism that uses the existing HDFS append feature to save all versions of individual files, and we also provide users with an easy way to retrieve the stored versions. As a result, the contents of different versions of an individual file can be easily identified.
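The abstract does not describe how the thesis lays out versions on disk, so the following is only a minimal sketch of the general idea of append-based versioning, not the author's implementation. It assumes a local file opened in append mode as a stand-in for an HDFS file written through the append feature, and it assumes each version is stored as a full copy preceded by a small header. The names `HEADER`, `scan_versions`, `append_version`, and `read_version` are hypothetical and chosen for illustration.

```python
import os
import struct

# Hypothetical record header: timestamp (float64) and payload length (uint32).
HEADER = struct.Struct(">dI")


def scan_versions(log_path):
    """Return [(timestamp, payload_offset, length), ...] for each stored version."""
    versions = []
    if not os.path.exists(log_path):
        return versions
    with open(log_path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of log
            timestamp, length = HEADER.unpack(header)
            versions.append((timestamp, f.tell(), length))
            f.seek(length, os.SEEK_CUR)  # skip over the payload
    return versions


def append_version(log_path, content, timestamp):
    """Append one full copy of the content; return its version number (from 0)."""
    number = len(scan_versions(log_path))
    # Append mode only adds bytes at the end, so earlier versions stay untouched.
    # This is an assumed local analogue of an HDFS append, not an HDFS call.
    with open(log_path, "ab") as f:
        f.write(HEADER.pack(timestamp, len(content)))
        f.write(content)
    return number


def read_version(log_path, number):
    """Return the bytes of the requested version."""
    _timestamp, offset, length = scan_versions(log_path)[number]
    with open(log_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

In this sketch every change to a file becomes one appended record, so no version is lost between two snapshots, and a version is retrieved by its number or located by its timestamp through `scan_versions`. Storing full copies keeps the example short; a real system would likely need to weigh that against storage cost.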
author2 YEH,TSOZEN
author_facet YEH,TSOZEN
CHIEN,TING-YU
簡霆毓
author CHIEN,TING-YU
簡霆毓
spellingShingle CHIEN,TING-YU
簡霆毓
Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
author_sort CHIEN,TING-YU
title Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
title_sort preserving the chronological versions of hdfs files in hadoop clusters
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/06634888613504148822