Preserving the Chronological Versions of HDFS Files in Hadoop Clusters

Master's thesis === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 104 === In recent years, cloud platforms have been widely used in many areas, and Hadoop is the most widely used among them. Hadoop initially provided a simple, scalable, and efficient architecture for cloud computing and cloud storage. With the rapid advan...

Bibliographic Details
Main Authors:	CHIEN,TING-YU, 簡霆毓
Other Authors:	YEH,TSOZEN
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/06634888613504148822

id	ndltd-TW-104FJU00396010
record_format	oai_dc
spelling	ndltd-TW-104FJU00396010 2017-11-12T04:38:35Z http://ndltd.ncl.edu.tw/handle/06634888613504148822 Preserving the Chronological Versions of HDFS Files in Hadoop Clusters 實現Hadoop叢集HDFS檔案之歷史版本保存 CHIEN,TING-YU 簡霆毓 Master's thesis, Fu Jen Catholic University, Master's Program, Department of Computer Science and Information Engineering, 104 In recent years, cloud platforms have been widely used in many areas, and Hadoop is the most widely used among them. Hadoop initially provided a simple, scalable, and efficient architecture for cloud computing and cloud storage. With rapid advances in data analysis and growing demand for large-scale computing, the Hadoop ecosystem has seen increasingly rich development and applications, and its performance has also improved. The Hadoop Distributed File System (HDFS) is Hadoop's default file system. Its architecture mainly comprises two roles, NameNode and DataNode. A basic HDFS cluster is configured with one NameNode host and multiple DataNodes. The NameNode keeps the metadata used to manage the namespace of all files and the location information of storage blocks, while the DataNodes are responsible for storing file blocks and their replicas. During the early development of Hadoop, some basic features of the UNIX file system were left out in order to meet the performance requirements of storing large files. As a result, Hadoop did not allow users to modify the content of HDFS files until the append and truncate features appeared later. Meanwhile, Hadoop 2.2 officially added snapshot capabilities, which can make a complete backup of an entire directory of files at a specified time. Frequent snapshots produce many similar backups. However, changes made to file content between two snapshots are not recorded. In the past, a more complete file version history could be obtained only by increasing the number of snapshots. Yet often only a small number of the files covered by a snapshot really require frequent backups. Even so, changes between two consecutive snapshots are still not recorded, as described above. We developed a new mechanism that uses the existing HDFS append feature to save all versions of individual files, and we also provide users with an easy way to retrieve the different file versions that are kept. As a result, the contents of different versions of individual files can be easily identified. YEH,TSOZEN 葉佐任 2016 Degree thesis ; thesis 48 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 104 === In recent years, cloud platforms have been widely used in many areas, and Hadoop is the most widely used among them. Hadoop initially provided a simple, scalable, and efficient architecture for cloud computing and cloud storage. With rapid advances in data analysis and growing demand for large-scale computing, the Hadoop ecosystem has seen increasingly rich development and applications, and its performance has also improved. The Hadoop Distributed File System (HDFS) is Hadoop's default file system. Its architecture mainly comprises two roles, NameNode and DataNode. A basic HDFS cluster is configured with one NameNode host and multiple DataNodes. The NameNode keeps the metadata used to manage the namespace of all files and the location information of storage blocks, while the DataNodes are responsible for storing file blocks and their replicas. During the early development of Hadoop, some basic features of the UNIX file system were left out in order to meet the performance requirements of storing large files. As a result, Hadoop did not allow users to modify the content of HDFS files until the append and truncate features appeared later. Meanwhile, Hadoop 2.2 officially added snapshot capabilities, which can make a complete backup of an entire directory of files at a specified time. Frequent snapshots produce many similar backups. However, changes made to file content between two snapshots are not recorded. In the past, a more complete file version history could be obtained only by increasing the number of snapshots. Yet often only a small number of the files covered by a snapshot really require frequent backups. Even so, changes between two consecutive snapshots are still not recorded, as described above. We developed a new mechanism that uses the existing HDFS append feature to save all versions of individual files, and we also provide users with an easy way to retrieve the different file versions that are kept. As a result, the contents of different versions of individual files can be easily identified.
author2	YEH,TSOZEN
author_facet	YEH,TSOZEN CHIEN,TING-YU 簡霆毓
author	CHIEN,TING-YU 簡霆毓
spellingShingle	CHIEN,TING-YU 簡霆毓 Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
author_sort	CHIEN,TING-YU
title	Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
title_short	Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
title_full	Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
title_fullStr	Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
title_full_unstemmed	Preserving the Chronological Versions of HDFS Files in Hadoop Clusters
title_sort	preserving the chronological versions of hdfs files in hadoop clusters
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/06634888613504148822
work_keys_str_mv	AT chientingyu preservingthechronologicalversionsofhdfsfilesinhadoopclusters AT jiǎntíngyù preservingthechronologicalversionsofhdfsfilesinhadoopclusters AT chientingyu shíxiànhadoopcóngjíhdfsdàngànzhīlìshǐbǎnběnbǎocún AT jiǎntíngyù shíxiànhadoopcóngjíhdfsdàngànzhīlìshǐbǎnběnbǎocún
_version_	1718561139316490240
