Analysis of Hadoop Distributed Environment for Data Storage and Data Computing

Bibliographic Details
Main Author: 王耀駿
Other Authors: 張文鐘
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/22814669427908852751
Description
Summary: Master's === National Chiao Tung University === Institute of Communications Engineering === 100 === The primary subject of this thesis is the architecture of HDFS (Hadoop Distributed File System) and the Hadoop MapReduce software framework for distributed computing. A distributed file system is a file system that allows many computers to share files and storage space over a network, and distributed computing is a way to solve large computational problems in parallel by pooling distributed computing resources. HDFS and the MapReduce framework run on the same computer cluster: HDFS provides the file system service and stores large data sets on the disks of the cluster, while MapReduce applications process the data sets stored in HDFS in parallel across the cluster. Both HDFS and the MapReduce framework follow a master/slave architecture, in which a cluster consists of a single master server and many slave servers. The master server is responsible for managing and coordinating the storage and computing resources provided by the slave servers in order to serve client requests. All servers are fully connected and communicate with one another using TCP-based protocols and a streaming mechanism. The mechanisms of HDFS and the MapReduce framework are verified by studying the relevant open source code, and a computer cluster is set up to further clarify their operation.
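
As an illustration of how a MapReduce application processes data stored in HDFS in parallel, the following sketch follows the canonical word-count example from the Hadoop MapReduce tutorial, written against the org.apache.hadoop.mapreduce API. It is not taken from the thesis: the class names, the job name, and the HDFS input/output paths supplied on the command line are illustrative, and the Job.getInstance call assumes a reasonably recent Hadoop release.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: each mapper reads one split of the HDFS input
      // and emits a (word, 1) pair for every token it finds.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reduce phase: all counts for the same word arrive at one
      // reducer, which sums them and writes the total back to HDFS.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] and args[1] are paths in HDFS, e.g. /user/hadoop/input.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

A job of this kind is typically packaged as a jar and submitted from a cluster node with a command such as hadoop jar wordcount.jar WordCount /input /output; the master then schedules map and reduce tasks on the slave servers that hold the corresponding HDFS blocks, which is the parallel-processing behaviour the abstract describes.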