Summary: | Master's thesis === Tamkang University === Master's Program, Department of Computer Science and Information Engineering === 103 === With the rapid development of Internet technologies, we are now in the era of Big Data. To meet the need to handle vast amounts of data, many global enterprises are adopting the cloud computing model. The massive computing and storage capabilities of cloud computing come from huge clusters of servers in data centers, and many data centers use Hadoop MapReduce to process their data. Previous studies have pointed out that sending intermediate data to the Reducers during the Shuffle phase of MapReduce can cause network congestion in the data center, thus degrading overall computation performance. To address this issue, some researchers proposed introducing Software-Defined Networking (SDN) technology into a Hadoop cluster. Specifically, with knowledge of the scheduling of MapReduce jobs, SDN can be used to adjust network resources dynamically to prevent network congestion during the Shuffle phase. As a result, MapReduce jobs can be completed faster.
In this research, we therefore build a small-scale experimental Hadoop cluster with two Open vSwitches and one Floodlight Controller. By matching shuffle traffic to flow entries with a higher transmission rate, the packets carrying intermediate data can be sent as fast as possible even when the network is congested, reducing the Hadoop MapReduce execution time. To validate the idea, we designed four experiments and compared their results. Finally, to ease the administration of creating and deleting flow entries for Hadoop applications, we designed an SDN app for Hadoop MapReduce, implemented with Node.js and the REST APIs provided by the Floodlight Controller.
|