Efficiency Improvement and Analysis of Accessing Stacked Memories on Many-Core Platforms

碩士 === 國立清華大學 === 資訊工程學系 === 102 === Because of DRAM is its structural simplicity, high density per unit area and more inexpensive, it’s very suited to be a role of main-memory in computer architecture. However, from a historical point of view, since the DRAM was flourished, the rate of improvement...

Full description

Bibliographic Details
Main Authors: Chen, Huan-Wen, 陳煥文
Other Authors: Huang, Chih-Tsun
Format: Others
Language:en_US
Published: 2013
Online Access:http://ndltd.ncl.edu.tw/handle/62576801445937981882
Description
Summary:碩士 === 國立清華大學 === 資訊工程學系 === 102 === Because of DRAM is its structural simplicity, high density per unit area and more inexpensive, it’s very suited to be a role of main-memory in computer architecture. However, from a historical point of view, since the DRAM was flourished, the rate of improvement in processor speed exceeds the rate of improvement in DRAM memory speed, that W. Wulf and S. McKee called the phenomenon “memory wall”. Nevertheless, over the past few decades the amount of on-chip cores comes from one to several, and the up-coming NoC-based (most is mesh) many-core architecture no longer blindly upgrades processor’s performance, but takes advantage of parallelism to achieve the throughput requirement with superior cost-effectiveness. Unfortunately, the demand for memory bandwidth or throughput is still increased. Therefore, many engineer try to do their best to enhance the efficiency between memory controller and DRAM devices by proposing better memory scheduling policy, increasing bandwidth and improving the access speed, etc. Recently, the emergence of 3D-stacked DRAM (wide I/O) slightly reduces the speed gap between processor and memory system. But the architecture which used Network-on-Chip as a bridge to connect processors and memory controllers has a characteristic that some DRAM requests from processors may go through very far distance to access memory controller. Based on the above motivation, in this thesis we present an architecture which improves efficiency of accessing stacked memories on many-core platforms. This architecture uses an extra switch network to transport the packets which come from processor to DRAM sub-system and groups few numbers of processor to specify DRAM-channel. By this method, we can alleviate the traffic contention between DRAM-requests and inter-processor communication. We use traditional method as a contrast, that all of DRAM-requests are routed by NoC. Experimental results of SPLASH2 applications demonstrate significant speed up that ranges from 1.13 times to 2.57 times, with cost-affordable crossbar switch network which also applies to the Wide I/O DRAM interface.