Design and Analysis of Memory Interface Architecture for Many-Core Platforms


Bibliographic Details
Main Authors: Yeh, Kuo Kai, 葉國楷
Other Authors: Huang, Chih Tsun
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/09523182051823206473
Description
Summary: Master's === National Tsing Hua University === Department of Computer Science === 104 === Over the past decades, system-on-chip technology has allowed designers to integrate more functions on a single chip. Moore's Law observes that transistor counts double approximately every two years, so design complexity has risen sharply as well, and raising the abstraction level of modeling and simulation has become an urgent need. Meanwhile, single-processor development has run into limits on clock frequency and energy efficiency, so many-core architectures have emerged to replace the traditional centralized single-core design. Multi-core processors offer high-performance computing and low power consumption, and they suit multi-threaded applications. However, the demand for memory bandwidth keeps increasing. In 1994, Wulf and McKee argued that improvements in computer performance would eventually stall. The evidence is that from 1986 to 2000, CPU speed improved at an annual rate of 55% while memory speed improved at only 10%. In other words, memory speed becomes the bottleneck in computer performance. Therefore, many engineers work on improving the efficiency of the path between the memory controller and DRAM. In addition, in many-core architectures that connect cores through a mesh or torus, the distance from a core to DRAM can be very long. Motivated by these observations, we present an architecture with more efficient memory access, together with a mechanism that reduces the routing time of memory accesses on the network-on-chip (NoC). The mechanism clusters processors and assigns an exclusive memory channel to each cluster. The architecture uses a multi-port Crossbar Switch to reschedule DRAM requests from the memory channels to DRAM. We call this architecture, in which memory requests are routed by the Crossbar Switch, the CS-based approach, in contrast to the Original approach, in which memory requests are routed by the NoC. To implement the architecture, we adopt SCE-MI to bridge the ESL many-core platform with the RTL memory subsystem. Experiments with SPLASH2 applications demonstrate speedups ranging from 1.18 to 1.74 times, and the additional Crossbar Switch costs about 7k gates.
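
The clustering idea in the summary can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the 4x4 mesh, the 2x2 cluster shape, the placement of each cluster's memory channel port at the cluster's corner core, the single memory controller at (0, 0) for the Original approach, and all function names. The sketch only compares average hop counts under XY routing; it does not model the Crossbar Switch scheduling or DRAM timing.

```python
from itertools import product

MESH_DIM = 4      # assumed: 4x4 mesh of cores
CLUSTER_DIM = 2   # assumed: each cluster is a 2x2 block of cores


def cluster_of(x, y, mesh_dim=MESH_DIM, cluster_dim=CLUSTER_DIM):
    """Map core (x, y) to a cluster index; one memory channel per cluster."""
    clusters_per_row = mesh_dim // cluster_dim
    return (y // cluster_dim) * clusters_per_row + (x // cluster_dim)


def channel_port(x, y, cluster_dim=CLUSTER_DIM):
    """Hypothetical port location: the corner core of the core's own cluster."""
    return (x // cluster_dim * cluster_dim, y // cluster_dim * cluster_dim)


def hops(src, dst):
    """Manhattan distance, i.e. the hop count under XY routing on a mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def average_hops(target_for):
    """Average hops from every core to the node chosen by target_for(x, y)."""
    cores = list(product(range(MESH_DIM), repeat=2))
    return sum(hops(c, target_for(*c)) for c in cores) / len(cores)


# Original approach (assumed): all requests travel over the NoC to (0, 0).
original = average_hops(lambda x, y: (0, 0))
# Clustered approach (assumed): requests only reach the cluster's own port.
clustered = average_hops(channel_port)

print(original, clustered)
```

Under these assumptions the average distance to memory falls from 3 hops to 1 hop. The figures describe this toy layout only and are unrelated to the speedups reported in the summary.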