Memory-subsystem resource management for the many-core era

As semiconductor technology continues to scale lower in the nanometer era, the communication between processor and main memory has been particularly challenged. The well-studied frequency, memory and power ``walls'' have redirect architects towards utilizing Chip Multiprocessors (CMP) as a...

Full description

Bibliographic Details
Main Author:	Kaseridis, Dimitrios
Format:	Others
Language:	English
Published:	2012
Subjects:	Chip-multiprocessors Many-core Cache Memory Resource-management Processor Computer architecture Memory controllers
Online Access:	http://hdl.handle.net/2152/ETD-UT-2011-05-2758

id	ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2011-05-2758
record_format	oai_dc
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Chip-multiprocessors Many-core Cache Memory Resource-management Processor Computer architecture Memory controllers
spellingShingle	Chip-multiprocessors Many-core Cache Memory Resource-management Processor Computer architecture Memory controllers Kaseridis, Dimitrios Memory-subsystem resource management for the many-core era
description	As semiconductor technology continues to scale lower in the nanometer era, the communication between processor and main memory has been particularly challenged. The well-studied frequency, memory and power ``walls'' have redirect architects towards utilizing Chip Multiprocessors (CMP) as an attractive architecture for leveraging technology scaling. In order to achieve high efficiency and throughput, CMPs rely heavily on sharing resources among multiple cores, especially in the case of the memory hierarchy. Unfortunately, such sharing introduces resource contention and interference between the multiple executing threads. The ever-increasing access latency difference between processor and memory, the gradually increasing memory bandwidth demands to main memory, and the decreasing cache capacity size available to each core due to multiple core integration, has made the need for an efficient memory subsystem resource management more critical than ever before. This dissertation focuses on managing the sharing of the Last-level Cache (LLC) capacity and the main memory bandwidth, as the two most important resources that significantly affect system performance and energy consumption. The presented schemes include efficient solutions to all of the three basic requirements for implementing a resource management schemes, that is: a) profiling mechanisms to capture applications' resource requirements, b) microarchitecture mechanisms to enforce a resource allocation scheme, and c) resource allocations algorithms/policies to manage the available memory resources throughput the whole memory hierarchy of a CMP system. To achieve these targets the dissertation first describes a set of low overhead, non-invasive profiling mechanisms that are able to project applications’ memory resource requirements and memory sharing behavior. Two memory resource partitioning schemes are presented. The first one, the Bank-aware dynamic partitioning scheme provides a low overhead solution for partitioning cache resources of large CMP architectures that are based on a Dynamic Non-Uniform Cache Architecture (DNUCA) last-level cache design, consistent with the current industry trends. In addition, the second scheme, the Bandwidth-aware dynamic scheme presents a system-wide optimization of memory-subsystem resource allocation and job scheduling for large, multi-chip CMP systems. The scheme is seeking for optimizations both within and outside single CMP chips, aiming at overall system throughput and efficiency improvements. As cache partitioning schemes with isolated partitions impose a set of restrictions in the use of the last-level cache, which can severely affect the performance of large CMP designs, this dissertation presents a Quasi-partitioning scheme that breaks such restrictions while providing most of the benefits of cache partitioning schemes. The presented solution is able to efficiently scale to a significant larger number of cores than what previously described schemes that are based on isolated partition can achieve. Finally, as the memory controller is one of the fundamental components of the memory-subsystem, a well-designed memory-subsystem resource management needs to carefully utilize the memory controller resources and coordinate its functionality with the operation of the main memory and the last-level cache. To improve execution fairness and system throughput, this dissertation presents a criticality-based, memory controller requests priority scheme. The scheme ranks demand read and prefetch operations based on their latency sensitivity, while it coordinates its operation with the DRAM page-mode policy and the memory data prefetcher. === text
author	Kaseridis, Dimitrios
author_facet	Kaseridis, Dimitrios
author_sort	Kaseridis, Dimitrios
title	Memory-subsystem resource management for the many-core era
title_short	Memory-subsystem resource management for the many-core era
title_full	Memory-subsystem resource management for the many-core era
title_fullStr	Memory-subsystem resource management for the many-core era
title_full_unstemmed	Memory-subsystem resource management for the many-core era
title_sort	memory-subsystem resource management for the many-core era
publishDate	2012
url	http://hdl.handle.net/2152/ETD-UT-2011-05-2758
work_keys_str_mv	AT kaseridisdimitrios memorysubsystemresourcemanagementforthemanycoreera
_version_	1716822299609399296
spelling	ndltd-UTEXAS-oai-repositories.lib.utexas.edu-2152-ETD-UT-2011-05-27582015-09-20T17:07:12ZMemory-subsystem resource management for the many-core eraKaseridis, DimitriosChip-multiprocessorsMany-coreCacheMemoryResource-managementProcessorComputer architectureMemory controllersAs semiconductor technology continues to scale lower in the nanometer era, the communication between processor and main memory has been particularly challenged. The well-studied frequency, memory and power ``walls'' have redirect architects towards utilizing Chip Multiprocessors (CMP) as an attractive architecture for leveraging technology scaling. In order to achieve high efficiency and throughput, CMPs rely heavily on sharing resources among multiple cores, especially in the case of the memory hierarchy. Unfortunately, such sharing introduces resource contention and interference between the multiple executing threads. The ever-increasing access latency difference between processor and memory, the gradually increasing memory bandwidth demands to main memory, and the decreasing cache capacity size available to each core due to multiple core integration, has made the need for an efficient memory subsystem resource management more critical than ever before. This dissertation focuses on managing the sharing of the Last-level Cache (LLC) capacity and the main memory bandwidth, as the two most important resources that significantly affect system performance and energy consumption. The presented schemes include efficient solutions to all of the three basic requirements for implementing a resource management schemes, that is: a) profiling mechanisms to capture applications' resource requirements, b) microarchitecture mechanisms to enforce a resource allocation scheme, and c) resource allocations algorithms/policies to manage the available memory resources throughput the whole memory hierarchy of a CMP system. To achieve these targets the dissertation first describes a set of low overhead, non-invasive profiling mechanisms that are able to project applications’ memory resource requirements and memory sharing behavior. Two memory resource partitioning schemes are presented. The first one, the Bank-aware dynamic partitioning scheme provides a low overhead solution for partitioning cache resources of large CMP architectures that are based on a Dynamic Non-Uniform Cache Architecture (DNUCA) last-level cache design, consistent with the current industry trends. In addition, the second scheme, the Bandwidth-aware dynamic scheme presents a system-wide optimization of memory-subsystem resource allocation and job scheduling for large, multi-chip CMP systems. The scheme is seeking for optimizations both within and outside single CMP chips, aiming at overall system throughput and efficiency improvements. As cache partitioning schemes with isolated partitions impose a set of restrictions in the use of the last-level cache, which can severely affect the performance of large CMP designs, this dissertation presents a Quasi-partitioning scheme that breaks such restrictions while providing most of the benefits of cache partitioning schemes. The presented solution is able to efficiently scale to a significant larger number of cores than what previously described schemes that are based on isolated partition can achieve. Finally, as the memory controller is one of the fundamental components of the memory-subsystem, a well-designed memory-subsystem resource management needs to carefully utilize the memory controller resources and coordinate its functionality with the operation of the main memory and the last-level cache. To improve execution fairness and system throughput, this dissertation presents a criticality-based, memory controller requests priority scheme. The scheme ranks demand read and prefetch operations based on their latency sensitivity, while it coordinates its operation with the DRAM page-mode policy and the memory data prefetcher.text2012-07-11T13:54:17Z2012-07-11T13:54:17Z2011-052012-07-11May 20112012-07-11T13:54:30Zthesisapplication/pdfhttp://hdl.handle.net/2152/ETD-UT-2011-05-27582152/ETD-UT-2011-05-2758eng

Memory-subsystem resource management for the many-core era

Similar Items