A study of data partitioning and prefetching for hybrid storage systems

Bibliographic Details
Main Author: Sultana, Maliha
Language: English
Published: University of British Columbia 2011
Online Access: http://hdl.handle.net/2429/37741
Description
Summary: Storage system performance has received much attention since the early days of computing systems because of the mismatch between computing power and I/O access times. New technologies have increased storage system performance, but because of the cost-performance trade-off, no single type of storage medium can meet both performance and capacity requirements. This motivated us to study the impact of data management techniques such as data partitioning and correlated prefetching on I/O performance when two different non-volatile storage media are integrated into a computing system. First, we consider partitioning data blocks between two devices, where one device is significantly faster than the other. We assume that significantly faster performance also implies a significantly smaller capacity, so not all data can be stored or cached in the faster device. Second, to improve the performance of the slower device, we investigate whether correlation-directed prefetching (CDP) offers significant benefits. Although CDP has been studied previously, we look into some special aspects of it. We analyze how different block correlation analysis heuristics affect the performance of CDP. We developed a simulator to study the effect of the different techniques when using devices with differing characteristics. Our results show that data partitioning can significantly improve storage system performance. For a system based on a hard disk and a solid-state drive, we achieved an improvement of 2% to 92% across different traces. We also show that data partitioning based on applications' long-range block access patterns performs significantly better than caching based on temporal locality of reference. To evaluate the hybrid system in real-world settings, we present a case study: a prototype data block manager for Linux-based systems that permits data to be partitioned across an SSD and an HDD. This partitioning is transparent to the file system, and the block manager can also trigger data prefetches when there is high correlation between data block accesses.