
The ADvanced Systems Laboratory (ADSL)
Publication abstract

Data-Driven Models in Storage System Design

Florentina I. Popovici
Department of Computer Sciences, University of Wisconsin-Madison

Abstract:

Systems with high data demands depend on the efficiency of the storage system to reach the high performance that is expected from them. Unfortunately, because of the way these systems evolve, this is not easy to achieve. Storage systems are expected to act as a monolithic unit, but they are actually constructed as a stack of layers that communicate through narrow interfaces. Because the information that flows between the layers is limited, it is difficult to implement many desirable optimizations.

We propose to use data-driven models to alleviate this lack of information. These models are empirical models that observe the inputs and outputs of the system being modeled, and then predict its behavior based on those previous observations.
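As a rough illustration of the idea, the sketch below records observed input/output pairs and predicts the output for a new input as the average of past outputs seen for similar inputs. It is a minimal sketch under stated assumptions, not the models built in the dissertation: the class name `TableModel`, the choice of inter-request distance as the input, service time as the output, and the fixed-size bucketing are all hypothetical.

```python
from collections import defaultdict


class TableModel:
    """Hypothetical empirical model: learns from observed (input, output) pairs.

    Assumption for illustration: the input is the distance between two
    consecutive requests and the output is the observed service time.
    """

    def __init__(self, bucket_size=1024, default=None):
        self.bucket_size = bucket_size   # inputs in the same bucket share a prediction
        self.default = default           # returned when a bucket has no observations
        self._sum = defaultdict(float)   # running sum of outputs per bucket
        self._count = defaultdict(int)   # number of observations per bucket

    def _key(self, distance):
        # Group nearby inputs together so the table stays small.
        return distance // self.bucket_size

    def observe(self, distance, service_time):
        """Record one observed input/output pair."""
        k = self._key(distance)
        self._sum[k] += service_time
        self._count[k] += 1

    def predict(self, distance):
        """Predict the output as the mean of previous observations in the bucket."""
        k = self._key(distance)
        n = self._count.get(k, 0)
        if n == 0:
            return self.default
        return self._sum[k] / n
```

A caller would invoke `observe` after each completed request and `predict` before issuing the next one; the model needs no knowledge of the device internals, only of what it has seen.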

We particularly focus on data-driven models for disks, as good disk usage can improve the performance of a system by orders of magnitude. It is difficult to model disks because of their intrinsic complexity. The demands of deploying data-driven models on-line, in a running system, add to the challenge of modeling storage devices.

The data-driven models we develop are tailored to the specific applications that use them. This allows us to build simplified models and to integrate them more seamlessly in an existing system. For example, we built such models to aid in decisions made by a throughput-optimizing I/O scheduler at the operating system level or to help lay out a write-ahead log on disk such that synchronous write requests do not incur unnecessary and expensive rotational latency overhead. We explore how to build models for different devices by building a partial data-driven model of a RAID-5 storage system, and use it to perform stripe-aligned writes.
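To make the notion of a stripe-aligned write concrete, the sketch below splits a write so that no piece crosses a full-stripe boundary. It is an illustrative sketch only: the function names are hypothetical, and the stripe size is passed in as a known parameter, however it was obtained. The one fact relied on is that a RAID-5 array of `n_disks` disks stores one parity chunk per stripe, so a full stripe holds `n_disks - 1` data chunks.

```python
def full_stripe_bytes(chunk_bytes, n_disks):
    """Data bytes in one full RAID-5 stripe (one chunk per stripe holds parity)."""
    return chunk_bytes * (n_disks - 1)


def split_on_stripes(offset, length, stripe_bytes):
    """Split a write into (offset, length) pieces that never cross a stripe boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        # Next stripe boundary strictly after the current offset.
        boundary = (offset // stripe_bytes + 1) * stripe_bytes
        piece_end = min(boundary, end)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces
```

Pieces whose length equals `stripe_bytes` cover a whole stripe, which lets the array compute parity from the new data alone instead of first reading old data and parity.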

In this dissertation we build data-driven models and use them in scheduling and layout applications at the operating system and application level. Additionally, we leverage experience from modeling disk drives to model more complex storage systems such as RAIDs. This allows us to validate the generality of our approach. Through experiments we show that data-driven models can bring significant performance improvements to the systems where they are deployed.
