
The ADvanced Systems Laboratory (ADSL)
Publication abstract

Representative, Reproducible, and Practical Benchmarking of File and Storage Systems

Nitin Agrawal
Department of Computer Sciences,
University of Wisconsin-Madison


Benchmarks are crucial to assessing the performance of file and storage systems; by providing a common measuring stick among differing systems, they allow comparisons to be made and new techniques to be judged better or worse than existing ones. Unfortunately, file and storage systems are currently difficult to benchmark: there is little consensus regarding the workloads that matter and insufficient infrastructure to make it easy to run interesting workloads. This dissertation attempts to simplify the task of file and storage system benchmarking by focusing on three important principles: developing an understanding of, and creating solutions for, representative, reproducible, and practical benchmark state and benchmark workloads.

We develop an understanding of file-system metadata by performing a large-scale longitudinal study of file-system snapshots representative of corporate PC systems. For five years, from 2000 to 2004, we collected annual snapshots of file-system metadata from over 60,000 Windows PC file systems in a large corporation. We use these snapshots to study temporal changes in file size, file age, file-type frequency, directory size, namespace structure, file-system population, storage capacity and consumption, and degree of file modification. We present a generative model that explains the namespace structure and the distribution of directory sizes. We find significant temporal trends relating to the popularity of certain file types, the origin of file content, the way the namespace is used, and the degree of variation among file systems, as well as more pedestrian changes in sizes and capacities.
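The abstract does not spell out the generative model's mechanics. Purely as an illustration of how such a model might work, the following toy sketch grows a directory tree by preferential attachment: each new directory chooses its parent with probability proportional to the parent's current subdirectory count plus a constant. The function name, the constant `c`, and the attachment rule are assumptions for this sketch, not the dissertation's actual model.

```python
import random

def generate_namespace(num_dirs, c=2, seed=0):
    """Toy preferential-attachment growth of a directory tree:
    each new directory picks a parent with probability proportional
    to (parent's current subdirectory count + c)."""
    rng = random.Random(seed)
    children = {0: []}  # directory id -> list of child directory ids
    for new in range(1, num_dirs):
        dirs = list(children)
        weights = [len(children[d]) + c for d in dirs]
        parent = rng.choices(dirs, weights)[0]
        children[parent].append(new)
        children[new] = []
    return children

tree = generate_namespace(1000)
sizes = [len(kids) for kids in tree.values()]  # directory-size sample
```

Models of this flavor produce the heavy-tailed directory-size distributions that uniform-random attachment cannot, which is why attachment-based models are a natural candidate for explaining observed namespace structure.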

We develop means to recreate representative and reproducible file-system state for benchmarking. The performance of file systems and related software depends on characteristics of the underlying file-system image (i.e., file-system metadata and file contents). Unfortunately, rather than benchmarking with realistic file-system images, most system designers and evaluators rely on ad hoc assumptions and (often inaccurate) rules of thumb. To remedy these problems, we develop Impressions, a framework to generate statistically accurate file-system images with realistic metadata and content; we present its design, implementation, and evaluation. Impressions is flexible, supporting user-specified constraints on various file-system parameters and using a number of statistical techniques to generate consistent images. We find that Impressions not only accurately quantifies benchmark performance, but also helps uncover application policies and potential bugs, making it useful for system developers and users alike.
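Impressions' actual statistical machinery is far more elaborate than the abstract can convey. As a minimal sketch of the core idea, the code below populates a directory with files whose sizes are drawn from a lognormal distribution and whose extensions follow a frequency table; the distribution parameters, the extension list, and the weights are all illustrative assumptions, not values from the dissertation.

```python
import os
import random
import tempfile

def make_image(root, num_files, seed=0):
    """Toy stand-in for an Impressions-style image generator:
    file sizes drawn from a lognormal distribution, extensions
    drawn from an assumed frequency table."""
    rng = random.Random(seed)
    exts = ["txt", "dll", "h", "jpg", "exe"]     # illustrative only
    weights = [30, 25, 20, 15, 10]               # illustrative only
    for i in range(num_files):
        size = int(rng.lognormvariate(9.0, 2.0))  # assumed parameters
        ext = rng.choices(exts, weights)[0]
        path = os.path.join(root, f"f{i}.{ext}")
        with open(path, "wb") as f:
            f.write(b"\0" * min(size, 1 << 16))   # cap size for the sketch

root = tempfile.mkdtemp()
make_image(root, 50)
```

A real generator must additionally reconcile interdependent constraints (e.g., a target total capacity versus a target file count), which is where the statistical techniques mentioned in the abstract come in.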

We develop a system that makes it practical to run large, complex benchmarks on storage systems with modest capacities. Benchmarking with such workloads on large disks is a frequent source of frustration for file-system evaluators; the scale alone acts as a strong deterrent against using larger, albeit more realistic, benchmarks. To address this problem, we have developed Compressions, a benchmarking system that makes it practical to run benchmarks that were otherwise infeasible on a given system, while also being faster in total runtime. Compressions creates a "compressed" version of the original file-system image on disk by omitting all file data and laying out metadata more efficiently; we present the design, implementation, and evaluation of Compressions.
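To make the "compressed image" idea concrete, here is a toy sketch, under assumed names and a deliberately simplified interface, of a store that records only per-file metadata and synthesizes data on read. The real Compressions operates on on-disk images and block layout; this sketch only illustrates why omitting data blocks saves space while benchmarks still observe correct file sizes.

```python
class CompressedImage:
    """Toy sketch of the Compressions idea: keep per-file metadata
    (path -> logical size) but store no data blocks; reads are
    served with synthesized content of the correct length."""

    def __init__(self):
        self.meta = {}                # path -> logical size in bytes

    def create(self, path, size):
        self.meta[path] = size        # no data written: space saved

    def read(self, path, offset, length):
        size = self.meta[path]
        n = max(0, min(length, size - offset))
        return b"\0" * n              # synthesized, never stored

img = CompressedImage()
img.create("/a/b", 4096)              # a "4 KB" file occupying ~0 bytes
```

Because only metadata is materialized, a multi-hundred-gigabyte benchmark image can fit on a modest disk, and skipping data transfers is also why total runtime can drop.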

We develop an understanding towards creating representative, reproducible, and practical synthetic benchmark workloads. Synthetic benchmarks are accepted and widely used as substitutes for more realistic and complex workloads in file-systems research; however, they are largely based on the benchmark writer's interpretation of the real workload and how it exercises the system API. Our hypothesis is that if two workloads execute roughly the same set of function calls within the file system, they will be roughly equivalent to one another. Based on this hypothesis, we describe our first steps in creating "realistic synthetic" benchmarks by building a tool called CodeMRI. CodeMRI leverages file-system domain knowledge and a small amount of system profiling in order to better understand how a benchmark stresses the system and to deconstruct its workload.
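The hypothesis above can be illustrated in miniature: profile which function calls a workload triggers, then compare two workloads by the overlap of their call-count vectors. The sketch below does this in user-level Python with `sys.setprofile`; CodeMRI's actual profiling targets the file system itself, so every name and metric here is an assumption made for illustration.

```python
import sys
from collections import Counter

def profile_calls(workload):
    """Count function calls made while running `workload` --
    a user-level toy stand-in for CodeMRI-style profiling."""
    counts = Counter()
    def tracer(frame, event, arg):
        if event == "call":
            counts[frame.f_code.co_name] += 1
    sys.setprofile(tracer)
    try:
        workload()
    finally:
        sys.setprofile(None)
    return counts

def similarity(a, b):
    """Overlap of two call-count vectors, in [0, 1]; an assumed
    metric, not the dissertation's."""
    shared = sum((a & b).values())
    total = max(sum(a.values()), sum(b.values()), 1)
    return shared / total

def helper():
    pass

def workload_a():
    for _ in range(3):
        helper()

counts_a = profile_calls(workload_a)
```

Under the stated hypothesis, a synthetic benchmark whose call-count vector scores close to 1.0 against the real workload's vector should be an adequate substitute for it.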

Full Paper: Postscript   PDF   BibTeX