The Interaction of Failure and Performance in a Migratory File Service
John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny
We present the design, implementation, and evaluation of a Migratory File Service (MFS), a system designed to exploit semantic knowledge of workloads and user expectations to improve performance and handle failures effectively in wide-area batch scheduling systems. We discuss Hawk, a prototype MFS with two novel components: migratory proxies, which cache data at remote clusters, and a workflow manager, which manages the workflow of the system. Hawk integrates aggressive caching and I/O filtering to reduce wide-area traffic, proactively replicates data to avoid regeneration due to failure, and performs fine-grained rollback and recovery to minimize the effort required to recover from failure. Through a case study of data-intensive applications, we demonstrate the benefits of Hawk over traditional approaches: it delivers a two to three orders of magnitude increase in performance for jobs deployed across a wide-area batch scheduling environment.