The ADvanced Systems Laboratory (ADSL)
Abstract: Storage systems employ various techniques to protect user data from hardware failures and software defects. These techniques, while effective in their own domains, fail to provide comprehensive protection. In this dissertation, we identify the problem of isolated protection in both local storage systems and cloud storage services, and propose cooperative data protection to address this problem. In the first half of this dissertation (on local storage systems), we present a study of the effects of disk and memory corruption on ZFS, a modern commercial file system with numerous reliability mechanisms. Through careful and thorough fault injection, we show that ZFS is robust to a wide range of disk faults, but because its isolated integrity checks cover only on-disk data, it is less resilient to memory corruption, which can lead to corrupt data being returned to applications or to system crashes. To solve this problem, we introduce flexible end-to-end data integrity, which enables all components along the I/O path (e.g., page cache, file system) to handle checksums cooperatively. Each component is able to alter its protection scheme to meet the performance and reliability demands of the system. We apply this new concept to ZFS and build Zettabyte-Reliable ZFS (Z2FS). Z2FS provides dynamic tradeoffs between performance and protection and offers Zettabyte Reliability, defined as at most one undetected corruption per Zettabyte of data read. We develop an analytical framework to evaluate reliability; the protection approaches in Z2FS are built upon the foundations of this framework. For comparison, we implement a straightforward End-to-End ZFS (E2ZFS) that uses the same protection scheme for all components. Through analysis and experiment, we show that Z2FS is able to achieve better overall performance than E2ZFS, while still offering Zettabyte Reliability.
In the second half of this dissertation (on cloud storage services), we analyze how reliable cloud-based synchronization services are in the face of local corruption and crashes. We perform fault injection experiments on several popular synchronization services and local file systems, and find that despite the excellent reliability that the cloud back-end provides, the loose coupling of these services and local file systems makes synchronized data more vulnerable than users might believe. Local corruption may be propagated to the cloud, polluting all copies on other devices, and a crash or untimely shutdown may lead to inconsistency between a local file and its cloud copy. Even without these failures, these services cannot provide causal consistency. To solve these problems, we present ViewBox, an integrated synchronization service and local file system that provides freedom from data corruption and inconsistency. ViewBox detects these problems using ext4-cksum, a modified version of ext4, and recovers from them using a user-level daemon, the cloud helper, which fetches correct data from the cloud. To provide a stable basis for recovery, ViewBox employs the view manager on top of ext4-cksum. The view manager creates and exposes views, consistent in-memory snapshots of the file system, which the synchronization client then uploads. Our experiments show that ViewBox detects and recovers from both corruption and inconsistency, while incurring minimal overhead.
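As a rough illustration of the Zettabyte Reliability target stated in the abstract (at most one undetected corruption per Zettabyte of data read), the sketch below converts that target into a per-block bound. It is only arithmetic on the stated definition, not the dissertation's analytical framework: the decimal definition of a zettabyte, the 4 KiB block size, the treatment of the target as an expected count, and all function names are assumptions made for illustration.

```python
# Hypothetical back-of-the-envelope check; not the dissertation's framework.
ZETTABYTE = 10**21  # bytes; assumes the decimal (SI) definition


def blocks_per_zettabyte(block_size_bytes: int) -> float:
    """Number of blocks of the given size read in one zettabyte."""
    return ZETTABYTE / block_size_bytes


def max_undetected_probability_per_block(block_size_bytes: int) -> float:
    """Largest per-block probability of undetected corruption that keeps the
    expected number of undetected corruptions per zettabyte read at or below one."""
    return 1.0 / blocks_per_zettabyte(block_size_bytes)


def meets_zettabyte_reliability(p_block: float, block_size_bytes: int) -> bool:
    """True if the expected undetected corruptions per zettabyte read is <= 1."""
    expected = p_block * blocks_per_zettabyte(block_size_bytes)
    return expected <= 1.0


if __name__ == "__main__":
    block = 4096  # assumed 4 KiB block size
    print(max_undetected_probability_per_block(block))
    print(meets_zettabyte_reliability(1e-18, block))
    print(meets_zettabyte_reliability(1e-17, block))
```

Under these assumptions, a 4 KiB block would need an undetected-corruption probability of roughly 4.1e-18 or lower per read to meet the target, which suggests why the protection scheme at each component along the I/O path matters.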