The ADvanced Systems Laboratory (ADSL)
HARDFS: Hardening HDFS with Selective and Lightweight Versioning
Thanh Do,
Tyler Harter,
Yingchao Liu,
Haryadi S. Gunawi*,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Abstract:
We harden the Hadoop Distributed File System (HDFS) against fail-silent (non-fail-stop) behaviors that result from memory corruption and software bugs using a new approach: selective and lightweight versioning (SLEEVE). With this approach, actions performed by important subsystems of HDFS (e.g., namespace management) are checked by a second implementation of the subsystem that uses lightweight, approximate data structures. We show that HARDFS detects and recovers from a wide range of fail-silent behaviors caused by random bit flips, targeted corruptions, and real software bugs. In particular, HARDFS handles 90% of the fail-silent faults that result from random memory corruption and correctly detects and recovers from 100% of 78 targeted corruptions and 5 real-world bugs. Moreover, it recovers orders of magnitude faster than a full reboot by using micro-recovery. The extra protection in HARDFS incurs minimal performance and space overheads.
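The idea of checking a main implementation against a lightweight second version can be illustrated with a minimal sketch. This is not the HARDFS code: the class `CheckedNamespace`, the helper `_digest`, and the choice of truncated SHA-256 digests as the approximate structure are all assumptions made for illustration only.

```python
import hashlib


def _digest(path, meta):
    # Hypothetical helper: an 8-byte fingerprint of one (path, metadata) pair.
    return hashlib.sha256(f"{path}\x00{meta}".encode()).digest()[:8]


class CheckedNamespace:
    """Toy namespace whose main state is cross-checked by a compact shadow."""

    def __init__(self):
        self.main = {}       # main version: full path -> metadata map
        self.shadow = set()  # second version: short fingerprints only

    def create(self, path, meta):
        # Both versions observe the same action independently.
        self.main[path] = meta
        self.shadow.add(_digest(path, meta))

    def lookup(self, path):
        meta = self.main.get(path)
        # Refuse to return metadata the shadow has never seen.
        if meta is not None and _digest(path, meta) not in self.shadow:
            raise RuntimeError(f"fail-silent fault detected for {path}")
        return meta


ns = CheckedNamespace()
ns.create("/user/a", "size=1")
print(ns.lookup("/user/a"))

ns.main["/user/a"] = "size=9"  # simulate silent corruption of the main state
try:
    ns.lookup("/user/a")
except RuntimeError as err:
    print(err)
```

The shadow stores far less than the main map, so the check is cheap, but it is approximate: a fingerprint collision could let a corruption through, and this sketch does not notice an entry that has vanished from the main map. It also only detects the fault; recovery is out of scope here.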
Full Paper: PDF, BibTeX