The ADvanced Systems Laboratory (ADSL)

PACE: Protocol-Aware Correlated Crash Explorer

PACE checks whether distributed storage systems (e.g., key-value stores, configuration stores, and databases) can survive data-center-wide power outages. When such correlated crashes occur, PACE aims to find violations of user-level expectations, such as data loss, data corruption, and unavailability. We have applied PACE to many distributed storage systems, including ZooKeeper, Redis, MongoDB, and Kafka, and found correlated crash vulnerabilities in them. For more details on PACE and how it works, read our OSDI paper on Correlated Crash Vulnerabilities.

Source and Documentation

The source code of PACE, along with a few example system workloads and checkers, is available in this GitHub repo. We strongly recommend that users read the documentation to understand PACE's limitations and caveats. You can send any suggestions, bug fixes, and comments to ra@cs.wisc.edu.

Using PACE

Please contact (Ram)natthan Alagappan (ra@cs.wisc.edu) if you want to apply PACE to your storage system to check for correlated crash vulnerabilities, or if you want details on the workloads, checkers, and vulnerabilities discovered in our study. Please cite this paper if you use this work or tool. Thanks!