![]() |
The ADvanced Systems Laboratory (ADSL)
|
||||||||||||||||||
|
Abstract:Consensus protocols play a pivotal role in fault-tolerant distributed systems, with their most prominent applications found in cloud services that leverage replication to deliver high availability and strong consistency. Confronting these consensus systems are new challenges that emerge from modern cloud workloads and infrastructure: geographical distance, payload density, diversity of workload characteristics and hardware conditions, and the constant flux that demands adaptability. Classic consensus protocols, such as MultiPaxos and Raft, fall short in addressing these challenges. Despite being the de-facto standard implemented in practice, these protocols do not recognize location affinity and bandwidth pressure, which are problems that arise in cloud environments but are not modeled in their designs. More importantly, none of the existing consensus protocols are adaptive to diversity and dynamism; this is due to their rigid, pessimistic approach to quorum composition and failure handling. With these issues in mind, we propose optimistic connectivity, a design principle for cloud consensus protocols. Inspired by optimistic algorithm designs that speculate conflict-free execution and resolve conflicts upon validation errors, we apply the idea of optimism to quorum composition. Protocols are allowed to assume arbitrarily large quorums that may require higher connectivity than simple majority in failure-free cases, while assuring progress upon timeout errors by selecting conservative configurations. Following this principle, we design Crossword and Bodega, two consensus protocols that tackle compound cloud-era challenges; we implement and evaluate both protocols with Summerset, a protocol-generic replicated key-value store testbed. In the first part of this dissertation, we present Crossword, an erasure-coded consensus protocol for dynamic data-heavy workloads, where variable payload sizes create sporadic bandwidth stress. Crossword integrates erasure coding with consensus and establishes a runtime per-instance tradeoff between the quorum size and the number of shards assigned per replica, improving performance by adapting with the best configuration. Graceful leader failover is achieved through a lazy follower gossiping strategy with minimal overhead. Evaluation shows up to 2.3x throughput over MultiPaxos and Raft under dynamic workloads and network conditions, and 1.32x aggregate TPC-C throughput for CockroachDB. In the second part, we present Bodega, a wide-area consensus protocol that can safely serve linearizable reads locally by any desired replica at any time, significantly improving read performance. Bodega introduces a new notion of cluster metadata called the roster, which allows selecting arbitrary local-read-enabled replicas for each key. To achieve fault-tolerant transitions between rosters, Bodega proposes a novel all-to-all roster leases mechanism to maintain a consistent roster agreement across replicas with zero critical-path overhead, generalizing existing all-to-one leasing approaches such as leader leases. Evaluation shows 5.6x to 13.1x acceleration compared to state-of-the-art Leader Leases and Quorum Leases on geo-scale clusters, and comparable performance to sequentially-consistent deployments of etcd and ZooKeeper. In the third part, we describe our development of Summerset, an open-source, distributed, replicated, protocol-generic key-value store testbed for concise implementation and fair evaluation of consensus protocols. Summerset takes advantage of modern asynchronous Rust programming structures and adopts a modularized architecture, allowing each protocol to be implemented as a straightforward event loop. Summerset currently consists of 14.6k lines of infrastructure code and 11 protocol module implementations. Finally, we contribute to crucial supportive topics that empower our research on replication systems, including consistency modeling, testing, and formal verification. We unify the definition of linearizability with weaker consistency levels via a self-contained, practical, and understandable hierarchy model. The effectiveness of this model is demonstrated with a Jepsen-integrated consistency checker implementation that reports conformity results across multiple levels. With TLA+, we construct advanced, practice-grounded formal specifications for MultiPaxos, Crossword, and Bodega. All specifications are equipped with modern replication system features and are thoroughly model-checked.
Full Paper:
PDF
BibTeX
Slides
|
||||||||||||||||||