
The ADvanced Systems Laboratory (ADSL)
Publication abstract

Scale, Performance, and Fault Tolerance in a Filesystem Semi-Microkernel

Jing Liu
Department of Computer Sciences
University of Wisconsin-Madison

Abstract:

The landscape of computing is evolving rapidly, with storage devices offering microsecond-level latency and substantial bandwidth. However, monolithic OS kernels like Linux struggle to keep up. These kernels incur significant overheads, particularly in the filesystem stack, and face scalability challenges on multi-core CPUs. Developing kernel code is difficult and slow, and upstreaming a kernel feature can take months or even years. Moreover, the increasing complexity of hardware and software heightens the risk of failures, as a single fault can crash the entire system.

To address these issues, this work explores a semi-microkernel architecture, where the I/O subsystem operates as a standalone user-space service, while the rest of the OS remains in the monolithic kernel. We focus on one critical I/O subsystem: filesystems. Several filesystem semi-microkernels, including uFS, Nebula, and uFS-Shadow, were built to investigate this approach, emphasizing performance, resource elasticity, and fault tolerance. uFS is a fully functional, high-performance, and crash-consistent user-space filesystem following the semi-microkernel approach. Nebula, based on uFS, provides fast, robust, and seamless recovery upon unexpected faults, as if no failure ever occurs. uFS-Shadow, when incorporated into Nebula, improves the reliability of uFS by recovering from both transient and deterministic errors.

In the first part of this dissertation, we focus on the architecture of uFS, emphasizing its high performance. uFS leverages polling-based I/O and high-performance inter-process communication (IPC) for low latency. uFS achieves multi-core scalability through a "shared-nothing" design, where files are partitioned among threads, allowing each server thread to operate independently.
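As a rough illustration of the shared-nothing idea, the sketch below routes every request for a file to a single owning worker, so no worker ever touches another worker's state. It is not the uFS implementation: the static modulo mapping and the names `NUM_WORKERS`, `owner_of`, `Worker`, `route_write`, and `route_read` are all assumptions made for this example.

```python
NUM_WORKERS = 4  # hypothetical worker count, chosen for illustration


def owner_of(inode, num_workers=NUM_WORKERS):
    """Static inode-to-worker mapping (an assumption of this sketch)."""
    return inode % num_workers


class Worker:
    """Owns a private slice of files; no state is shared with other workers."""

    def __init__(self, wid):
        self.wid = wid
        self.files = {}  # inode -> bytearray, private to this worker

    def write(self, inode, data):
        # Append to the file, creating it on first write.
        self.files.setdefault(inode, bytearray()).extend(data)
        return len(data)

    def read(self, inode):
        return bytes(self.files.get(inode, b""))


workers = [Worker(i) for i in range(NUM_WORKERS)]


def route_write(inode, data):
    """Send the request to the single owner and return the owner's id."""
    worker = workers[owner_of(inode)]
    worker.write(inode, data)
    return worker.wid


def route_read(inode):
    return workers[owner_of(inode)].read(inode)
```

Because each file has exactly one owner, the workers in this sketch need no locks around their per-file state. A real system would also have to handle operations that span files owned by different workers, which the sketch ignores.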

In the second part, we address resource elasticity in uFS by incorporating load management. In the semi-microkernel architecture, the filesystem server can scale CPU resources independently of applications because the application and server threads are decoupled. This feature allows uFS to balance performance and CPU efficiency while adapting to dynamic workloads.

In the third part, we examine the issue where filesystem applications cannot continue after a server crash, even though the entire system remains unaffected. The server buffers updates in memory, creating a state gap between what the application perceives and what is on disk. Simply restarting the server risks losing these updates, leading to potential data loss or silent errors. To address this, we introduce exit activation, a process recovery mechanism: code that runs after a server crash and uses the failed process's memory to safely recover the state gap before that memory is reclaimed by the OS.

In the final part, we introduce robust alternative execution (RAE), an approach to enhance the reliability of an existing high-performance filesystem via a shadow filesystem. This shadow system, which prioritizes correctness, takes over when the base filesystem encounters errors. By simplifying its design and omitting performance optimizations, the shadow filesystem is less prone to bugs, thereby improving overall reliability.
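The fallback pattern behind this idea can be sketched in a few lines. This is a heavily simplified model under stated assumptions, not the RAE design itself: `BaseFS`, `ShadowFS`, `BaseFSError`, and `execute` are hypothetical names, the shared on-disk state is modeled as a dict, and the base filesystem's bug is injected artificially (it fails on any write larger than 8 bytes).

```python
class BaseFSError(Exception):
    """Raised when the (hypothetical) optimized base filesystem detects an error."""


class BaseFS:
    """Stand-in for a fast, complex filesystem with a deliberately injected bug."""

    def __init__(self, disk):
        self.disk = disk

    def write(self, path, data):
        if len(data) > 8:  # simulated deterministic bug on larger writes
            raise BaseFSError(f"base failed on {path}")
        self.disk[path] = data
        return len(data)


class ShadowFS:
    """Stand-in for a simple, correctness-first filesystem with no fast paths."""

    def __init__(self, disk):
        self.disk = disk

    def write(self, path, data):
        self.disk[path] = data
        return len(data)


def execute(base, shadow, path, data):
    """Try the base first; on a detected error, re-run the operation on the shadow."""
    try:
        return base.write(path, data), "base"
    except BaseFSError:
        return shadow.write(path, data), "shadow"


disk = {}  # shared persistent state, modeled as a dict for this sketch
base, shadow = BaseFS(disk), ShadowFS(disk)
```

Since the injected bug is deterministic, retrying the same write on the base would fail again; handing the operation to an independently written, simpler implementation is what lets it complete.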

Overall, this approach is best suited for scenarios where the local filesystem needs to be specialized for hardware or where preventing filesystem faults from crashing the entire system is critical, with a sufficient user base to drive further customizations and filesystem innovation.

Full Paper: PDF   BibTeX
