Date
Topic and Speaker
Monday, October 29, 4:00 PM, 2310 CS&S
Scalable Middleware for Large Scale Systems

I will discuss the problem of developing tools for large-scale parallel environments. We are especially interested in systems with tens of thousands or even millions of processors, including both leadership-class parallel computers and clusters. The infrastructure we have developed to address this problem is MRNet, the Multicast/Reduction Network. MRNet achieves scale by structuring control and data flow as a tree-based overlay network (TBON), which allows efficient request distribution and flexible data reductions.
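
To make the tree-based structure concrete, the Python sketch below (not MRNet's API) builds a balanced tree over hypothetical back-end processes, multicasts a request down it, and reduces integer values back up, with each internal node combining only its own children's partial results. Node, multicast, reduce_up, build_tree, and the fan-out of 4 are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A process in a hypothetical tree-based overlay network (TBON)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    value: int = 0  # data held by a back-end (leaf) process


def multicast(node: Node, request: str, log: List[str]) -> None:
    # Forward the request downward; each node talks only to its own children.
    log.append(f"{node.name} <- {request}")
    for child in node.children:
        multicast(child, request, log)


def reduce_up(node: Node, op: Callable[[int, int], int]) -> int:
    # Leaves return their own data; internal nodes combine their children's
    # partial results and pass a single value to their parent.
    if not node.children:
        return node.value
    result = reduce_up(node.children[0], op)
    for child in node.children[1:]:
        result = op(result, reduce_up(child, op))
    return result


def build_tree(num_leaves: int, fanout: int) -> Node:
    # Create the leaves, then repeatedly group each level under new parents.
    level = [Node(f"be{i}", value=i) for i in range(num_leaves)]
    depth = 0
    while len(level) > 1:
        depth += 1
        level = [Node(f"cp{depth}.{j}", children=level[i:i + fanout])
                 for j, i in enumerate(range(0, len(level), fanout))]
    return level[0]


root = build_tree(num_leaves=16, fanout=4)
log: List[str] = []
multicast(root, "sample", log)
print(len(log))                              # 1 root + 4 internal + 16 leaves = 21
print(reduce_up(root, max))                  # 15
print(reduce_up(root, lambda a, b: a + b))   # 120
```

The same tree carries both directions of traffic: no single process ever handles more than four children, whether it is forwarding a request or combining results.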

The second part of this talk will present an overview of MRNet's design, architecture, and computational model, and then discuss several MRNet applications. These include scalable automated performance analysis in Paradyn, a vision clustering application, and, most recently, our first petascale tool, STAT, a scalable stack trace analyzer that currently runs on thousands of processors and will soon run on 100,000.
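
The abstract describes STAT only as a scalable stack trace analyzer. As an illustration of the kind of flexible reduction a TBON allows, the sketch below assumes that each back-end reports one stack trace per process and that internal tree nodes merge traces into a prefix tree of function names, annotated with the ranks that reached each frame. This is a hypothetical simplification, not STAT's actual data format or MRNet's filter interface; TrieNode, insert_trace, merge, and the frame names are illustrative.

```python
from typing import Dict, List, Set


class TrieNode:
    """One frame in a hypothetical merged call-prefix tree."""

    def __init__(self) -> None:
        self.ranks: Set[int] = set()               # processes that reached this frame
        self.children: Dict[str, "TrieNode"] = {}  # callee frames by function name


def insert_trace(root: TrieNode, rank: int, trace: List[str]) -> None:
    # Walk the trace from the outermost frame inward, creating nodes as needed.
    node = root
    for frame in trace:
        node = node.children.setdefault(frame, TrieNode())
        node.ranks.add(rank)


def merge(dst: TrieNode, src: TrieNode) -> None:
    # The reduction an internal tree node would apply to its children's trees:
    # identical call paths collapse into one node and their rank sets unite.
    dst.ranks |= src.ranks
    for name, child in src.children.items():
        merge(dst.children.setdefault(name, TrieNode()), child)


def dump(node: TrieNode, depth: int = 0) -> None:
    for name, child in sorted(node.children.items()):
        print("  " * depth + f"{name} {sorted(child.ranks)}")
        dump(child, depth + 1)


left, right = TrieNode(), TrieNode()   # partial trees from two subtrees
insert_trace(left, 0, ["main", "solve", "MPI_Wait"])
insert_trace(left, 1, ["main", "solve", "MPI_Wait"])
insert_trace(right, 2, ["main", "io_write"])

merged = TrieNode()
merge(merged, left)
merge(merged, right)
dump(merged)
# main [0, 1, 2]
#   io_write [2]
#   solve [0, 1]
#     MPI_Wait [0, 1]
```

The merged tree grows with the number of distinct call paths rather than with the number of processes, which is what makes this style of reduction attractive at large scale.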

I will conclude with a brief description of a new fault tolerance design that leverages natural redundancies in the tree structure to provide recovery without checkpoints or message logging.

Monday, November 5, 4:00 PM, 2310 CS&S
The Future of Device Drivers

Despite decades of research on OS structure and design, device drivers have received little attention. Yet drivers are now the dominant form of kernel code: 70% of the code in the Mac OS X kernel is in drivers, and most kernel code being written today is driver code. Drivers from early versions of Unix bear a striking resemblance to modern Linux and Unix drivers, even though the platform on which drivers execute has changed dramatically: memory sizes and processor counts have grown enormously, and virtual machines are becoming ubiquitous. In addition, the number and variety of devices attached to a system have grown by an order of magnitude.

In this talk, I will discuss problems with existing driver architectures and opportunities for change. Multi-core processors make dedicated driver CPUs (a.k.a. channel processors) possible, and virtualization offers the opportunity to move I/O handling out of the operating system and into applications. Programming language support for concurrent programming may also simplify driver design. I'll discuss ongoing work in factoring drivers and ideas for future driver design.

Monday, November 12, 4:00 PM, 2310 CS&S
File Systems Are Broken (And What We're Doing To Fix Them)

In this talk, I will present a summary of our recent research in understanding how disks fail and how file and storage systems handle such failures. Our findings reveal numerous design and implementation problems in a wide range of both open-source and commercial systems; put simply, file systems are broken (at least when it comes to reliability).

I will then present a number of current research directions that we are pursuing in order to build a new generation of robust and reliable storage systems. With more formal analysis and construction techniques, we hope to transform the "art" of file system design and implementation into a more rigorous and careful science.

Bio: Remzi Arpaci-Dusseau is an associate professor of Computer Sciences at the University of Wisconsin, Madison. His primary interest is in writing a short bio.

Monday, November 26, 4:00 PM, 2310 CS&S
A Scalable Failure Recovery Model for Tree-based Overlay Networks

We present a scalable failure recovery model for data aggregation in large-scale tree-based overlay networks (TBONs). A TBON is a network of hierarchically organized processes that exploits the logarithmic scaling properties of trees to provide scalable data multicast, gather, and in-network aggregation. TBONs are commonly used in debugging and performance tools, system monitoring, information management systems, stream processing, and mobile ad hoc networks.
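
The "logarithmic scaling" mentioned above can be made concrete with a little arithmetic: a balanced tree with fan-out $k$ needs only $\lceil \log_k N \rceil$ levels to reach $N$ leaf processes, and each process handles at most $k$ children instead of a single root handling all $N$. The helper below is a hypothetical sketch, and the fan-out of 32 is an assumed example value, not a figure from the talk.

```python
def tbon_depth(num_leaves: int, fanout: int) -> int:
    """Levels a balanced tree of the given fan-out needs to reach num_leaves leaves."""
    if num_leaves < 1 or fanout < 2:
        raise ValueError("need at least one leaf and a fan-out of at least 2")
    depth, capacity = 0, 1
    # Integer loop computes ceil(log_fanout(num_leaves)) without float rounding.
    while capacity < num_leaves:
        capacity *= fanout
        depth += 1
    return depth


for n in (1_000, 100_000, 1_000_000):
    print(n, tbon_depth(n, 32))
# 1000 2
# 100000 4
# 1000000 4
```

Growing from a thousand to a million processes adds only two levels, while the per-process load stays capped at the fan-out.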

Our recovery model leverages inherent information redundancies in TBON computations. This redundant information is gathered from non-failed processes to compensate for computation and communication state lost due to failures. This state compensation strategy is attractive because: (1) it avoids the time and resource overheads of previous reliability approaches, which rely on explicit replication; (2) recovery is rapid and involves only a small subset of the network; and (3) it applies to many useful, complex computations. In this work, we formalize the TBON model and its fundamental properties to prove that our state compensation model preserves computational semantics across TBON process failures. These properties lead to an efficient implementation of state compensation, which we use to empirically validate and evaluate recovery performance. We show that state compensation can recover from failures in extremely large TBONs in milliseconds, causing practically no interruption of application service.
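
A toy illustration of the idea, under strong simplifying assumptions: each process keeps a persistent aggregate (here a running set union) of everything it has seen, an internal process fails after absorbing data it never forwarded, and its orphaned children are re-parented to the failed process's parent and resend their own state. This is a hypothetical sketch of the general principle, not the authors' model or implementation; Proc, push, recover, and fail_at are illustrative names, and the sketch ignores root failure and concurrent traffic.

```python
from typing import List, Optional, Set


class Proc:
    """Hypothetical TBON process whose persistent state is a running set union."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Proc"] = None
        self.children: List["Proc"] = []
        self.state: Set[int] = set()  # union of every item this process has seen

    def adopt(self, child: "Proc") -> None:
        child.parent = self
        self.children.append(child)

    def push(self, data: Set[int], fail_at: Optional["Proc"] = None) -> None:
        # Merge data into this process's state, then forward it toward the root.
        # If an ancestor is `fail_at`, it absorbs the data and crashes before
        # forwarding, so the levels above it never see the data.
        self.state |= data
        node = self.parent
        while node is not None:
            node.state |= data
            if node is fail_at:
                return
            node = node.parent


def recover(failed: Proc) -> None:
    # Simplified compensation: orphans are re-parented to the failed process's
    # parent and resend their own accumulated state. Union is idempotent, so
    # items the new parent already holds cause no double counting.
    new_parent = failed.parent
    if new_parent is None:
        raise ValueError("this sketch does not handle failure of the root")
    new_parent.children.remove(failed)
    for orphan in failed.children:
        new_parent.adopt(orphan)
        orphan.push(set(orphan.state))
    failed.children = []


root, cp, a, b = Proc("root"), Proc("cp"), Proc("a"), Proc("b")
root.adopt(cp)
cp.adopt(a)
cp.adopt(b)

a.push({1, 2})             # reaches the root normally
b.push({3}, fail_at=cp)    # cp absorbs {3}, then crashes before forwarding it
print(sorted(root.state))  # [1, 2]
recover(cp)
print(sorted(root.state))  # [1, 2, 3]
```

Recovery touches only the failed process's parent and children, and the lost item is reconstructed from state the surviving children already held. The sketch relies on union being idempotent: swapping in a sum would double count items 1 and 2 when `a` resends its state, so non-idempotent computations need handling this sketch does not attempt.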