Date: Thursday, February 21, 2:00 PM, 2310 CS&S
Topic and Speaker: Reliable and Efficient Data Intensive Distributed Computing

The increasing computation and data requirements of scientific applications have necessitated the use of distributed resources owned by collaborating parties. While existing distributed systems work well for computation that requires limited data movement, they fail in unexpected ways when the computation accesses, creates, and moves large amounts of data over wide-area networks. Existing systems tightly couple data movement and computation, and treat data movement as a side effect of computation. I propose a framework that decouples data movement from computation and acts as an I/O subsystem for distributed systems. This framework provides a uniform interface to heterogeneous storage systems and data transfer protocols; permits policy support and higher-level optimization; and enables reliable, efficient scheduling of compute and data resources.