Computer Sciences Dept.

Clustera: An Integrated Computation And Data Management System

David J. DeWitt, Eric Robinson, Srinath Shankar, Erik Paulson, Jeffrey Naughton, Joshua Royalty, Andrew Krioukov
2008

This paper introduces Clustera, an integrated computation and data management system. In contrast to traditional cluster-management systems that target specific types of workloads, Clustera is designed for extensibility, so that it can readily handle a wide variety of job types, ranging from computationally intensive, long-running jobs with minimal I/O requirements to complex SQL queries over massive relational tables. Another distinguishing feature of Clustera is that its architecture exploits modern software building blocks, including application servers and relational database systems, to realize important performance, scalability, portability and usability benefits. Finally, experimental evaluation suggests that Clustera has good scale-up properties for SQL processing, that it delivers performance comparable to Hadoop for MapReduce processing, and that it can support higher job throughput rates than previously published results for the Condor and CondorJ2 batch computing systems.
