Computer Sciences Dept.

MapReduce for the Cell B.E. Architecture

Marc de Kruijf and Karthikeyan Sankaralingam
2007

MapReduce is a simple and flexible parallel programming model proposed by Google for large-scale data processing in a distributed computing environment [4]. In this paper, we present a design and implementation of MapReduce for the Cell architecture. The model provides a simple machine abstraction to users, hiding parallelization and hardware primitives. Our runtime automatically manages parallelization, scheduling, partitioning, and memory transfers. We study the basic characteristics of the model and evaluate our runtime’s performance, scalability, and efficiency for micro-benchmarks and complete applications. We show that the model is well suited to many applications that map well to the Cell architecture, and that the runtime sustains high performance on these applications. For other applications, we analyze runtime performance and describe why performance is less impressive. Overall, we find that the simplicity of the model and the efficiency of our MapReduce implementation make it an attractive choice for the Cell platform specifically and, more generally, for distributed memory systems and software-exposed memories.
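
To illustrate the programming model the abstract refers to, here is a minimal sequential sketch of generic MapReduce in Python. It is not the Cell runtime or its API: the names map_reduce, word_count_map, and word_count_reduce are hypothetical, and the grouping step shown here runs in a single loop, whereas a real runtime would handle partitioning, scheduling, and memory transfers on the user's behalf.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every input, group the emitted values by key,
    then apply reduce_fn to each group (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase: each input item may emit any number of (key, value) pairs.
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    # Reduce phase: collapse the list of values for each key into one result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(key, values):
    # Sum the per-occurrence counts for one word.
    return sum(values)


counts = map_reduce(["the cell", "the runtime"], word_count_map, word_count_reduce)
```

The user supplies only the two small functions; everything about how the work is divided and combined stays inside map_reduce, which is the sense in which the model hides parallelization from the programmer.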
