Tools

CUDA Implementation of PARSEC

We have developed CUDA implementations of four PARSEC benchmarks for GPUs: blackscholes, fluidanimate, streamcluster, and swaptions.

Note that these files are provided AS IS and can be improved in many respects. We performed some performance optimization, but more remains to be done, and we do not claim that these implementations are optimal. The code is presented only as a representative CUDA implementation of these workloads. It is NOT meant to be a definitive answer to how well these applications can perform on GPUs or in CUDA. If you are interested in improving the performance of these benchmarks, please let us know.

Note also that this implementation is based on CUDA SDK 2.3. Later versions of CUDA support more C++ features, which may simplify this code or enable other optimizations (our paper notes some of these places).

The benchmarks have been available since July 13th, 2011. To request a download of this implementation, email sinclair@cs.wisc.edu or karu@cs.wisc.edu.

Link to paper; please cite it if you use our work (BibTeX).

Important Notes:

  • (1/24/12) There have been some emails on the PARSEC mailing list about patches for some of the benchmarks. At least one of these affects the sequential/pthreads versions of the benchmarks included in our release. I have not yet had time to update the non-CUDA versions of these programs in the tarball we provide, so I am putting the information here so that you can make the change(s) in your copy:

Model for Timing Speculation

Details and download available soon. Email karu@cs.wisc.edu for advance release.

MapReduce for Cell

MapReduce is a simple and flexible parallel programming model proposed by Google for large-scale data processing in a distributed computing environment. We have designed and implemented MapReduce for the Cell processor architecture.

The runtime is available for public download here. The package includes an application suite that demonstrates usage of the runtime, including word count, distributed sort, k-means clustering, and several other applications.
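For readers unfamiliar with the programming model, the word count application can be sketched roughly as follows. This is a minimal sequential illustration of the map/group/reduce idea in Python, not the API of our Cell runtime; the function names (map_word_count, reduce_word_count, run_mapreduce) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_word_count(chunk):
    """Map phase: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1


def reduce_word_count(word, counts):
    """Reduce phase: sum all counts emitted for a single word."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical sequential driver: map every chunk, group by key, reduce.

    A real runtime would run the map and reduce calls in parallel; this
    loop only shows the data flow between the phases.
    """
    groups = defaultdict(list)
    for chunk in inputs:
        for key, value in map_fn(chunk):
            groups[key].append(value)  # group intermediate values by key
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))
```

Calling run_mapreduce(["the cell the", "cell spe"], map_word_count, reduce_word_count) produces one count per distinct word. The user supplies only the map and reduce functions; grouping and scheduling are left to the runtime.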

One page summary

If you use this work in your own work or build on top of it, we would appreciate your letting us know. To cite MapReduce for Cell in your research writing, please refer to the paper “MapReduce for the Cell B.E. Architecture” by M. de Kruijf and K. Sankaralingam, University of Wisconsin Computer Sciences Technical Report CS-TR-2007-1625, October 2007.