We have developed GPU implementations of four of the PARSEC benchmarks in CUDA: blackscholes, fluidanimate, streamcluster, and swaptions.
It is important to note that these files are provided AS IS and can be improved in many respects. While we performed some performance optimization, there is more to be done, and we do not claim that these are optimal implementations. The code is presented only as a representative CUDA implementation of these workloads. It is NOT meant to be interpreted as a definitive answer to how well these applications can perform on GPUs or in CUDA. If you are interested in improving the performance of any of these benchmarks, please let us know.
Additionally, it is important to note that this implementation was based on CUDA SDK 2.3. Later versions of CUDA support more C++ features, which may simplify this code or enable other optimizations (our paper notes some of the places where this applies).
The benchmarks have been available since July 13th, 2011. To request a copy of this implementation, email sinclair@cs.wisc.edu or karu@cs.wisc.edu.
Details can also be found here. Link to paper. Please cite this paper if you use our work -- Bibtex.
Important notes:
- (1/24/12) There have been some emails on the PARSEC mailing list about patches for some of the benchmarks. At least one of these affects the sequential/pthreads versions of the benchmarks included in our release. I have not yet had time to update the non-CUDA versions of these programs in the tarball we provide, so I am posting the information here so that you can make the change(s) in your copy: