Porting CMP Benchmarks to GPUs
Matthew D. Sinclair, Henry Duwe, Karthikeyan Sankaralingam
GPUs have become increasingly popular in recent years, in large part because they can offer a large amount of computational power at low prices. They offer massive potential speedups in program performance, but only if an application maps well to their data-parallel programming model. It is unclear how to effectively port programs that do not map well onto this model, and it is equally unclear how much performance such programs can achieve on GPUs. If GPUs can be shown to execute general-purpose programs with high performance, then a GPU-like, many-core architecture could provide the next big increase in general-purpose program performance. In this project, we implemented four benchmarks from the PARSEC CMP benchmark suite on GPUs (streamcluster, blackscholes, fluidanimate, and swaptions), analyzed their performance, and compared it to that of the PARSEC serial and pthreads versions of the same programs. We also investigated which general-purpose programming techniques worked well when mapped to a GPU, which did not, and where bottlenecks occurred. We observed that, in our implementations, general-purpose programs mapped neither uniformly easily nor uniformly well to GPUs.