Our group has a strong record of making the tools from our research publicly available and supporting them. Below is a chronological list of tools. Email us with any questions, dangling pointers, etc.

1.  Optimization based spatial architecture scheduler (October 2013)

This page describes our ILP scheduler tool. It can be downloaded here. A web version is also available at the Live demo link. Source tarball.

If you use this tool, please cite this PLDI paper or journal article:

Related Material:

2.  DySER Framework x86 toolchain (compiler, simulator) (October 2013)

We have released our compiler and simulator for the entire DySER toolchain. This page describes the x86-based toolchain which we have released as a virtual machine. Click here

Related papers:

  • Breaking SIMD Shackles: Liberating Accelerators by Exposing Flexible Microarchitectural Mechanisms. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
  • DySER: Unifying Functionality and Parallelism Specialization for Energy Efficient Computing. IEEE Micro, 32(5), 2012.
  • Dynamically Specialized Datapaths for Energy Efficient Computing, HPCA-11

3.  DySER Framework SPARC toolchain (compiler, simulator, Verilog, FPGA tutorial/bringup instructions) (June 2013)

This is our SPARC-based toolchain for the DySER project. The compiler is slightly less sophisticated than the x86 compiler. On the other hand, this toolchain includes all of our Verilog and tutorials for FPGA bringup. Click here

  • Dynamically Specialized Datapaths for Energy Efficient Computing, HPCA-11
  • Design Integration and Implementation of the DySER Hardware Accelerator into OpenSPARC, HPCA-12

4.  ISA Power Struggles data and detailed techreport (February 2013)

We have released a techreport and all of the data from our ISA Power Struggles HPCA paper. Click here

5.  iCompiler (February 2013)

One of the outcomes of our Idempotence work is our iCompiler, an LLVM-based compiler that outputs programs that are continuous idempotent regions. The tools page for the iCompiler is here. If you use the tool, we would appreciate your citing either of these papers:

  • Idempotent Code Generation: Implementation, Analysis, and Evaluation, CGO-2013
  • Static Analysis and Compiler Design for Idempotent Processing, PLDI-2012
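As a rough illustration of what "idempotent" means for a region of code, the sketch below checks whether running a region twice leaves the same state as running it once. It is not taken from the iCompiler: the helper `run_twice_matches_once` and both example regions are hypothetical names made up for this illustration, and the check only covers the single starting state it is given.

```python
def run_twice_matches_once(region, state):
    """Return True if executing `region` twice from `state` ends in the
    same state as executing it once (checked for this one state only)."""
    once = dict(state)
    region(once)

    twice = dict(state)
    region(twice)
    region(twice)  # simulate re-execution of the region, e.g. after a fault

    return once == twice


def idempotent_region(mem):
    # Reads x and writes y; the input x is never overwritten,
    # so re-executing the region recomputes the same y.
    mem["y"] = mem["x"] + 1


def non_idempotent_region(mem):
    # Reads x and then overwrites it; a re-execution sees a
    # different input and produces a different result.
    mem["x"] = mem["x"] + 1


if __name__ == "__main__":
    start = {"x": 1, "y": 0}
    print(run_twice_matches_once(idempotent_region, start))
    print(run_twice_matches_once(non_idempotent_region, start))
```

The second region fails the check because it overwrites a value it read earlier, which is the kind of pattern an idempotent region has to avoid.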

6.  Dark Silicon Models (January 2012)

We have developed a web-based interface for exploring the models used in our ISCA 2011 paper on Dark Silicon. A webpage devoted to this tool explains the details and hosts the web model. Click here.

Related papers:

  • Power Challenges May End the Multicore Era, Communications of the ACM (CACM), February 2013.
  • Power Limitations and Dark Silicon are Challenging the Future of Multicore, ACM Transactions on Computer Systems (TOCS), 2012.
  • Multicore Model from Abstract Single Core Inputs, Computer Architecture Letters (CAL), 2012.
  • Dark Silicon and the End of Multicore Scaling, Micro Top Picks 2012.
  • Dark Silicon and the End of Multicore Scaling, ISCA 2011.

7.  PARSEC on CUDA (July 2011)

We have developed CUDA implementations of the following PARSEC benchmarks for GPUs: blackscholes, fluidanimate, streamcluster, and swaptions.

Please note that these files are provided AS IS and can be improved in many respects. While we performed some performance optimization, there is more to be done. We do not claim that these are optimal implementations. The code is presented only as a representative CUDA implementation of these workloads. It is NOT meant to be interpreted as a definitive answer to how well these applications can perform on GPUs or CUDA. If you are interested in improving the performance of these benchmarks, please let us know.

Link to paper; please cite it if you use our work (BibTeX).

Additionally, note that this implementation is based on CUDA SDK 2.3. Later versions of CUDA support more C++ features, which may simplify this code or allow other optimizations (our paper notes some of these places).

The benchmarks were released on July 13th, 2011. Email the following addresses to request a download of this implementation:

Important Notes:

  • (1/24/12) There have been some emails on the PARSEC mailing list about patches for some of the benchmarks. At least one of these affects the sequential/pthreads versions of the benchmarks included in our release. I have not yet had time to update the non-CUDA versions of these programs in the tarball we provide, so I am putting the information here so that you can make the change(s) in your copy:

8.  MapReduce for Cell

MapReduce is a simple and flexible parallel programming model proposed by Google for large scale data processing in a distributed computing environment. We have developed a design and implementation of MapReduce for the Cell processor architecture.

The runtime is available for public download here. The package includes an application suite that demonstrates usage of the runtime. Applications include a word count application, distributed sort, the kmeans clustering algorithm, and several other applications.
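For readers unfamiliar with the programming model, here is a minimal sequential sketch of word count expressed as a map phase and a reduce phase. It illustrates the model only and is not the API of our Cell runtime: the function names (`map_word_count`, `group_by_key`, `reduce_word_count`, `map_reduce`) are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_word_count(document):
    """Map phase: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in document.split()]


def group_by_key(pairs):
    """Grouping step: collect all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word_count(word, counts):
    """Reduce phase: combine all values for one key into a single result."""
    return word, sum(counts)


def map_reduce(inputs, mapper, reducer):
    """Run the mapper over every input, group by key, then reduce each group."""
    intermediate = []
    for item in inputs:
        intermediate.extend(mapper(item))
    grouped = group_by_key(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["the cell processor", "the map phase", "the reduce phase"]
    print(map_reduce(chunks, map_word_count, reduce_word_count))
```

The user supplies only the mapper and the reducer. In a real runtime the map calls and the per-key reduce calls are independent of one another, which is what lets them run in parallel.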

If you use this work in your own work, we would appreciate your letting us know. If you want to cite MapReduce for Cell in your research writings, please refer to the paper "MapReduce for the Cell B.E. Architecture" by M. de Kruijf and K. Sankaralingam, University of Wisconsin Computer Sciences Technical Report CS-TR-2007-1625, October 2007. An extended version of this paper is also published in the IBM Journal of Research and Development (Volume 53, Issue 5).