Our group has a strong record of making the tools from our research publicly available and supporting them. Below is a chronological list of tools. Email firstname.lastname@example.org with any questions, dangling pointers, etc.
This page describes our ILP scheduler tool. It can be downloaded here as a source tarball, and a web version is available at the Live demo link.
If you use this tool, please cite the PLDI paper or the journal version:
We have released the compiler and simulator for the entire DySER toolchain. This page describes the x86-based toolchain, which we have released as a virtual machine. Click here.
This is our SPARC-based toolchain for the DySER project. Its compiler is slightly less sophisticated than the x86 compiler; on the other hand, this toolchain includes our entire Verilog source along with tutorials for FPGA bring-up and related tasks. Click here.
We have released a tech report and all of the data from our ISA Power Struggles HPCA paper. Click here.
One outcome of our Idempotence work is the iCompiler, an LLVM-based compiler that outputs programs partitioned into idempotent regions. The tools page for the iCompiler is here. We would appreciate a citation to either of these papers if you use the tool:
We have developed a web-based interface for exploring the models used in our ISCA-10 paper on Dark Silicon. A webpage devoted to this tool explains the details and hosts the web model. Click here.
We have developed GPU implementations of some of the PARSEC benchmarks in CUDA. Specifically, we have developed GPU implementations of the following benchmarks: blackscholes, fluidanimate, streamcluster, and swaptions.
It is important to note that these files are provided AS IS and can be improved in many respects. While we performed some performance optimization, more could be done, and we do not claim that these are the fastest possible implementations. The code is presented only as a representative CUDA implementation of these workloads; it is NOT meant to be interpreted as a definitive answer to how well these applications can perform on GPUs or CUDA. If you are interested in improving the performance of these benchmarks, please let us know.
Additionally, note that this implementation is based on CUDA SDK 2.3. Later versions of CUDA support more C++ features, which may simplify the code or enable other optimizations (our paper notes some of these opportunities).
The benchmarks were released on July 13, 2011. To request the implementation, email the following addresses: email@example.com, firstname.lastname@example.org
Based on the timing-speculation models from our DSN paper, "A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism," we have developed a web-based tool that makes those models available for others to use. See below.
MapReduce is a simple and flexible parallel programming model proposed by Google for large scale data processing in a distributed computing environment. We have developed a design and implementation of MapReduce for the Cell processor architecture.
The runtime is available for public download here. The package includes an application suite that demonstrates use of the runtime; applications include word count, distributed sort, k-means clustering, and several others.
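To give a flavor of the programming model itself (this is a minimal Python sketch of MapReduce's map/shuffle/reduce phases, not the Cell runtime's actual C API; all names below are our own illustrative choices):

```python
from collections import defaultdict
from itertools import chain

def map_words(document):
    # Map phase: emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in document.split()]

def reduce_counts(word, counts):
    # Reduce phase: sum all counts emitted under the same key.
    return (word, sum(counts))

def run_mapreduce(documents, mapper, reducer):
    # Shuffle phase: group intermediate (key, value) pairs by key,
    # then apply the reducer once per key.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(d) for d in documents):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

counts = run_mapreduce(["to be or not to be"], map_words, reduce_counts)
# counts["to"] == 2, counts["be"] == 2, counts["not"] == 1
```

On Cell, the runtime's job is to distribute the map and reduce work across the SPEs and handle the grouping in between; the sequential sketch above only shows the contract the programmer writes against.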
If you use this work in your own work, we would appreciate you letting us know. If you want to cite MapReduce for Cell in your research writings, please refer to "MapReduce for the Cell B.E. Architecture," M. de Kruijf and K. Sankaralingam, University of Wisconsin Computer Sciences Technical Report CS-TR-2007-1625, October 2007. An extended version of this paper also appears in the IBM Journal of Research and Development (Volume 53, Issue 5).