Simulation is a key method used by computer architects to evaluate new ideas. For example, the execution-driven simulator SimpleScalar has been tremendously successful in the academic research community because it gives researchers a simple framework on which they can build. However, SimpleScalar cannot run a full-scale operating system and is quite limited in its threading capabilities. As a result, SimpleScalar is often limited to evaluating uniprocessor systems running SPEC workloads.
The emergence of chip-multiprocessors (CMPs) and other multi-threaded architectures requires full-system simulation for accurate evaluation. Full-system simulators are capable of simulating the actual operating environments required by multi-threaded execution. Furthermore, multi-threaded workloads place a greater demand on the operating system, and ignoring its effect decreases the fidelity of the results. On the other hand, full-system simulation is more complex and costly to implement and evaluate.
In this tutorial, we present Multifacet's General Execution Model Simulator (GEMS), which provides an accurate and flexible full-system simulation environment while minimizing implementation cost and complexity. Although originally developed to simulate SMP systems (as published in ), GEMS is now capable of simulating Multiple-CMP systems with high fidelity. A key characteristic of GEMS is its reliance on Virtutech Simics, a full-system functional simulator, to provide correctness. For instance, Simics allows GEMS to simulate a multiple-processor SPARC system running workloads such as IBM's DB2 on Solaris. Because GEMS offloads functional correctness to Simics, GEMS can concentrate on providing architects with a simple and flexible framework for performance studies of a wide variety of systems.
GEMS can simulate both in-order and out-of-order processors. For quick, less accurate simulation, GEMS uses the in-order blocking processor model provided by Simics. For more accurate simulation, we developed an out-of-order (OOO) processor module called Opal. Opal is a timing-first simulator: it implements the performance-sensitive aspects of an OOO processor but ultimately relies on Simics to provide absolute correctness. This captures the performance of an OOO processor without the complexity of implementing every instruction and architectural quirk.
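The timing-first idea described above can be illustrated with a toy sketch. This is not Opal or the Simics API: the classes `FunctionalSim` and `TimingSim`, the two-instruction ISA, and the `flush_penalty` parameter are all hypothetical names chosen for illustration. The sketch assumes the timing model executes each instruction first, the trusted functional model then executes the same instruction, and any disagreement in architectural state is repaired by copying the functional state and charging a penalty.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalSim:
    """Hypothetical stand-in for the trusted functional simulator."""
    regs: dict = field(default_factory=lambda: {"r1": 0, "r2": 0})

    def step(self, instr):
        op, dst, val = instr
        if op == "addi":
            self.regs[dst] += val
        elif op == "muli":
            self.regs[dst] *= val


@dataclass
class TimingSim:
    """Hypothetical timing model that implements only part of the ISA."""
    regs: dict = field(default_factory=lambda: {"r1": 0, "r2": 0})
    cycles: int = 0
    mismatches: int = 0

    def step(self, instr):
        op, dst, val = instr
        self.cycles += 1  # toy cost model: one cycle per instruction
        if op == "addi":
            self.regs[dst] += val
        # "muli" is deliberately left unimplemented to force a mismatch.


def run(program, flush_penalty=10):
    """Run timing model first, then check it against the functional model."""
    func, timing = FunctionalSim(), TimingSim()
    for instr in program:
        timing.step(instr)
        func.step(instr)
        if timing.regs != func.regs:
            # Functional state wins: reload it and charge a recovery penalty.
            timing.regs = dict(func.regs)
            timing.cycles += flush_penalty
            timing.mismatches += 1
    return func, timing


if __name__ == "__main__":
    program = [("addi", "r1", 3), ("muli", "r1", 4), ("addi", "r2", 1)]
    func, timing = run(program)
    print(timing.regs, timing.cycles, timing.mismatches)
```

In this sketch the timing model never has to be complete: an unimplemented instruction costs some accuracy (the assumed penalty) but cannot corrupt the architectural state.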
GEMS implements a wide assortment of multiprocessor shared-memory systems. Ruby is the module we developed to simulate memory system models ranging from a uniprocessor with DNUCA caches to a Multiple-CMP system with shared caches. To simplify and speed up the development of different target systems, we developed a language and code generator called SLICC (Specification Language Integrating Cache Coherence). Target systems are specified in SLICC, a table-driven language, from which the code generator produces C++ files for Ruby. This speeds development because the language is geared towards cache-coherence protocols and abstracts away the details of C++.
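To give a flavor of what "table-driven" means for a coherence protocol, the following sketch encodes a heavily simplified MSI-style cache controller as a lookup table from (state, event) to (actions, next state). It is written in Python rather than SLICC and does not reproduce SLICC syntax or its generated C++; the state names, event names, and action strings are assumptions made for illustration, and the transient states that a real protocol needs are omitted.

```python
# (current state, event) -> (list of actions, next state)
# Hypothetical, simplified MSI table with no transient states.
TRANSITIONS = {
    ("I", "Load"):  (["issue_GETS"], "S"),
    ("I", "Store"): (["issue_GETX"], "M"),
    ("I", "Inv"):   (["send_ack"], "I"),
    ("S", "Load"):  (["hit"], "S"),
    ("S", "Store"): (["issue_GETX"], "M"),
    ("S", "Inv"):   (["send_ack"], "I"),
    ("M", "Load"):  (["hit"], "M"),
    ("M", "Store"): (["hit"], "M"),
    ("M", "Inv"):   (["writeback", "send_ack"], "I"),
}


def handle(state, event):
    """Look up one transition; an undefined pair is a protocol error."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for state {state!r}, event {event!r}")


def run_events(events, state="I"):
    """Apply a sequence of events to one cache block, logging the actions."""
    log = []
    for event in events:
        actions, state = handle(state, event)
        log.extend(actions)
    return state, log


if __name__ == "__main__":
    print(run_events(["Load", "Store", "Inv"]))
```

Keeping the protocol in a table means that changing or adding a transition is a one-line edit, and missing (state, event) pairs surface as explicit errors rather than silent misbehavior.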
In the coming months (before the tutorial date), we plan to release the GEMS components we developed to the community under an open source license. Hence, a researcher with access to Virtutech Simics can add our components to enable detailed full-system simulation of SMPs, CMPs, and Multiple-CMPs. The purpose of this tutorial is to present GEMS: its design, use, examples, and limitations. We envision a hands-on tutorial in which we will give demonstrations of the simulator. As part of our tutorial, we will discuss Simics from a simulation implementer's point of view, as well as other issues and techniques such as workload variability.
Researchers in academia and industry interested in full-system simulation based on Simics.