Transformer Specification Language:
A System for Generating Analyzers and its Applications

Junghee Lim
University of Wisconsin

As computers have become a pivotal component of daily lives, computer safety, reliability, and security issues have become enormously important. A considerable amount of recent research in program analysis and software engineering has been carried out on techniques and tools for finding software bugs and security vulnerabilities, and on checking computer-safety properties. Most of this research has focused on analyzing source code. Recently, machine-code analysis has begun to receive great attention both because source code is often unavailable and because there can be mismatches in various ways between source code and the machine code generated from the source code.

The tools and techniques for analyzing machine code are, in principle, language-independent. However, their implementations are often tied to one specific instruction set. Retargeting them to another instruction set can be an expensive and error-prone process. This dissertation describes a system that I developed, called TSL (for ``Transformer Specification Language'') that provides a systematic solution to the problem of creating retargetable tools for analyzing machine-code. The TSL system is a meta-tool, or tool generator, that automatically creates different abstract interpreters for machine-code instruction sets. The system addresses the problem of supporting multiple instruction sets by providing a YACC-like mechanism for creating key components of machine-code analyzers. The TSL system takes a single, unified description of the concrete operational semantics of an instruction set, which is specified in TSL, a strongly typed, first-order functional language, and automatically creates implementations of different abstract interpreters for the given instruction set.

TSL provides a fixed set of base-types and operators, as well as map-types with map-access and (applicative) map-update operations. The TSL compiler generates a common intermediate representation that allows the meanings of the input-language constructs to be redefined by supplying alternative interpretations of the base-types, map-types, and the operations on them (``semantic reinterpretation''). Because all the abstract operations are defined at the meta-level, a semantic reinterpretation is independent of any given instruction set defined in TSL. Therefore, each implementation of an analysis component's driver serves as the unchanging driver for use in different instantiations of the analysis component for different instruction sets. The TSL language becomes the specification language for retargeting that analysis component to different instruction sets.

As an application of the TSL system, we developed a novel way of applying semantic reinterpretation to automatically create symbolic-analysis primitives for symbolic evaluation, weakestliberal precondition, and symbolic composition. Furthermore, using the TSL system, as well as the TSL-generated symbolic-analysis primitives, we developed a machine-code verification tool, called MCVETO, and a concolic-execution-based program-exploration tool, called BCE.

MCVETO addresses a large number of issues that arise when developing model-checking tools for machine code, for which standard techniques used in source-code model-checking tools would be unsound if applied to machine code.

What distinguishes the work on BCE is that it makes use of control-dependence information to make program exploration goal-directed toward a given set of targets.

(Click here to access the paper: PDF.)