Transformer Specification Language:
Junghee Lim
As computers have become a pivotal component of daily lives, computer
safety, reliability, and security issues have become enormously
important. A considerable amount of recent research in program
analysis and software engineering has been carried out on techniques
and tools for finding software bugs and security vulnerabilities, and
on checking computer-safety properties. Most of this research has
focused on analyzing source code. Recently, machine-code analysis has
begun to receive great attention both because source code is often
unavailable and because there can be mismatches in various ways
between source code and the machine code generated from the source
code.
The tools and techniques for analyzing machine code are, in principle,
language-independent. However, their implementations are often tied to
one specific instruction set. Retargeting them to another instruction
set can be an expensive and error-prone process. This dissertation
describes a system that I developed, called TSL (for ``Transformer
Specification Language'') that provides a systematic solution to the
problem of creating retargetable tools for analyzing machine-code. The
TSL system is a meta-tool, or tool generator, that automatically
creates different abstract interpreters for machine-code instruction
sets. The system addresses the problem of supporting multiple
instruction sets by providing a YACC-like mechanism for creating key
components of machine-code analyzers. The TSL system takes a single,
unified description of the concrete operational semantics of an
instruction set, which is specified in TSL, a strongly typed,
first-order functional language, and automatically creates
implementations of different abstract interpreters for the given
instruction set.
TSL provides a fixed set of base-types and operators, as well as
map-types with map-access and (applicative) map-update operations. The
TSL compiler generates a common intermediate representation that
allows the meanings of the input-language constructs to be redefined
by supplying alternative interpretations of the base-types, map-types,
and the operations on them (``semantic reinterpretation''). Because all
the abstract operations are defined at the meta-level, a semantic
reinterpretation is independent of any given instruction set defined
in TSL. Therefore, each implementation of an analysis component's
driver serves as the unchanging driver for use in different
instantiations of the analysis component for different instruction
sets. The TSL language becomes the specification language for
retargeting that analysis component to different instruction sets.
As an application of the TSL system, we developed a novel way of
applying semantic reinterpretation to automatically create
symbolic-analysis primitives for symbolic evaluation, weakestliberal
precondition, and symbolic composition. Furthermore, using the TSL
system, as well as the TSL-generated symbolic-analysis primitives, we
developed a machine-code verification tool, called MCVETO, and a
concolic-execution-based program-exploration tool, called BCE.
(Click here to access the paper:
PDF.)
A System for Generating Analyzers and its Applications
University of Wisconsin