Synthesis of Machine Code: Algorithms and Applications
Venkatesh Srinivasan
The analysis of binaries has gotten an increasing amount of attention
from the academic community in the last decade. The results of binary
analysis have been predominantly used to answer questions about the
properties of binaries. Another potential use of analysis results is
to rewrite the binary via semantic transformations. Semantics-based
binary-rewriting tools become particularly important when one wishes
to modify the functionality of the binary, and the source code and/or
compiler toolchain for the binary is unavailable. Semantics-based
binary rewriting can be done for the purposes of binary optimization
(offline optimization, superoptimization), software reuse (partial
evaluation, slicing, binary translation), or software security (binary
obfuscation, de-obfuscation).
This dissertation proposes various algorithms for synthesizing a
machine-code implementation from a semantic specification of the
desired behavior, and describes how one can use a machine-code
synthesizer to build different semantics-based binary-rewriting tools.
The first part of this dissertation introduces the problem of
machine-code synthesis, proposes a framework for semantics-based
binary rewriting via machine-code synthesis, and presents several
algorithms for machine-code synthesis. A key challenge in synthesizing
machine code is the enormous size of the synthesis
search-space. Instruction sets like Intel's IA-32 have around 43,000
unique instruction schemas; this huge instruction pool, along with the
exponential cost inherent in enumerative synthesis, results in an
enormous search space for the synthesizer: even for relatively small
specifications, a naive synthesizer might take several days to find an
implementation. We have developed a suite of techniques---ranging from
pruning heuristics to machine learning---that assist an enumerative
synthesizer in combating the enormous search space, and speeding up
synthesis from the order of days to the order of seconds.
The second part of this dissertation describes how one can instantiate
the aforementioned synthesizer-equipped framework to create different
semantics-based binary rewriters. In particular, we have instantiated
the framework to create two tools that rewrite binaries for purposes
of software reuse: a novel machine-code partial evaluator, and an
improved machine-code slicer. Among other potential uses, these tools
can be used to either modify or extract a component from a binary that
lacks source code.
Keywords:
Machine-code synthesis, IA-32 instruction set, machine learning, partial evaluation, slicing
(Click here to access the dissertation:
PDF.)
University of Wisconsin