WYSINWYX: What You See Is Not What You eXecute
Gogul Balakrishnan
There is an increasing need for tools to help programmers and security
analysts understand executables. For instance, commercial companies
and the military increasingly use Commercial Off-The-Shelf (COTS)
components to reduce the cost of software development. They are
interested in ensuring that COTS components do not perform malicious
actions (and cannot be forced to perform malicious actions). Viruses and
worms have become ubiquitous. A tool that aids in understanding their
behavior can ensure early dissemination of signatures, and thereby
control the extent of damage caused by them. In both domains, the
questions that need to be answered cannot be answered perfectly -- the
problems are undecidable -- but static analysis provides a way to
answer them conservatively.
In recent years, there has been a considerable amount of research
activity to develop analysis tools to find bugs and security
vulnerabilities. However, most of the effort has focused on analyzing
source code, and the issue of analyzing executables has largely been
ignored. In the security context, this is particularly unfortunate,
because performing analysis on the source code can fail to detect
certain vulnerabilities due to the WYSINWYX phenomenon: "What You
See Is Not What You eXecute". That is, there can be a mismatch
between what a programmer intends and what is actually executed on the
processor.
Even though the advantages of analyzing executables are appreciated
and well-understood, there is a dearth of tools that work on
executables directly. The overall goal of our work is to develop
algorithms for analyzing executables, and to explore their
applications in the context of program understanding and automated bug
hunting. Unlike existing tools, we want to provide useful information
about memory accesses, even in the absence of debugging
information. Specifically, the dissertation focuses on the following
aspects of the problem:
Because executables do not have a notion of variables similar to the
variables in programs for which source code is available, one of the
important aspects of recovering an intermediate representation (IR) is
to determine a collection of variable-like entities for the
executable. The quality of the
recovered variables affects the precision of an analysis that gathers
information about memory accesses in an executable, and therefore, it
is desirable to recover a set of variables that closely approximate
the variables of the original source-code program. On average, our
technique correctly identifies over 88% of the local variables and
over 89% of the fields of heap-allocated objects. In
contrast, previous techniques, such as the one used in the IDAPro
disassembler, recovered 83% of the local variables, but 0% of the
fields of heap-allocated objects.
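
To make "variable-like entities" concrete, the sketch below splits a
stack frame into regions at the frame-pointer offsets that appear
directly in instructions (for example, an operand such as [ebp-12]).
It is only a simplified, purely syntactic heuristic written for
illustration, not the technique evaluated in the dissertation; the
function name, its parameters, and the input format are all
hypothetical.

```python
def recover_stack_variables(offsets, frame_size):
    """Split the frame [-frame_size, 0) into variable-like regions.

    offsets: frame-pointer-relative offsets (negative) that appear
             literally in instructions; this input format is assumed.
    Returns (start_offset, size) pairs, lowest address first.
    """
    # Keep each distinct in-frame offset once, in address order.
    starts = sorted({off for off in offsets if -frame_size <= off < 0})
    regions = []
    for i, start in enumerate(starts):
        # A region runs up to the next known offset, or to the frame top.
        end = starts[i + 1] if i + 1 < len(starts) else 0
        regions.append((start, end - start))
    return regions


print(recover_stack_variables([-4, -12, -12, -44], frame_size=44))
# prints [(-44, 32), (-12, 8), (-4, 4)]
```

An access whose address is computed at run time, such as a field
reached through a pointer into the heap, never appears in offsets, so
a heuristic of this kind has nothing to say about the fields of
heap-allocated objects.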
Recovering useful information about heap-allocated storage is another
challenging aspect of IR recovery. We propose an abstraction of
heap-allocated storage called recency-abstraction, which lies between
the extremes of one summary node per malloc site and complex shape
abstractions. We used the recency-abstraction to
resolve virtual-function calls in executables obtained by compiling
C++ programs. The recency-abstraction enabled our tool to discover the
address of the virtual-function table to which the virtual-function
field of a C++ object is initialized in a substantial number of
cases. Using this information, we were able to resolve, on average,
60% of the virtual-function call sites in executables that were
obtained by compiling C++ programs.
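
The toy model below illustrates one way to occupy that middle ground,
assumed here for illustration: each allocation site gets two abstract
blocks, one for the most recently allocated object and one summarizing
all older objects from that site. Because the first block stands for a
single concrete object, a write to it can overwrite earlier values,
which is what lets the final value of a virtual-function-table field
survive. The class, method, and site names are hypothetical, and the
sets of strings stand in for a real abstract domain.

```python
class RecencyHeap:
    """Toy abstract heap with two abstract blocks per allocation site."""

    def __init__(self):
        self.most_recent = {}  # site -> {field: set of possible values}
        self.older = {}        # site -> {field: set of possible values}

    def allocate(self, site):
        # The previous most-recent block is no longer the newest one:
        # fold its contents into the summary block for this site.
        previous = self.most_recent.get(site, {})
        summary = self.older.setdefault(site, {})
        for field, values in previous.items():
            summary.setdefault(field, set()).update(values)
        # The fresh block starts with no known field contents.
        self.most_recent[site] = {}

    def write_most_recent(self, site, field, value):
        # One concrete object, so the old value is replaced (strong update).
        self.most_recent[site][field] = {value}

    def write_older(self, site, field, value):
        # Many concrete objects, so values only accumulate (weak update).
        self.older.setdefault(site, {}).setdefault(field, set()).add(value)


heap = RecencyHeap()
for _ in range(2):
    heap.allocate("site_1")
    heap.write_most_recent("site_1", "vptr", "placeholder")
    heap.write_most_recent("site_1", "vptr", "vtable_A")  # final value

print(heap.most_recent["site_1"]["vptr"])  # prints {'vtable_A'}
print(heap.older["site_1"]["vptr"])        # prints {'vtable_A'}
```

With a single summary node per site, every write would have to be a
weak update, and the placeholder value would remain among the
possible targets of the field.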
To assess the usefulness of the recovered IR in the context of bug
hunting, we used CodeSurfer/x86 to analyze device-driver executables
without the benefit of either source code or symbol-table/debugging
information. We were able to find known bugs (that had been discovered
by source-code analysis tools), along with useful error traces, while
maintaining a low false-positive rate.
University of Wisconsin
The algorithms described in this dissertation are incorporated in a
tool we built for analyzing Intel x86 executables, called
CodeSurfer/x86.