A Next-Generation Platform for Analyzing Executables
T. Reps, G. Balakrishnan, J. Lim, and T. Teitelbaum
In recent years, there has been a growing need for tools that an
analyst can use to understand the workings of COTS components,
plugins, mobile code, and DLLs, as well as memory snapshots of worms
and virus-infected code. Static analysis provides techniques that can
help with such problems; however, there are several obstacles that
must be overcome:
- For many kinds of potentially malicious programs, symbol-table and
  debugging information is entirely absent. Even if it is present,
  it cannot be relied upon.
- To understand memory-access operations, it is necessary to determine
  the set of addresses accessed by each operation. This is difficult
  because:
  - While some memory operations use explicit memory addresses
    in the instruction (easy), others use indirect addressing
    via address expressions (difficult).
  - Arithmetic on addresses is pervasive. For instance, even when
    the value of a local variable is loaded from its slot in an
    activation record, address arithmetic is performed.
  - There is no notion of type at the hardware level, so address
    values cannot be distinguished from integer values.
  - Memory accesses do not have to be aligned, so word-sized
    address values could potentially be cobbled together from
    misaligned reads and writes.

We have developed static-analysis algorithms to recover
information about the contents of memory locations and how they are
manipulated by an executable. By combining these analyses with
facilities provided by the IDAPro and CodeSurfer toolkits, we have
created CodeSurfer/x86, a prototype tool for browsing, inspecting, and
analyzing x86 executables. From an x86 executable, CodeSurfer/x86
recovers intermediate representations that are similar to what would
be created by a compiler for a program written in a high-level
language. CodeSurfer/x86 also supports a scripting language, as well
as several kinds of sophisticated pattern-matching capabilities.
These facilities provide a platform for the development of additional
tools for analyzing the security properties of executables.
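
As a concrete illustration of the memory-access obstacles listed above,
the following minimal Python sketch models a flat, byte-addressable
memory. It is not part of CodeSurfer/x86: the memory size, the helper
names (store16, store32, load32), and all addresses are assumptions
chosen purely for illustration. It shows a local-variable load that
requires address arithmetic, a word that serves equally well as an
integer and as an address, and an address assembled from two misaligned
16-bit writes.

```python
import struct

# Hypothetical flat, byte-addressable memory. Size and addresses are
# illustrative assumptions, not values from the paper.
memory = bytearray(64)

def store16(addr: int, value: int) -> None:
    """Write a 16-bit little-endian value at any byte address."""
    struct.pack_into("<H", memory, addr, value)

def store32(addr: int, value: int) -> None:
    """Write a 32-bit little-endian word at any byte address."""
    struct.pack_into("<I", memory, addr, value)

def load32(addr: int) -> int:
    """Read a 32-bit little-endian word from any byte address."""
    return struct.unpack_from("<I", memory, addr)[0]

# 1. Address arithmetic: reading a local variable means computing
#    frame base + offset, in the spirit of x86's `mov eax, [ebp-8]`.
frame_base = 32      # stands in for the frame pointer
local_offset = -8    # slot of the local within the activation record
store32(frame_base + local_offset, 40)
local_value = load32(frame_base + local_offset)

# 2. No types at the hardware level: the same word can be used as an
#    integer operand or as an address to dereference.
as_integer = local_value + 2             # plain arithmetic on the word
store32(40, 0xDEADBEEF)
as_pointer_target = load32(local_value)  # the word used as an address

# 3. Misaligned accesses: two 16-bit writes at odd addresses together
#    form a 32-bit word that happens to be a valid address.
store16(13, 0x0028)              # low half (0x28 == 40)
store16(15, 0x0000)              # high half
assembled_address = load32(13)   # unaligned 32-bit read
```

Nothing in the stored bytes records which of these uses was intended,
which is why an analysis must reason about the set of values each
address expression can take rather than rely on declared types or
alignment.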