Safety Checking of Machine Code

Zhichen Xu, Barton Miller, Thomas Reps
University of Wisconsin

Importing and executing untrusted foreign code has become an everyday occurrence: Web servers download plug-ins and applets; databases load type-specific extensions; and operating systems load customized policies and performance-measurement code. There are even proposals for loading application-specific policies into Internet routers. Certification of the safety of untrusted code is crucial in these domains.

In this paper, we show how to determine statically whether it is safe for untrusted machine code to be loaded into a trusted host system. In contrast to work that enforces a fixed safety policy for a specific source language (e.g., safe languages, certifying compilers, and typed-assembly languages), our safety-checking technique (i) operates directly on binary code, independent of the compiler or language, and does not require that source code be available; (ii) provides the ability to extend the host at a very fine-grained level, in that we allow the foreign code to manipulate the internal data structures of the host directly; (iii) enforces not only a default collection of safety conditions to prevent array out-of-bounds violations, address-alignment violations, uses of uninitialized variables, null-pointer dereferences, and stack-manipulation violations, but also provides the ability for the safety criterion to be extended according to an access policy specified by the host. The essence of our approach is to recover source-level type information (more precisely, typestate information) based on a small amount of annotation about the typestates of the initial inputs to the untrusted code, and then to apply some techniques that were originally developed for program verification to determine whether the untrusted code is safe.

We have implemented a prototype safety checker for SPARC machine-language programs. We applied the safety checker to several simple examples (ranging from code that contains just a few branches, to code that contains nested loops). The safety checker was able to mechanically synthesize the loop invariants in all cases, and was able to prove that these examples met the necessary safety conditions, in times ranging from less than a second to about 6 seconds on an UltraSPARC machine.

(Click here to access the paper.)