Recovery of Class Hierarchies and Composition Relationships from Machine Code

Venkatesh Srinivasan and Thomas Reps
University of Wisconsin

We present a reverse-engineering tool, called Lego, which recovers class hierarchies and composition relationships from stripped binaries. Lego takes a stripped binary as input, and uses information obtained from dynamic analysis to (i) group the functions in the binary into classes, and (ii) identify inheritance and composition relationships between the inferred classes. The software artifacts recovered by Lego can subsequently be used to understand the object-oriented design of software systems that lack documentation and source code, e.g., to enable interoperability. Our experiments show that the class hierarchies recovered by Lego have a high degree of agreement (measured in terms of precision and recall) with the hierarchy defined in the source code.
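To make the evaluation criterion concrete, the following is a minimal sketch of how precision and recall could be computed between a recovered hierarchy and a reference hierarchy. It is an illustration only, not the metric defined in the paper: it assumes that each hierarchy is given as a child-to-parent map, that recovered classes have already been matched to source-level class names, and that agreement is counted over (ancestor, descendant) pairs. The class names and the functions `ancestor_pairs` and `precision_recall` are hypothetical.

```python
def ancestor_pairs(parent_of):
    """Return every (ancestor, descendant) pair implied by a child -> parent map."""
    pairs = set()
    for cls in parent_of:
        seen = set()  # guards against accidental cycles in the input
        ancestor = parent_of.get(cls)
        while ancestor is not None and ancestor not in seen:
            pairs.add((ancestor, cls))
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
    return pairs


def precision_recall(recovered, reference):
    """Compare two hierarchies over their (ancestor, descendant) pairs."""
    rec = ancestor_pairs(recovered)
    ref = ancestor_pairs(reference)
    common = rec & ref
    # Convention assumed here: an empty set of pairs scores 1.0.
    precision = len(common) / len(rec) if rec else 1.0
    recall = len(common) / len(ref) if ref else 1.0
    return precision, recall


# Hypothetical example: the hierarchy as written in the source code ...
reference = {
    "Shape": None,
    "Circle": "Shape",
    "Square": "Shape",
    "RoundedSquare": "Square",
}
# ... and a recovered hierarchy that misplaces one class.
recovered = {
    "Shape": None,
    "Circle": "Shape",
    "Square": "Shape",
    "RoundedSquare": "Circle",
}

precision, recall = precision_recall(recovered, reference)
print(precision, recall)
```

In this example, three of the four recovered ancestor-descendant pairs also appear in the reference hierarchy, and three of the four reference pairs are recovered, so a single misplaced class lowers both measures.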

(Paper: PDF; © Springer-Verlag.)