Software-Architecture Recovery from Machine Code

Venkatesh Karthik Srinivasan and Thomas Reps
University of Wisconsin

In this paper, we present Lego, a tool that recovers object-oriented software architecture from stripped binaries. Lego takes a stripped binary as input and uses information obtained from dynamic analysis to (i) group the functions in the binary into classes, and (ii) identify inheritance and composition relationships between the inferred classes. The information recovered by Lego can be used to reengineer legacy software and to understand the architecture of software systems that lack documentation and source code. Our experiments show that the class hierarchies recovered by Lego agree closely -- as measured by precision and recall -- with the hierarchies defined in the corresponding source code.
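To make the evaluation criterion concrete, the following is a minimal sketch of one way precision and recall could be computed for a recovered grouping of functions into classes. It is an assumption for illustration, not the metric defined in the paper: it compares unordered pairs of functions that share a class in the recovered grouping against pairs that share a class in the source code. The function names (f1 to f5), the class names, and the helpers same_class_pairs and precision_recall are all hypothetical.

```python
from itertools import combinations


def same_class_pairs(grouping):
    """Return the set of unordered function pairs placed in the same class.

    `grouping` maps a class name to the set of functions assigned to it.
    """
    pairs = set()
    for functions in grouping.values():
        # Sorting gives each pair a canonical order, so (a, b) == (a, b)
        # regardless of which grouping it came from.
        for a, b in combinations(sorted(functions), 2):
            pairs.add((a, b))
    return pairs


def precision_recall(recovered, ground_truth):
    """Pairwise precision and recall of a recovered grouping (assumed metric)."""
    rec = same_class_pairs(recovered)
    truth = same_class_pairs(ground_truth)
    common = rec & truth
    # Convention chosen here: an empty pair set counts as perfect agreement.
    precision = len(common) / len(rec) if rec else 1.0
    recall = len(common) / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical data: class membership from source code vs. a recovered grouping.
ground_truth = {"Shape": {"f1", "f2"}, "Circle": {"f3", "f4", "f5"}}
recovered = {"K0": {"f1", "f2", "f3"}, "K1": {"f4", "f5"}}

precision, recall = precision_recall(recovered, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy input, the recovered grouping wrongly places f3 with f1 and f2, which lowers precision, and separates f3 from f4 and f5, which lowers recall. A pairwise comparison is used here only because it does not require matching recovered class names to source-level names; it says nothing about inheritance or composition relationships, which would need a separate comparison.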
