Numeric Program Analysis Techniques with Applications to Array Analysis
and Library Summarization
Denis Gopan
Numeric program analysis is of great importance for the areas of
software engineering, software verification, and security: to identify
many program errors, such as out-of-bounds array accesses and integer
overflows, which constitute the lion's share of security
vulnerabilities reported by CERT, an analyzer needs to establish
numeric properties of program variables. Many important program
analyses, such as low-level code analysis, memory-cleanness analysis,
and shape analysis, rely in some ways on numeric-program-analysis
techniques. However, existing numeric abstractions are complex
(numeric abstract domains are typically non-distributive, and form
infinite-height lattices); thus, obtaining precise numeric-analysis
results is by no means a trivial undertaking.
In this thesis, we develop a suite of techniques with the common goal
of improving the precision and applicability of numeric program
analysis. The techniques address various aspects of numeric analysis,
such as handling dynamically-allocated memory, dealing with programs
that manipulate arrays, improving the precision of extrapolation
(widening), and performing interprocedural analysis. The techniques
use existing numeric abstractions as building blocks. The
communication with existing abstractions is done strictly through a
generic abstract-domain interface. The abstractions constructed by our
techniques also expose that same interface, and thus, are compatible
with existing analysis engines. As a result, our techniques are
independent from specific abstractions and specific analysis engines,
can be easily incorporated into existing program-analysis tools, and
should be readily compatible with new abstractions to be introduced in
the future.
A practical application of numeric analysis that we consider in this
thesis is the automatic generation of summaries for library functions
from their low-level implementation (that is, from a library's
binary). The source code for library functions is typically not
available. This poses a stumbling block for many source-level program
analyses. Automatic generation of summary functions will both speed up
and improve the accuracy of library-modeling, a process that is
currently carried out by hand. This thesis addresses the automatic
generation of summaries for memory-safety analysis.
(Click here to access the thesis:
PDF.)
University of Wisconsin