Refinement of Path Expressions for Static Analysis
John Cyphert, Jason Breck, Zachary Kincaid, and Thomas Reps
Algebraic program analyses compute information about a program's
behavior by first (a) computing a valid path
expression—i.e., a regular expression that recognizes all feasible
execution paths (and usually more)—and then (b) interpreting
the path expression in a semantic algebra that defines the analysis.
There are an infinite number of different regular expressions that
qualify as valid path expressions, which raises the question
``Which one should we choose?''
While any choice yields a sound result, for many analyses the choice
can have a drastic effect on the precision of the results obtained.
This paper investigates the following two questions:
(Click here to access the paper:
PDF.)
We show that it is not satisfactory to compare two path expressions
E1 and E2 solely by means of the
languages that they generate.
Counter to one's intuition, it is possible for L(E2) ⊊ L(E1),
yet for E2 to produce a less-precise analysis result
than E1—and thus we would not want to perform the transformation
E1 → E2.
However, the exclusion of paths so as to analyze a smaller language of
paths is exactly the refinement criterion used by some prior methods.
In this paper, we develop an algorithm that takes as input a valid
path expression E, and returns a valid path expression E' that
is guaranteed to yield analysis results that are at least as good as
those obtained using E.
While the algorithm sometimes returns E itself, it typically does not:
(i) we prove a no-degradation result for the algorithm's base case—for
transforming a leaf loop (i.e., a most-deeply-nested loop);
(ii) at a non-leaf loop L, the algorithm treats each loop L' in
the body of L as an indivisible atom, and applies the leaf-loop algorithm to L;
the no-degradation result carries over to (ii), as well.
Our experiments show that the technique has a substantial impact:
the loop-refinement algorithm allows the implementation of
Compositional Recurrence Analysis to prove over 25% more assertions
for a collection of challenging loop micro-benchmarks.