Refinement of Path Expressions for Static Analysis

John Cyphert, Jason Breck, Zachary Kincaid, and Thomas Reps

Algebraic program analyses compute information about a program's behavior by first (a) computing a valid path expression—i.e., a regular expression that recognizes all feasible execution paths (and usually more)—and then (b) interpreting the path expression in a semantic algebra that defines the analysis. There are an infinite number of different regular expressions that qualify as valid path expressions, which raises the question ``Which one should we choose?'' While any choice yields a sound result, for many analyses the choice can have a drastic effect on the precision of the results obtained. This paper investigates the following two questions:

What does it mean for one valid path expression to be ``better'' than another?

Can we compute a valid path expression that is ``better,'' and if so, how?

We show that it is not satisfactory to compare two path expressions E₁ and E₂ solely by means of the languages that they generate. Counter to one's intuition, it is possible for L(E₂) ⊊ L(E₁), yet for E₂ to produce a less-precise analysis result than E₁—and thus we would not want to perform the transformation E₁ → E₂. However, the exclusion of paths so as to analyze a smaller language of paths is exactly the refinement criterion used by some prior methods. In this paper, we develop an algorithm that takes as input a valid path expression E, and returns a valid path expression E' that is guaranteed to yield analysis results that are at least as good as those obtained using E. While the algorithm sometimes returns E itself, it typically does not: (i) we prove a no-degradation result for the algorithm's base case—for transforming a leaf loop (i.e., a most-deeply-nested loop); (ii) at a non-leaf loop L, the algorithm treats each loop L' in the body of L as an indivisible atom, and applies the leaf-loop algorithm to L; the no-degradation result carries over to (ii), as well. Our experiments show that the technique has a substantial impact: the loop-refinement algorithm allows the implementation of Compositional Recurrence Analysis to prove over 25% more assertions for a collection of challenging loop micro-benchmarks.

(Click here to access the paper: PDF.)