Fast Graph Simplification for Interleaved Dyck-Reachability
Yuanbo Li, Qirun Zhang, and Thomas Reps
Many program-analysis problems can be formulated as graph-reachability problems.
Interleaved Dyck language reachability
(InterDyck-reachability) is a fundamental framework to express a wide variety
of program-analysis problems over edge-labeled graphs.
The InterDyck language represents an
intersection of multiple matched-parenthesis
languages (i.e., Dyck languages).
In practice, program analyses typically leverage one Dyck language to achieve
context-sensitivity, and other Dyck languages to model data dependences, such as
field-sensitivity and pointer references/dereferences.
In the ideal case, an InterDyck-reachability framework should model multiple
Dyck languages simultaneously.
Unfortunately, precise InterDyck-reachability is undecidable.
Any practical solution must over-approximate the exact answer.
In the literature, a lot of work has been proposed to over-approximate
the InterDyck-reachability formulation.
This paper offers a new perspective on improving both the precision
and the scalability of InterDyck-reachability:
we aim to simplify the underlying input graph G.
Our key insight is based on the observation that if an edge is not
contributing to any InterDyck-path, we can safely eliminate it
from G.
Our technique is orthogonal to the InterDyck-reachability formulation,
and can serve as a pre-processing step with any over-approximating
approaches for InterDyck-reachability.
We have applied our graph simplification algorithm to pre-processing
the graphs from a recent InterDyck-reachability-based taint analysis
for Android.
Our evaluation on three popular InterDyck-reachability algorithms
yields promising results.
In particular, our graph-simplification method improves both the
scalability and precision of all three InterDyck-reachability
algorithms, sometimes dramatically.
(Click here to access the paper:
PDF;
University of Wisconsin