8 Library of CIL Modules
We are developing a suite of modules that use CIL for program analyses and
transformations that we have found useful. You can use these modules directly
on your code, or as a starting point for writing similar modules. A
particularly large and complex application written on top of CIL is CCured
(../ccured/index.html).
8.1 Data flow analysis framework
The module dataflow.ml contains a parameterized framework for forward and
backward data flow analyses. You provide the transfer functions and this
module does the analysis.
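The division of labour described above (the client supplies a transfer function, the framework iterates to a fixed point) can be pictured with a minimal C sketch. This is not the interface of dataflow.ml: solve_forward, transfer_fn, cfg_t and gen_own_bit are hypothetical names, facts are plain bit sets joined by union, and a simple round-robin iteration stands in for whatever scheduling the real module uses.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* A fact is a bit set; the join at a merge point is set union. */
typedef unsigned int fact_t;

/* Client-supplied transfer function: fact after a node, given the fact before it. */
typedef fact_t (*transfer_fn)(int node, fact_t in);

/* pred[n][m] is true when the CFG has an edge m -> n. */
typedef struct {
    int n_nodes;
    bool pred[MAX_NODES][MAX_NODES];
} cfg_t;

/* Iterate to a fixed point; out[n] receives the fact that holds after node n. */
void solve_forward(const cfg_t *g, transfer_fn transfer, fact_t out[MAX_NODES]) {
    for (int n = 0; n < g->n_nodes; n++)
        out[n] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int n = 0; n < g->n_nodes; n++) {
            fact_t in = 0;
            for (int m = 0; m < g->n_nodes; m++)
                if (g->pred[n][m]) in |= out[m];
            fact_t next = transfer(n, in);
            if (next != out[n]) {
                out[n] = next;
                changed = true;
            }
        }
    }
}

/* Example client: node n generates fact bit n and kills nothing. */
fact_t gen_own_bit(int node, fact_t in) {
    return in | (1u << node);
}

/* Diamond CFG 0 -> {1, 2} -> 3; returns the fact that holds after node 3. */
unsigned int demo_forward(void) {
    cfg_t g = { .n_nodes = 4 };
    g.pred[1][0] = g.pred[2][0] = true;
    g.pred[3][1] = g.pred[3][2] = true;
    fact_t out[MAX_NODES];
    solve_forward(&g, gen_own_bit, out);
    return out[3];
}
```

A backward analysis would follow the same pattern with successors in place of predecessors.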
8.2 Dominators
The module dominators.ml contains the computation of immediate
dominators. It uses the dataflow.ml module.
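As a rough illustration of how immediate dominators can fall out of a data flow computation, the following C sketch solves the classic equation Dom(n) = {n} ∪ ⋂ Dom(p) over the predecessors p of n on a small fixed graph. It is not the code of dominators.ml; the graph, compute_dominators and immediate_dominator are made up for this example.

```c
#include <stdbool.h>

#define N_NODES 5

/* Edges 0->1, 1->2, 1->3, 2->4, 3->4; pred[n][m] is true for an edge m -> n.
   Node 0 is the entry. */
static const bool pred[N_NODES][N_NODES] = {
    [1] = { [0] = true },
    [2] = { [1] = true },
    [3] = { [1] = true },
    [4] = { [2] = true, [3] = true },
};

/* dom[n] is a bit set holding every node that dominates n (including n). */
void compute_dominators(unsigned dom[N_NODES]) {
    const unsigned all = (1u << N_NODES) - 1u;
    dom[0] = 1u; /* the entry is dominated only by itself */
    for (int n = 1; n < N_NODES; n++)
        dom[n] = all;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int n = 1; n < N_NODES; n++) {
            unsigned meet = all;
            for (int m = 0; m < N_NODES; m++)
                if (pred[n][m]) meet &= dom[m];
            unsigned next = meet | (1u << n);
            if (next != dom[n]) {
                dom[n] = next;
                changed = true;
            }
        }
    }
}

/* The immediate dominator d of n is the strict dominator whose own dominator
   set equals the set of strict dominators of n. Returns -1 for the entry. */
int immediate_dominator(int n) {
    unsigned dom[N_NODES];
    compute_dominators(dom);
    unsigned strict = dom[n] & ~(1u << n);
    for (int d = 0; d < N_NODES; d++)
        if (((strict >> d) & 1u) && dom[d] == strict)
            return d;
    return -1;
}
```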
8.3 Points-to Analysis
The module ptranal.ml contains two interprocedural points-to
analyses for CIL: Olf and Golf. Olf is the default.
(Switching from olf.ml to golf.ml requires a change in
Ptranal and recompiling cilly.)
The analyses have the following characteristics:
- Not based on C types (inferred pointer relationships are sound
despite most kinds of C casts)
- One level of subtyping
- One level of context sensitivity (Golf only)
- Monomorphic type structures
- Field insensitive (fields of structs are conflated)
- Demand-driven (points-to queries are solved on demand)
- Handle function pointers
The analysis itself is factored into two components: Ptranal,
which walks over the CIL file and generates constraints, and Olf
or Golf, which solve the constraints. The analysis is invoked
with the function Ptranal.analyze_file: Cil.file ->
unit. This function builds the points-to graph for the CIL file
and stores it internally. There is currently no facility for clearing
internal state, so Ptranal.analyze_file should only be called
once.
The constructed points-to graph supports several kinds of queries,
including alias queries (may two expressions be aliased?) and
points-to queries (to what set of locations may an expression point?).
The main interface with the alias analysis is as follows:
- Ptranal.may_alias: Cil.exp -> Cil.exp -> bool. If
true, the two expressions may have the same value.
- Ptranal.resolve_lval: Cil.lval -> (Cil.varinfo
list). Returns the list of variables to which the given
left-hand value may point.
- Ptranal.resolve_exp: Cil.exp -> (Cil.varinfo list).
Returns the list of variables to which the given expression may
point.
- Ptranal.resolve_funptr: Cil.exp -> (Cil.fundec
list). Returns the list of functions to which the given
expression may point.
The precision of the analysis can be customized by changing the values
of several flags:
- Ptranal.no_sub: bool ref.
If true, subtyping is disabled. Associated commandline option:
--ptr_unify.
- Ptranal.analyze_mono: bool ref.
(Golf only) If true, context sensitivity is disabled and the
analysis is effectively monomorphic. Commandline option:
--ptr_mono.
- Ptranal.smart_aliases: bool ref.
(Golf only) If true, “smart” disambiguation of aliases is
enabled. Otherwise, aliases are computed by intersecting points-to
sets. This is an experimental feature.
- Ptranal.model_strings: bool ref.
Make the alias analysis model string constants by treating them as
pointers to chars. Commandline option: --ptr_model_strings
- Ptranal.conservative_undefineds: bool ref.
Make the most pessimistic assumptions about globals if an undefined
function is present. Such a function can write to every global
variable. Commandline option: --ptr_conservative
In practice, the best precision/efficiency tradeoff is achieved by
setting Ptranal.no_sub to false, Ptranal.analyze_mono to
true, and Ptranal.smart_aliases to false. These are the
default values of the flags.
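With Ptranal.smart_aliases set to false, aliases are computed by intersecting points-to sets. A minimal C sketch of that idea follows. It assumes points-to sets are represented as bit sets over a hypothetical numbering of abstract locations; pts_set, LOC_X and may_alias_by_intersection are illustrative names, not part of Ptranal.

```c
#include <stdbool.h>

/* One bit per abstract location (hypothetical numbering for this sketch). */
enum { LOC_X = 1 << 0, LOC_Y = 1 << 1, LOC_Z = 1 << 2 };

typedef unsigned int pts_set;

/* Two pointers may alias when their points-to sets share a location. */
bool may_alias_by_intersection(pts_set a, pts_set b) {
    return (a & b) != 0u;
}
```

For example, a pointer that may point to x or y and a pointer that may point to y or z are reported as possible aliases, because both sets contain y.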
There are also a few flags that can be used to inspect or serialize
the results of the analysis.
- Ptranal.debug_may_aliases.
Print the may-alias relationship of each pair of expressions in the
program. Commandline option: --ptr_may_aliases.
- Ptranal.print_constraints: bool ref.
If true, the analysis will print each constraint as it is
generated.
- Ptranal.print_types: bool ref.
If true, the analysis will print the inferred type of each
variable in the program.
If Ptranal.analyze_mono and Ptranal.no_sub are both
true, this output is sufficient to reconstruct the points-to
graph. One nice feature is that there is a pretty printer for
recursive types, so the print routine does not loop.
- Ptranal.compute_results: bool ref.
If true, the analysis will print out the points-to set of each
variable in the program. This will essentially serialize the
points-to graph.
8.4 StackGuard
The module heapify.ml contains a transformation similar to the one
described in “StackGuard: Automatic Adaptive Detection and Prevention of
Buffer-Overflow Attacks”, Proceedings of the 7th USENIX Security
Conference. In essence it modifies the program to maintain a separate
stack for return addresses. Even if a buffer overrun attack occurs, the
correct return address will be taken from the special stack.
Although it does work, this CIL module is provided mainly as an example of
how to perform a simple source-to-source program analysis and
transformation. As an optimization, only functions that contain a dangerous
local array make use of the special return address stack.
For a concrete example, you can see how cilly --dostackGuard
transforms the following dangerous code:
#include <stdio.h>

int dangerous() {
    char array[10];
    scanf("%s", array); // possible buffer overrun!
    return 0;
}

int main () {
    return dangerous();
}
See the CIL output for this
code fragment
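The idea of a separate stack for return addresses can be pictured with a small hand-written C sketch. It is not the output of cilly --dostackGuard: shadow_push, shadow_pop and dangerous_guarded are hypothetical names, the sketch relies on the GCC/Clang builtin __builtin_return_address, and it only detects a changed return address (and aborts) instead of restoring it from the special stack. To keep the sketch safe to run, the input comes from a string and the read is bounded with %9s.

```c
#include <stdio.h>
#include <stdlib.h>

#define SHADOW_DEPTH 64

/* The separate stack that holds one saved return address per active call. */
static void *shadow_stack[SHADOW_DEPTH];
static int shadow_top = 0;

static void shadow_push(void *ret) {
    if (shadow_top >= SHADOW_DEPTH) abort();
    shadow_stack[shadow_top++] = ret;
}

static void *shadow_pop(void) {
    if (shadow_top <= 0) abort();
    return shadow_stack[--shadow_top];
}

/* Returns the number of characters consumed from input. */
int dangerous_guarded(const char *input) {
    shadow_push(__builtin_return_address(0)); /* save on entry */
    char array[10];
    int n = 0;
    sscanf(input, "%9s%n", array, &n);
    /* On exit, compare the saved address with the one now on the real stack. */
    if (shadow_pop() != __builtin_return_address(0)) abort();
    return n;
}

/* Number of return addresses currently held on the shadow stack. */
int shadow_depth(void) {
    return shadow_top;
}
```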
8.5 Heapify
The module heapify.ml also contains a transformation that moves all
dangerous local arrays to the heap. This also prevents a number of buffer
overruns.
For a concrete example, you can see how cilly --doheapify
transforms the following dangerous code:
#include <stdio.h>

int dangerous() {
    char array[10];
    scanf("%s", array); // possible buffer overrun!
    return 0;
}

int main () {
    return dangerous();
}
See the CIL output for this
code fragment
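As a rough hand-written illustration of the idea (not actual CIL output), a function whose local array has been moved to the heap might look like the sketch below. The names dangerous_locals and dangerous_heapified are hypothetical, and the sketch reads from a string with a bounded %9s so that it is safe to run.

```c
#include <stdio.h>
#include <stdlib.h>

/* The formerly local array now lives in a heap-allocated record. */
struct dangerous_locals {
    char array[10];
};

/* Returns the number of characters consumed, or -1 if allocation fails. */
int dangerous_heapified(const char *input) {
    struct dangerous_locals *locals = malloc(sizeof *locals);
    if (locals == NULL) return -1;
    int n = 0;
    sscanf(input, "%9s%n", locals->array, &n);
    free(locals); /* released on every path out of the function */
    return n;
}
```

An overrun of the array would then corrupt heap memory rather than the saved return address on the stack.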
8.6 One Return
The module oneret.ml contains a transformation that ensures that all
function bodies have at most one return statement. This simplifies a number
of analyses by providing a canonical exit-point.
For a concrete example, you can see how cilly --dooneRet
transforms the following code:
int foo (int predicate) {
    if (predicate <= 0) {
        return 1;
    } else {
        if (predicate > 5)
            return 2;
        return 3;
    }
}
See the CIL output for this
code fragment
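As a hand-written sketch of the single-exit shape (not actual CIL output; retres and return_label are illustrative names), each return can be pictured as an assignment to a result variable followed by a jump to one shared exit point:

```c
int foo_one_return(int predicate) {
    int retres; /* holds the value that the single return will yield */
    if (predicate <= 0) {
        retres = 1;
        goto return_label;
    } else {
        if (predicate > 5) {
            retres = 2;
            goto return_label;
        }
        retres = 3;
        goto return_label;
    }
return_label:
    return retres; /* the only return statement in the function */
}
```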
8.7 Control-Flow Graphs
CIL can reduce high-level C control-flow constructs like switch and
continue to lower-level gotos. This completely eliminates some
possible classes of statements from the program and may make the result
easier to analyze (e.g., it simplifies data-flow analysis).
For a concrete example, you can see how cilly --domakeCFG
transforms the following code (note the fall-through in case 1):
int foo (int predicate) {
    int x = 0;
    switch (predicate) {
        case 0: return 111;
        case 1: x = x + 1;
        case 2: return (x+3);
        case 3: break;
        default: return 222;
    }
    return 333;
}
See the CIL output for this
code fragment
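As a hand-written sketch of the lowered form (not actual CIL output; the label names are illustrative), the switch can be pictured as a chain of tests that jump to labels, with the fall-through from case 1 preserved by simply letting control run on into the next label:

```c
int foo_lowered(int predicate) {
    int x = 0;
    /* Dispatch: one test and jump per case, then the default. */
    if (predicate == 0) goto case_0;
    if (predicate == 1) goto case_1;
    if (predicate == 2) goto case_2;
    if (predicate == 3) goto case_3;
    goto switch_default;
case_0:
    return 111;
case_1:
    x = x + 1; /* falls through into case_2, as in the original */
case_2:
    return x + 3;
case_3:
    goto switch_break; /* "break" becomes a jump past the switch */
switch_default:
    return 222;
switch_break:
    return 333;
}
```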
8.8 Partial Evaluation and Constant Folding
The partial.ml module provides a simple interprocedural partial
evaluation and constant folding data-flow analysis and transformation. This
transformation requires the --domakeCFG option.
For a concrete example, you can see how cilly --domakeCFG --dopartial
transforms the following code (note the eliminated if branch and the
partial optimization of foo):
int foo(int x, int y) {
    int unknown;
    if (unknown)
        return y+2;
    return x+3;
}
int main () {
    int a,b,c;
    a = foo(5,7) + foo(6,7);
    b = 4;
    c = b * b;
    if (b > c)
        return b-c;
    else
        return b+c;
}
See the CIL output for this
code fragment
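The effect on the tail of main can be worked out by hand. With b = 4 and c = 4 * 4 = 16, the test b > c is false, so only the else branch survives and b + c folds to 20. The sketch below shows just that fragment before and after. It is a hand-written illustration, not CIL output, tail_original and tail_folded are hypothetical names, and it does not attempt to show what is done to foo.

```c
/* The tail of main as written: b and c are known at analysis time. */
int tail_original(void) {
    int b = 4;
    int c = b * b;
    if (b > c)
        return b - c;
    else
        return b + c;
}

/* After propagating b = 4 and c = 16 and removing the dead branch. */
int tail_folded(void) {
    return 20;
}
```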
8.9 Reaching Definitions
The reachingdefs.ml module uses the dataflow framework and CFG
information to calculate the definitions that reach each
statement. After computing the CFG and calling computeRDs on a
function declaration, ReachingDef.stmtStartData will contain a mapping
from statement IDs to data about which definitions reach each
statement. In particular, it is a mapping from statement IDs to a
triple, the first two members of which are used internally. The third
member is a mapping from variable IDs to Sets of integer options. If
the set contains Some(i), then the definition of that variable with ID
i reaches that statement. If the set contains None, then there is a
path to that statement on which there is no definition of that variable.
Also, if the variable ID is unmapped at a statement, then no definition
of that variable reaches that statement.
To summarize, reachingdefs.ml has the following interface:
- computeRDs – Computes reaching definitions.
- ReachingDef.stmtStartData – contains reaching
definition data after computeRDs is called.
- ReachingDef.defIdStmtHash – Contains a mapping
from definition IDs to the ID of the statement in which
the definition occurs.
- getRDs – Takes a statement ID and returns
reaching definition data for that statement.
- instrRDs – Takes a list of instructions and the
definitions that reach the first instruction, and for
each instruction calculates the definitions that reach
either into or out of that instruction.
- rdVisitorClass – A subclass of nopCilVisitor that
can be extended such that the current reaching definition
data is available when expressions are visited.
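The meaning of Some(i) and None in the per-variable sets can be illustrated with a small C sketch of reaching definitions on a diamond-shaped CFG. This is a toy model, not the code of reachingdefs.ml: the encoding (bit 0 for None, bit i for Some(i)), the names compute_reaching and reaching_at, and the assumption that every variable starts as {None} at function entry are all choices made for this example.

```c
#include <stdbool.h>

#define N_STMTS 4
#define N_VARS 2
#define NONE_BIT 1u /* bit 0 encodes None; bit i encodes Some(i), i >= 1 */

enum { VAR_X = 0, VAR_Y = 1 };

/* Statement 1 is "x = ..." (definition ID 1), statement 2 is "y = ..."
   (definition ID 2); statements 0 and 3 define nothing. */
static const int def_var[N_STMTS] = { -1, VAR_X, VAR_Y, -1 };
static const int def_id[N_STMTS] = { 0, 1, 2, 0 };

/* Diamond CFG 0 -> {1, 2} -> 3; is_pred[s][p] is true for an edge p -> s. */
static const bool is_pred[N_STMTS][N_STMTS] = {
    [1] = { [0] = true },
    [2] = { [0] = true },
    [3] = { [1] = true, [2] = true },
};

/* start[s][v]: set of definitions of v that reach the start of statement s. */
void compute_reaching(unsigned start[N_STMTS][N_VARS]) {
    unsigned end[N_STMTS][N_VARS] = { { 0 } };
    for (int s = 0; s < N_STMTS; s++)
        for (int v = 0; v < N_VARS; v++)
            start[s][v] = 0;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int s = 0; s < N_STMTS; s++) {
            for (int v = 0; v < N_VARS; v++) {
                /* Join: union over predecessors; entry starts as {None}. */
                unsigned in = (s == 0) ? NONE_BIT : 0u;
                for (int p = 0; p < N_STMTS; p++)
                    if (is_pred[s][p]) in |= end[p][v];
                /* Transfer: a definition of v replaces everything before it. */
                unsigned out = (def_var[s] == v) ? (1u << def_id[s]) : in;
                if (in != start[s][v] || out != end[s][v]) {
                    start[s][v] = in;
                    end[s][v] = out;
                    changed = true;
                }
            }
        }
    }
}

/* Convenience wrapper: the set reaching the start of stmt for var. */
unsigned reaching_at(int stmt, int var) {
    unsigned start[N_STMTS][N_VARS];
    compute_reaching(start);
    return start[stmt][var];
}
```

At statement 3, the set for x contains both Some(1) (via statement 1) and None (via statement 2, which never defines x).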
8.10 Simple Memory Operations
The simplemem.ml module allows CIL lvalues that contain memory accesses
to be even further simplified via the introduction of well-typed
temporaries. After this transformation all lvalues involve at
most one memory reference.
For a concrete example, you can see how cilly --dosimpleMem
transforms the following code:
int main () {
    int ***three;
    int **two;
    ***three = **two;
}
See the CIL output for this
code fragment
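As a hand-written sketch of the idea (not actual CIL output), the assignment can be split so that each statement performs a single memory reference, with typed temporaries carrying the intermediate pointers. The temporaries mem1 to mem4 and the helper demo_store are hypothetical, and the pointers are passed as parameters so that the sketch has defined behaviour.

```c
/* Original statement: ***three = **two; */
void store_simplified(int ***three, int **two) {
    int **mem1 = *three; /* first dereference of three */
    int *mem2 = *mem1;   /* second dereference of three */
    int *mem3 = *two;    /* first dereference of two */
    int mem4 = *mem3;    /* the value to store */
    *mem2 = mem4;        /* the single remaining memory write */
}

/* Builds a three-level chain to dst and a two-level chain to src. */
int demo_store(void) {
    int src = 7, dst = 0;
    int *p_src = &src;
    int *p_dst = &dst;
    int **pp_dst = &p_dst;
    store_simplified(&pp_dst, &p_src);
    return dst;
}
```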
8.11 Simple Three-Address Code
The simplify.ml module further reduces the complexity of program
expressions and gives you a form of three-address code. After this
transformation all expressions will adhere to the following grammar:
basic ::=
    Const _
    AddrOf(Var v, NoOffset)
    StartOf(Var v, NoOffset)
    Lval(Var v, off), where v is a variable whose address is not taken
                      and off contains only "basic"

exp ::=
    basic
    Lval(Mem basic, NoOffset)
    BinOp(bop, basic, basic)
    UnOp(uop, basic)
    CastE(t, basic)

lval ::=
    Mem basic, NoOffset
    Var v, off, where v is a variable whose address is not taken and off
                contains only "basic"
In addition, all sizeof and alignof forms are turned into
constants. Accesses to arrays and variables whose address is taken are
turned into "Mem" accesses. All field and index computations are turned
into address arithmetic.
For a concrete example, you can see how cilly --dosimplify
transforms the following code:
int main() {
    struct mystruct {
        int a;
        int b;
    } m;
    int local;
    int arr[3];
    int *ptr;
    ptr = &local;
    m.a = local + sizeof(m) + arr[2];
    return m.a;
}
See the CIL output for this
code fragment
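As a hand-written sketch of the three-address shape (not actual CIL output), the assignment to m.a can be broken into steps that each apply one operator to simple operands. The temporaries, main_simplified and demo_simplified are hypothetical names, initial values are passed in so that the sketch has defined behaviour, and sizeof is left symbolic here although the transformation described above would emit a constant.

```c
#include <stddef.h>

struct mystruct {
    int a;
    int b;
};

/* Hypothetical stand-in for main with explicit initial values. */
int main_simplified(int local_init, const int arr_init[3]) {
    struct mystruct m;
    int local = local_init;
    int arr[3] = { arr_init[0], arr_init[1], arr_init[2] };
    int *ptr;

    ptr = &local;

    /* local has its address taken, so it is read through a pointer (Mem). */
    int *local_addr = &local;
    int t1 = *local_addr;
    size_t t2 = (size_t)t1;

    /* sizeof(m) would appear as a literal constant after the transformation. */
    size_t t3 = t2 + sizeof(struct mystruct);

    /* arr[2] becomes address arithmetic on the start of the array. */
    char *arr_start = (char *)arr;
    char *elem_bytes = arr_start + 2 * sizeof(int);
    int *elem_addr = (int *)elem_bytes;
    int t4 = *elem_addr;
    size_t t5 = (size_t)t4;

    size_t t6 = t3 + t5;
    m.a = (int)t6; /* m's address is never taken, so m.a can stay a variable access */

    (void)ptr;
    return m.a;
}

int demo_simplified(void) {
    const int arr_init[3] = { 10, 20, 30 };
    return main_simplified(5, arr_init);
}
```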
8.12 Converting C to C++
The module canonicalize.ml performs several transformations to correct
differences between C and C++, so that the output is (hopefully) valid
C++ code. This may be incomplete: certain fixes that are necessary
for some programs are not yet implemented.
Using the --doCanonicalize option with CIL will perform the
following changes to your program:
- Any variables that use C++ keywords as identifiers are renamed.
- C allows global variables to have multiple declarations and
multiple (equivalent) definitions. This transformation removes
all but one declaration and all but one definition.
- __inline is #defined to inline, and __restrict
is #defined to nothing.
- C allows function pointers with no specified arguments to be used on
any argument list. To make C++ accept this code, we insert a cast
from the function pointer to a type that matches the arguments. Of
course, this does nothing to guarantee that the pointer actually has
that type.
- Makes casts from int to enum types explicit. (CIL changes enum
constants to int constants, but doesn't use a cast.)
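Two of the changes listed above can be pictured with a short hand-written C sketch showing code in the "after" form, which is accepted by both C and C++ compilers. It is an illustration only, not output of --doCanonicalize, and all names in it are made up for this example.

```c
enum color { RED, GREEN, BLUE };

/* An int-to-enum conversion is implicit in C but needs a cast in C++. */
enum color color_from_int(int i) {
    return (enum color)i;
}

int add(int a, int b) {
    return a + b;
}

/* A function pointer declared with no specified arguments. */
int (*unspecified_args_fn)() = (int (*)())add;

int call_with_two_ints(int a, int b) {
    /* Cast to a type that matches the argument list before calling.
       Nothing checks that the pointer really has this type. */
    return ((int (*)(int, int))unspecified_args_fn)(a, b);
}
```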