Previous Up Next

8  Library of CIL Modules

We are developing a suite of modules that use CIL for program analyses and transformations that we have found useful. You can use these modules directly on your code, or generally as inspiration for writing similar modules. A particularly big and complex application written on top of CIL is CCured (../ccured/index.html).

8.1  Data flow analysis framework

The module dataflow.ml contains a parameterized framework for forward and backward data flow analyses. You provide the transfer functions and this module does the analysis.

8.2  Dominators

The module dominators.ml contains the computation of immediate dominators. It uses the dataflow.ml module.

8.3  Points-to Analysis

The module ptranal.ml contains two interprocedural points-to analyses for CIL: Olf and Golf. Olf is the default. (Switching from olf.ml to golf.ml requires a change in Ptranal and a recompiling cilly.)

The analyses have the following characteristics: The analysis itself is factored into two components: Ptranal, which walks over the CIL file and generates constraints, and Olf or Golf, which solve the constraints. The analysis is invoked with the function Ptranal.analyze_file: Cil.file -> unit. This function builds the points-to graph for the CIL file and stores it internally. There is currently no facility for clearing internal state, so Ptranal.analyze_file should only be called once.

The constructed points-to graph supports several kinds of queries, including alias queries (may two expressions be aliased?) and points-to queries (to what set of locations may an expression point?).

The main interface with the alias analysis is as follows: The precision of the analysis can be customized by changing the values of several flags: In practice, the best precision/efficiency tradeoff is achieved by setting Ptranal.no_sub to false, Ptranal.analyze_mono to true, and Ptranal.smart_aliases to false. These are the default values of the flags.

There are also a few flags that can be used to inspect or serialize the results of the analysis.

8.4  StackGuard

The module heapify.ml contains a transformation similar to the one described in “StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks”, Proceedings of the 7th USENIX Security Conference. In essence it modifies the program to maintain a separate stack for return addresses. Even if a buffer overrun attack occurs the actual correct return address will be taken from the special stack.

Although it does work, this CIL module is provided mainly as an example of how to perform a simple source-to-source program analysis and transformation. As an optimization only functions that contain a dangerous local array make use of the special return address stack.

For a concrete example, you can see how cilly --dostackGuard transforms the following dangerous code:

  int dangerous() {
    char array[10];
    scanf("%s",array); // possible buffer overrun!
  }

  int main () {
    return dangerous();
  }
See the CIL output for this code fragment

8.5  Heapify

The module heapify.ml also contains a transformation that moves all dangerous local arrays to the heap. This also prevents a number of buffer overruns.

For a concrete example, you can see how cilly --doheapify transforms the following dangerous code:

  int dangerous() {
    char array[10];
    scanf("%s",array); // possible buffer overrun!
  }

  int main () {
    return dangerous();
  }
See the CIL output for this code fragment

8.6  One Return

The module oneret.ml contains a transformation the ensures that all function bodies have at most one return statement. This simplifies a number of analyses by providing a canonical exit-point.

For a concrete example, you can see how cilly --dooneRet transforms the following code:

  int foo (int predicate) {
    if (predicate <= 0) {
      return 1;
    } else {
      if (predicate > 5)
        return 2;
      return 3;
    }
  }
See the CIL output for this code fragment

8.7  Control-Flow Graphs

CIL can reduce high-level C control-flow constructs like switch and continue to lower-level gotos. This completely eliminates some possible classes of statements from the program and may make the result easier to analyze (e.g., it simplifies data-flow analysis).

For a concrete example, you can see how cilly --domakeCFG transforms the following code (note the fall-through in case 1):

  int foo (int predicate) {
    int x = 0;
    switch (predicate) {
      case 0: return 111;
      case 1: x = x + 1;
      case 2: return (x+3);
      case 3: break;
      default: return 222;
    }
    return 333;
  }
See the CIL output for this code fragment

8.8  Partial Evaluation and Constant Folding

The partial.ml module provides a simple interprocedural partial evaluation and constant folding data-flow analysis and transformation. This transformation requires the --domakeCFG option.

For a concrete example, you can see how cilly --domakeCFG --dopartial transforms the following code (note the eliminated if branch and the partial optimization of foo):

  int foo(int x, int y) {
    int unknown;
    if (unknown)
      return y+2;     
    return x+3;      
  }

  int main () {
    int a,b,c;
    a = foo(5,7) + foo(6,7);
    b = 4;
    c = b * b;      
    if (b > c)     
      return b-c;
    else
      return b+c; 
  }
See the CIL output for this code fragment

8.9  Reaching Definitions

The reachingdefs.ml module uses the dataflow framework and CFG information to calculate the definitions that reach each statement. After computing the CFG and calling computeRDs on a function declaration, ReachingDef.stmtStartData will contain a mapping form statement IDs to data about which definitions reach each statement. In particular, it is a mapping from statement IDs to a triple the first two members of which are used internally. The third member is a mapping from variable IDs to Sets of integer options. If the set contains Some(i), then the definition of that variable with ID i reaches that statement. If the set contains None, then there is a path to that statement on which there is no definition of that variable. Also, if the variable ID is unmapped at a statement, then no definition of that variable reaches that statement.

To summarize, reachingdefs.ml has the following interface:

8.10  Simple Memory Operations

The simplemem.ml module allows CIL lvalues that contain memory accesses to be even futher simplified via the introduction of well-typed temporaries. After this transformation all lvalues involve at most one memory reference.

For a concrete example, you can see how cilly --dosimpleMem transforms the following code:

  int main () {
    int ***three;
    int **two;
    ***three = **two; 
  } 
See the CIL output for this code fragment

8.11  Simple Three-Address Code

The simplify.ml module further reduces the complexity of program expressions and gives you a form of three-address code. After this transformation all expressions will adhere to the following grammar:
 basic::=
    Const _ 
    Addrof(Var v, NoOffset)
    StartOf(Var v, NoOffset)
    Lval(Var v, off), where v is a variable whose address is not taken
                      and off contains only "basic"

 exp::=
    basic
    Lval(Mem basic, NoOffset)
    BinOp(bop, basic, basic)
    UnOp(uop, basic)
    CastE(t, basic)
   
 lval ::= 
    Mem basic, NoOffset
    Var v, off, where v is a variable whose address is not taken and off
                contains only "basic"
In addition, all sizeof and alignof forms are turned into constants. Accesses to arrays and variables whose address is taken are turned into "Mem" accesses. All field and index computations are turned into address arithmetic.

For a concrete example, you can see how cilly --dosimplify transforms the following code:

  int main() {
    struct mystruct {
      int a;
      int b;
    } m;
    int local;
    int arr[3];
    int *ptr;

    ptr = &local;
    m.a = local + sizeof(m) + arr[2];
    return m.a; 
  } 
See the CIL output for this code fragment

8.12  Converting C to C++

The module canonicalize.ml performs several transformations to correct differences between C and C++, so that the output is (hopefully) valid C++ code. This may be incomplete — certain fixes which are necessary for some programs are not yet implemented.

Using the --doCanonicalize option with CIL will perform the following changes to your program:
  1. Any variables that use C++ keywords as identifiers are renamed.
  2. C allows global variables to have multiple declarations and multiple (equivalent) definitions. This transformation removes all but one declaration and all but one definition.
  3. __inline is #defined to inline, and __restrict is #defined to nothing.
  4. C allows function pointers with no specified arguments to be used on any argument list. To make C++ accept this code, we insert a cast from the function pointer to a type that matches the arguments. Of course, this does nothing to guarantee that the pointer actually has that type.
  5. Makes casts from int to enum types explicit. (CIL changes enum constants to int constants, but doesn't use a cast.)

Previous Up Next