Coping with Type Casts in C

Michael Siff, Satish Chandra, Thomas Ball, Krishna Kunchithapadam, and Thomas Reps

Type casts are pervasive in C. Although casts provide great flexibility in writing programs, they obscure the meaning of programs and can present obstacles during maintenance. Casts involving pointers to structures (C structs) are particularly problematic: with them, a programmer can interpret any memory region as being of any desired type, thereby compromising C's already weak type system.

This paper presents an approach for making sense of such casts, both to understand their purpose and to identify fragile code. We base our approach on the observation that casts are often used to simulate object-oriented language features not supported directly in C. We first describe a variety of ways -- idioms -- in which this is done in C programs. We then develop a notion of physical subtyping, which provides a model that explains these idioms.
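As an illustration of the kind of idiom meant here, the following minimal sketch shows one common way a cast simulates inheritance in C: a "derived" struct embeds the "base" struct as its first member, so a pointer to the derived object can be cast to a pointer to the base. The type and function names (Point, ColorPoint, point_sum, color_point_sum) are hypothetical and chosen only for this example; the sketch is not the paper's formal definition of physical subtyping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "base class": a 2-D point. */
struct Point {
    int x;
    int y;
};

/* Hypothetical "derived class": embeds Point as its FIRST member, so the
   memory of a ColorPoint begins with the memory layout of a Point. */
struct ColorPoint {
    struct Point base;
    int color;
};

/* Code written against the "base" type only. */
int point_sum(const struct Point *p)
{
    return p->x + p->y;
}

/* The upcast: treat a ColorPoint as the Point stored at offset 0. */
int color_point_sum(const struct ColorPoint *cp)
{
    /* The cast below relies on base being at offset 0. */
    assert(offsetof(struct ColorPoint, base) == 0);
    return point_sum((const struct Point *)cp);
}
```

Calling color_point_sum on a ColorPoint whose embedded point is (3, 4) gives the same result as calling point_sum on its base member directly. The cast is safe only because of the layout relationship between the two structs, and reordering the members of ColorPoint would silently break it, which is the sort of fragility that an analysis of casts can flag.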

We have created tools that automatically analyze casts appearing in C programs. Experimental evidence gathered by running these tools on over a million lines of C code shows that, of the casts involving struct types, most (over 90%) can be associated meaningfully -- and automatically -- with physical subtyping. Our results indicate that the idea of physical subtyping is useful in coping with casts and can lead to valuable software productivity tools.
