Detecting and Measuring Similarity in Code Clones
Randy Smith and Susan Horwitz
Most previous work on code-clone detection has focused on finding
identical clones, or clones that are identical up to identifiers and
literal values. However, it is often important to find similar clones,
too. One challenge is that the definition of similarity depends on the
context in which clones are being found. Therefore, we propose new
techniques for finding similar code blocks and for quantifying their
similarity. Our techniques can be used to find clone clusters: sets of
code blocks that are all within a user-supplied similarity threshold of
each other. Also, given one code block, we can find all similar blocks and
present them rank-ordered by similarity. Our techniques have been used
in a clone-detection tool for C programs. The ideas could also be
incorporated in many existing clone-detection tools to provide more
flexibility in their definitions of similar clones.
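
To make the two uses described above concrete, the following is a minimal Python sketch, not the tool's implementation. The similarity measure here is a stand-in (the token-sequence matching ratio from Python's difflib), since the measure itself is not specified in this summary. The function names tokens, similarity, rank_similar, and clone_clusters are hypothetical, and the greedy grouping is only one simple way to obtain sets whose members are all pairwise within the threshold.

```python
import re
from difflib import SequenceMatcher


def tokens(block):
    """Split a code block into identifier, number, and single-character tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", block)


def similarity(a, b):
    """Stand-in similarity in [0, 1] (assumption: not the measure used by the tool)."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def rank_similar(query, blocks):
    """Return (score, block) pairs for all blocks, most similar to query first."""
    scored = [(similarity(query, b), b) for b in blocks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def clone_clusters(blocks, threshold):
    """Greedily group blocks so that every pair in a cluster scores >= threshold."""
    clusters = []
    for block in blocks:
        for cluster in clusters:
            # Join a cluster only if the block is close to every current member.
            if all(similarity(block, m) >= threshold for m in cluster):
                cluster.append(block)
                break
        else:
            clusters.append([block])
    # Singleton groups are not clones of anything, so drop them.
    return [c for c in clusters if len(c) > 1]


if __name__ == "__main__":
    a = "x = a + b; y = x * 2;"
    b = "x = a + b; y = x * 3;"
    c = "while (p) { p = p->next; }"
    print(clone_clusters([a, b, c], threshold=0.8))
    for score, block in rank_similar(a, [c, b]):
        print(round(score, 2), block)
```

Because the result of the greedy pass depends on the order in which blocks are visited, a real tool would likely need a more careful grouping strategy; the sketch only illustrates the "all within a threshold of each other" condition and the rank-ordered query.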