Review: Clone Detective

The Morning Brew #162 mentioned a tool, Clone Detective, which

is a Visual Studio integration that allows you to analyze C# projects for source code that is duplicated somewhere else. Having duplicates can easily lead to inconsistencies and often is an indicator for poorly factored code.

Today, a colleague and I spent some ILP time investigating the tool to determine whether it would help us in the Madgex environment or not.

We tried the tool out on a couple of projects, but because it is a Visual Studio 2008-only plug-in our options were a bit limited. After installing the plug-in, the next thing we did was watch the video. This gave us a good overview of how it works, and where to find it (hiding under View -> Other Windows -> Clone Explorer).

The video explained that the tool effectively breaks code down into a series of tokens, and then looks for other pieces of code which break down into the same series of tokens. The terminology took a little while to get a grasp of - there are clones and there are clone classes. A clone class is a series of tokens which is (or may be) repeated throughout the solution. A clone is one instance of that series in the code. So, if there was a line of code which did something like a = b + c, the clone class would be [var] = [var] + [var], and the clones could be lines of code such as a = b + c or d = e + f (irrespective of whitespace, variable names etc).
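To make the clone / clone class distinction concrete, here is a minimal Python sketch of the idea. It is our own illustration and not how Clone Detective is actually implemented: the regular expression, the [num] placeholder and the function names clone_class and find_clones are all assumptions, and it only compares single lines, treating every word (keywords included) as a variable.

```python
import re
from collections import defaultdict

# Hypothetical, much-simplified tokeniser: words, numbers, or any other
# single non-whitespace character. Whitespace is discarded entirely.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def clone_class(line):
    """Return the normalised token sequence (the "clone class") for a line."""
    tokens = []
    for tok in TOKEN_RE.findall(line):
        if tok[0].isalpha() or tok[0] == "_":
            tokens.append("[var]")   # any identifier, whatever its name
        elif tok[0].isdigit():
            tokens.append("[num]")   # any numeric literal
        else:
            tokens.append(tok)       # operators and punctuation kept as-is
    return " ".join(tokens)


def find_clones(lines):
    """Group 1-based line numbers by clone class, keeping only repeats."""
    groups = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        if line.strip():
            groups[clone_class(line)].append(number)
    return {cls: nums for cls, nums in groups.items() if len(nums) > 1}


source = ["a = b + c", "x = 1", "d=e  +  f"]
print(find_clones(source))
```

Here lines 1 and 3 end up in the same clone class even though their variable names and spacing differ, while line 2 is not reported because nothing else shares its token sequence.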

It works across an entire solution, so we set it running and looked at the clones it found. Unfortunately, a lot of the clones it detected for us were false positives - such as properties which are identical when tokenised, but which can't really be refactored. Additionally, some of the clone classes it detected are actually overlapping views of the same code. For instance, one file I looked at had the following line ranges each marked as a clone of a different clone class:

lines 20 - 43
lines 20 - 59
lines 24 - 43
lines 24 - 44
lines 37 - 59

which basically tells us that the area of code from lines 20 - 59 should be refactored into (probably) one new piece of code, which can then be re-analysed to find out whether further refactorings are also worthwhile.
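Collapsing overlapping reports like these into one region is easy enough to do by hand, but as a rough sketch it could look like the following. The merge_ranges helper is our own hypothetical function, not a feature of Clone Detective, and it assumes each report is an inclusive (start, end) pair of line numbers.

```python
def merge_ranges(ranges):
    """Merge overlapping inclusive (start, end) line ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if this one ends later.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new region.
            merged.append([start, end])
    return [tuple(region) for region in merged]


# The five ranges reported for the file mentioned above.
reported = [(20, 43), (20, 59), (24, 43), (24, 44), (37, 59)]
print(merge_ranges(reported))  # prints [(20, 59)]
```

The five reported clones reduce to the single region 20 - 59, which is the area we would look at refactoring first.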

The interface is easy to get to grips with, and is accessed either via its own panels (Clone Explorer, Clone Intersections and Clone Results) or via an indicator in the code window with the context-sensitive menu items Find Clones and Show Clone Intersections. Clicking on an item in the explorer allows you to navigate into more detail and find the clone class or find all instances of it. In a big solution with a lot of clones (false positives or otherwise), the right click to get the clone class listing can take quite a while - on the order of 10 - 20 seconds - which can result in lots of re-right-clicking when one is impatient. This may indicate a problem with scalability.

The tool would have proved a lot more useful to us with a few extras:

Ability to ignore properties
Ability to list all clone classes at once, rather than having to right click on a file in the Clone Explorer, choose Show Clone Intersections, then pick a file in there and right click to select Clone Class x -> Find All Occurrences
Ability to mark clone classes as "ignore" so it doesn't report on them again

So, our summary: it's a nice idea, but it gives too many false positives and isn't refined enough at the moment to be of use within our environment on a regular basis. It may, however, still be useful as a one-off exercise (with patience) to find the key areas that should be addressed. In a less complex code base it might be worth another look. Additionally, if while working on some code we spot something that looks like it's been copied and pasted, then it might well be worth running Clone Explorer to find other areas that could benefit from our refactoring.

This leaves us with the question: how else can we detect clones in our code base? Until we find something else, this is better than nothing. Does anyone have any suggestions of other tools which could address this area?