Wednesday, June 25, 2008 

More on Code Clones

I've talked about code clones before. It's a simple metric that I've used in several projects with encouraging results.

Until not long ago, however, I thought code clone detection was useful mostly to:

1) Assess and monitor an interesting quality aspect of a product
This requires that we constantly monitor code clones. If some code already exists, we can create a baseline and enforce a rule that things can only get better, not worse. I usually monitor several internal quality attributes at build time, because that's a fairly flexible moment, where most tools allow you to insert custom steps.

2) Identify candidates for refactoring, mostly in large, pre-existing projects.
This requires, of course, a certain willingness to act on your knowledge, that is, to actually go ahead and refactor duplicated code.
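
The "only better, not worse" rule from point 1 can be sketched as a small build step. This is a minimal illustration, not any particular tool's behavior: it assumes the clone count has already been extracted from whatever detector you run, and the function name and the JSON baseline file are hypothetical.

```python
import json
from pathlib import Path


def check_against_baseline(current_clones: int, baseline_file: Path) -> bool:
    """Return True if the clone count is no worse than the stored baseline.

    The baseline file (a hypothetical JSON file) is created on the first run
    and tightened whenever the count improves, so things can only get better.
    """
    if not baseline_file.exists():
        # First run: the current state becomes the baseline.
        baseline_file.write_text(json.dumps({"clones": current_clones}))
        return True

    baseline = json.loads(baseline_file.read_text())["clones"]
    if current_clones > baseline:
        # More clones than before: the build step should fail.
        return False
    if current_clones < baseline:
        # Fewer clones: ratchet the baseline down.
        baseline_file.write_text(json.dumps({"clones": current_clones}))
    return True
```

A build script would call this after the detection step and fail the build when it returns False.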

Sometimes, when the codebase is large, resources are scarce, or the company's interest in software quality is mostly a marketing statement disconnected from reality, a commitment to refactor the code is never made, or never taken seriously, which is about the same.

Here comes the third use of code clones. It is quite obvious, and I should have considered it earlier, but for some reason I didn't. I guess I was somehow blinded by the idea that if you care about quality, you must get in there and refactor the damn code. Strong beliefs are always detrimental to creativity :-).

Now: clones are bad because (in most cases) you have to keep them in sync during maintenance. If you don't, something bad is gonna happen (and yes, if you do, you waste a lot of time anyway, so you might as well refactor; but this is that strong belief rearing its head again :-).
So, if you don't want to use a list of code clones to start a refactoring campaign, what else can you do? Use it to make sure you didn't forget to update a clone!

Unfortunately, with the tools I know, a large part of this process can't be easily automated. You would have to run a clone detection tool and keep the log somewhere. Then, whenever you change some portion of code, you'll have to check (from the log) whether that portion is cloned elsewhere. You then port your change to the other clones (and test everything). The clone list must be updated periodically, also to account for changes coming from different programmers.
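
The lookup step, at least, is easy to script once the log is in a usable form. Here is a minimal sketch, assuming the log has already been parsed into groups of fragments (file, first line, last line); real tools each have their own output format, and the `Fragment` type, the function names, and the sample file names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    path: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive


def overlaps(a: Fragment, b: Fragment) -> bool:
    """True if both fragments are in the same file and share at least one line."""
    return a.path == b.path and a.start <= b.end and b.start <= a.end


def clones_to_review(changed: Fragment,
                     clone_groups: list[list[Fragment]]) -> list[Fragment]:
    """Return the other members of every clone group touched by the change."""
    to_review = []
    for group in clone_groups:
        if any(overlaps(changed, frag) for frag in group):
            # The change hits this group: every other member may need the same edit.
            to_review.extend(f for f in group if not overlaps(changed, f))
    return to_review


# Hypothetical parsed log: each inner list is one group of mutually cloned fragments.
example_log = [
    [Fragment("billing.py", 10, 30), Fragment("invoice.py", 55, 75)],
    [Fragment("utils.py", 1, 12), Fragment("helpers.py", 40, 51)],
]
```

Calling `clones_to_review(Fragment("billing.py", 20, 22), example_log)` returns the `invoice.py` fragment, which is the place to port the change. Keeping the log fresh is still a manual (or scheduled) job.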

Better tools can be easily conceived. Ideally, this could be integrated into your IDE: as I suggested in "Listen to Your Tools and Materials", editors could provide unobtrusive backtalk, highlighting the fact that you're changing a portion of code that has been cloned elsewhere. From there, you could jump into the other files, or ask the editor to apply the same change automatically. In the end, that would make clones more tolerable; while this is arguably bad, it's still much better than leaving them out of sync.

From that perspective, I would say that another interesting place in our toolchain where we would benefit from an open, customizable process is the version control system. Ideally, we may want to verify and enforce rules right at check-in time, without the need to delay checks until build time. Open source tools are an obvious opportunity to create a better breed of version control systems, which so far (leaving a few religious issues aside) have been more or less level in terms of available features.

Note: I've been writing this post on an EEE PC (the Linux version), and I kinda like it. Although I'm not really into tech toys, and although the EEE looks and feels :-) like a toy, it's just great to carry around while traveling. The tiny keyboard is a little awkward to use, but I'll get used to it...


Hi Carlo,
I absolutely agree with you when you say that code clones are an indicator of poor quality code. But since poor quality often doesn’t have a lower bound, code clones are usually just one of the many design flaws a real-world project has. In large projects involving a lot of programmers, sometimes even with fast turnover (a scenario where no single programmer has a precise view of the entire project, and analysts are not able to supervise each portion of code), a programmer could misuse a side effect of some portion of code to get the job done; in these cases updating the clones could lead to unexpected errors! I personally think that refactoring is the only way out of these error-prone situations :)

As I said, I tend to agree that refactoring is a better alternative. Mid-term, it's also cheaper.

The scenario you describe is problematic, and unfortunately all too common.
It's mostly a scenario of low accountability. Nobody has a clear responsibility for product quality. Nobody has the authority and time to enforce anything. The average programmer in those projects has little motivation to pursue quality. It's more like a temporary assignment until he can move to something better.
However, I'm afraid that's exactly where refactoring is not going to take place either. The only sensible refactoring would be at the organizational level, top-down :-).

Overall, I believe in having more options. Raising a red flag through code clone analysis, so that when you update the business logic somewhere you immediately know that (most likely) something has to be changed elsewhere too, is just one more option. Of course, from that point on, we have to take action.

Action requires refactoring, or porting the change yourself, or communicating with another team or colleague, or talking with the boss, or whatever else. Action is always context-dependent. That's why I like having more options :-).