Wednesday, June 25, 2008
More on Code Clones
I've been talking about code clones before. It's a simple metric that I've used in several projects with encouraging results.
Till no long ago, however, I thought code clones detection was useful mostly to:
1) Assess and monitor an interesting quality aspect of a product
This requires that we constantly monitor code clones. If some code already exists, we can create a baseline and enforce a rule that things can only get better, not worse. I usually monitor several internal quality attributes at build time, because that's a fairly flexible moment, where most tools allow to insert some custom steps.
2) Identify candidates for refactoring, mostly in large, pre-existing projects.
This requires, of course, a certain willingness to act on your knowledge, that is, to actually go ahead and refactor duplicated code.
Sometimes, when the codebase is large, resources are scarce, or the company interest in software quality is mostly a marketing statement disconnected from reality, a commitment to refactor the code is never taken, or never taken seriously, which is about the same.
Here comes the third use of code clones. It is quite obvious, and I should have considered it earlier, but for some reason I didn't. I guess I was somehow blinded by the idea that if you care about quality, you must get in there and refactor the damn code. Strong beliefs are always detrimental to creativity :-).
Now: clones are bad because (in most cases) you have to keep them in synch during maintenance. If you don't, something bad is gonna happen (and yes, if you do, you waste a lot of time anyway, so you could as well refactor; but this is that strong belief rearing its head again :-).
So, if you don't want to use a code clones list to start a refactoring campaign, what else can you do? Use it to make sure you didn't forget to update a clone!
Unfortunately, with the tools I know, a large part of this process can't be easily automated. You would have to run a clone detection tool and keep the log somewhere. Then, whenever you change some portion of code, you'll have to check if that portion is cloned elsewhere (from the log). You then port your change in the other clones (and test everything). The clones list must be periodically updated, also to account for changes coming from different programmers.
Better tools can be easily conceived. Ideally, this could be integrated in your IDE: as I suggested in Listen to Your Tools and Materials, editors could provide unobtrusive backtalk, highlighting the fact that you're changing a portion of code that has been cloned elsewhere. From there, you could jump into the other files, or ask the editor to apply the same change automatically. In the end, that would make clones more tolerable; while this is arguably bad, it's still much better than leave them out of synch.
From that perspective, I would say that another interesting place in our toolchain where we would benefit from an open, customizable process is the version control system. Ideally, we may want to verify and enforce rules right at check-in time, without the need to delay checks until build time. Open source tools are an obvious opportunity to create a better breed of version control systems, which so far (leaving a few religious issues aside) have been more or less leveled in term of available features.
Note: I've been writing this post on a EEE PC (the Linux version), and I kinda like it. Although I'm not really into tech toys, and although the EEE looks and feels :-) like a toy, it's just great to carry around while traveling. The tiny keyboard is a little awkward to use, but I'll get used to it...
Till no long ago, however, I thought code clones detection was useful mostly to:
1) Assess and monitor an interesting quality aspect of a product
This requires that we constantly monitor code clones. If some code already exists, we can create a baseline and enforce a rule that things can only get better, not worse. I usually monitor several internal quality attributes at build time, because that's a fairly flexible moment, where most tools allow to insert some custom steps.
2) Identify candidates for refactoring, mostly in large, pre-existing projects.
This requires, of course, a certain willingness to act on your knowledge, that is, to actually go ahead and refactor duplicated code.
Sometimes, when the codebase is large, resources are scarce, or the company interest in software quality is mostly a marketing statement disconnected from reality, a commitment to refactor the code is never taken, or never taken seriously, which is about the same.
Here comes the third use of code clones. It is quite obvious, and I should have considered it earlier, but for some reason I didn't. I guess I was somehow blinded by the idea that if you care about quality, you must get in there and refactor the damn code. Strong beliefs are always detrimental to creativity :-).
Now: clones are bad because (in most cases) you have to keep them in synch during maintenance. If you don't, something bad is gonna happen (and yes, if you do, you waste a lot of time anyway, so you could as well refactor; but this is that strong belief rearing its head again :-).
So, if you don't want to use a code clones list to start a refactoring campaign, what else can you do? Use it to make sure you didn't forget to update a clone!
Unfortunately, with the tools I know, a large part of this process can't be easily automated. You would have to run a clone detection tool and keep the log somewhere. Then, whenever you change some portion of code, you'll have to check if that portion is cloned elsewhere (from the log). You then port your change in the other clones (and test everything). The clones list must be periodically updated, also to account for changes coming from different programmers.
Better tools can be easily conceived. Ideally, this could be integrated in your IDE: as I suggested in Listen to Your Tools and Materials, editors could provide unobtrusive backtalk, highlighting the fact that you're changing a portion of code that has been cloned elsewhere. From there, you could jump into the other files, or ask the editor to apply the same change automatically. In the end, that would make clones more tolerable; while this is arguably bad, it's still much better than leave them out of synch.
From that perspective, I would say that another interesting place in our toolchain where we would benefit from an open, customizable process is the version control system. Ideally, we may want to verify and enforce rules right at check-in time, without the need to delay checks until build time. Open source tools are an obvious opportunity to create a better breed of version control systems, which so far (leaving a few religious issues aside) have been more or less leveled in term of available features.
Note: I've been writing this post on a EEE PC (the Linux version), and I kinda like it. Although I'm not really into tech toys, and although the EEE looks and feels :-) like a toy, it's just great to carry around while traveling. The tiny keyboard is a little awkward to use, but I'll get used to it...
Labels: article reference, metrics, profession, tools
Wednesday, June 11, 2008
Elsewhere
Once again, I can't find much time for blogging: nothing serious, just too much work (and a little fun too :-).
The bright side is that I'll soon finish a paper on function points, and that the paper will come with a free tool. I'll probably submit the paper to some journal, but the draft version will be available online from day 1. And yeah, I know, function points sound like a dull, overly explored subject, but remember the shirt folding idea: there is always something new to say!
The bright side is that I'll soon finish a paper on function points, and that the paper will come with a free tool. I'll probably submit the paper to some journal, but the draft version will be available online from day 1. And yeah, I know, function points sound like a dull, overly explored subject, but remember the shirt folding idea: there is always something new to say!



