Wednesday, June 21, 2006 

Re$earch Tool$

I've always been looking for interesting principles, techniques and tools, born inside the academy and with a good chance of working in the real world.
To be honest, there aren't many. Many ideas are interesting on paper, but hardly useful in the real world. However, over the years I've brought together a reasonable set of solid Software Engineering concepts that have proven useful within tough projects.
In many cases, some kind of quantitative notion is involved. It could be an estimate of the residual number of yet-to-find (not to fix!) bugs, or a LOC vs. Cyclomatic Complexity chart for risk analysis, and so on.
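To make the LOC vs. Cyclomatic Complexity idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not the tooling I actually use: the complexity count is a rough approximation of McCabe's metric (one plus the number of decision points), and the function names and the `loc_limit` / `cc_limit` thresholds are assumptions chosen for the example.

```python
import ast

# Node types counted as decision points (an approximation of McCabe's metric).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def function_metrics(source):
    """Return {function_name: (loc, complexity)} for each function in source."""
    tree = ast.parse(source)
    metrics = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Lines spanned by the function, definition line included.
            loc = node.end_lineno - node.lineno + 1
            complexity = 1
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b and c" adds two extra paths.
                    complexity += len(child.values) - 1
            metrics[node.name] = (loc, complexity)
    return metrics


def risk_quadrant(loc, complexity, loc_limit=50, cc_limit=10):
    """Place a function in one of four quadrants of the chart.

    The limits are hypothetical defaults, to be tuned per project.
    """
    size = "large" if loc > loc_limit else "small"
    shape = "complex" if complexity > cc_limit else "simple"
    return size + "/" + shape


if __name__ == "__main__":
    sample = (
        "def f(x):\n"
        "    if x > 0 and x < 10:\n"
        "        return 1\n"
        "    for i in range(x):\n"
        "        pass\n"
        "    return 0\n"
    )
    for name, (loc, cc) in function_metrics(sample).items():
        print(name, loc, cc, risk_quadrant(loc, cc))
```

Functions that land in the large/complex quadrant are the first candidates for review; the other quadrants are more a matter of judgment.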
Of course, when you deal with quantitative data, tools are useful and sometimes indispensable. We need tools to extract, manipulate and present data. Sometimes, combining simple utilities does the job; sometimes, you need some custom tool (like my BetterEstimate for probabilistic estimation).
In a recent issue of IEEE Transactions on Software Engineering (March 2006), two interesting tools have been presented. One (called Bunch) deals with automatic modularization. It is conceptually simple, quite flexible, and I would say it has a chance of working in some real world scenarios (I'll be running some experiments with it). The other (called CP-Miner) deals with code clones (or "copy & paste" code), and is quite advanced, as it can track post-paste modifications, correlate them, and find potential bugs. I'm already using code-clone analysis with simpler tools, and would appreciate using a more powerful one.
There is, however, a large difference between Bunch and CP-Miner. Bunch is freely available (I don't think it's open source, but I really don't care). You can download the tool, use it, and develop plug-ins for it. The tool is also alive and kicking (which is somewhat surprising, because most academic tools die in a few months, usually before the editorial time of any article about them). CP-Miner, however, is not available. If you look at the authors' sites, it seems like it has been folded into some larger project and then either discarded or kept as a secret weapon.
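For readers who have never seen code-clone analysis at work, here is a deliberately naive sketch in Python. It is not how CP-Miner works, and it cannot track post-paste modifications: it only reports windows of consecutive lines that are identical after whitespace normalization. The function name `find_clones` and the `window` parameter are assumptions made up for this example.

```python
from collections import defaultdict


def find_clones(files, window=3):
    """Find identical runs of `window` non-blank lines.

    files: {file_name: source_text}
    Returns a list of groups; each group lists the (file_name, start_line)
    positions where the same normalized run of lines occurs.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (line number, whitespace-normalized text) for non-blank lines.
        lines = [(i, " ".join(raw.split()))
                 for i, raw in enumerate(text.splitlines(), start=1)]
        lines = [(i, s) for i, s in lines if s]
        for k in range(len(lines) - window + 1):
            chunk = tuple(s for _, s in lines[k:k + window])
            seen[chunk].append((name, lines[k][0]))
    # A run seen in more than one place is a candidate clone.
    return [locs for locs in seen.values() if len(locs) > 1]


if __name__ == "__main__":
    files = {
        "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
        "b.py": "\nx = 1\ny  =  2\nz = x + y\n",
    }
    for group in find_clones(files):
        print(group)
```

Even something this crude tends to turn up surprises in a codebase with a few years of history, which is why a tool that also understands renamed identifiers and edited pastes would be so welcome.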
Unfortunately, this is more the rule than the exception. The paper on CP-Miner cites CCFinder as a precursor (from other authors). I remember looking for that tool too, only to find out that it was given away only for academic purposes, with a great deal of control.
The same kind of information hiding :-)) is applied all too frequently inside academia. I remember asking for the raw data behind a metric validation, and getting a blank stare. Or downloading some tools, distributed as source code that wouldn't even compile, and being told they had been abandoned just after publication.
In many cases, you get the impression that authors would love some funding for their re$earch, but being too shy to ask, they keep their toys to themselves, with the $ecret hope that $ome rich company will $hell out $ome hefty money just to have a look. Now, I understand the need for research funds. What those nice guys inside universities don't seem to understand is that a nice article is not enough to attract real money. Placing artificial barriers between products (whatever the product!) and buyers doesn't look like a genius strategy. But then, that looks too much like business, doesn't it :-).
Anyway, universities are not alone in the dying tool fest. A few days ago, I was looking for a tool (Acacia) to create a dependency graph for Bunch. Theoretically, it was an AT&T Research tool. In practice, it seems like it has disappeared too.
I should propose an amendment to the IEEE editing process, whereby the authors of any paper discussing a tool or a metric must submit a working copy of the tool and/or a copy of the raw data, freely available for readers. Guess what, I don't think it would ever pass :-).
As usual, following you is never a waste of time :-) CP-Miner really intrigued me, but unfortunately, as you point out, it can't be downloaded (I think it could be in the past). Do you know of other tools that do something similar and that can actually be tried out?
Have a nice weekend.
I've used Simian with reasonable satisfaction:

it can't really be compared to CP-Miner, but at least you can use it :-) and it still gives interesting results on medium-to-large projects, especially those that have been around for a few years (but, as a client of mine recently discovered, even "new" code holds interesting surprises :-)).