Tuesday, July 22, 2008

 

SmartFP™ paper (and tool) online

As promised, I've uploaded a free, simple tool to calculate Function Points using a decision tree. I've also uploaded a (draft) paper describing the overall approach. The paper is still missing a case study, which would help, but I just wanted to put the whole thing online. I'll add the case study, and a few more details, before submitting the paper for publication.

The decision tree approach is quite simple, especially if you have some knowledge of function points. Although it may seem like a small change in perspective from the usual "counting" approach, the result is that we can save a lot of time doing a function point estimate, and in many cases we also get more robust results.

Experiences and feedback are welcome, as usual. You can find the whole thing on the SmartFP page.

Note: as I plan to make more tools and libraries freely available, I've also created a new "tools" page. So far, there is only a link to BetterEstimate and SmartFP, but more will come...



Wednesday, June 25, 2008

 

More on Code Clones

I've talked about code clones before. It's a simple metric that I've used in several projects with encouraging results.

Until not long ago, however, I thought code clone detection was useful mostly to:

1) Assess and monitor an interesting quality aspect of a product
This requires that we constantly monitor code clones. If some code already exists, we can create a baseline and enforce a rule that things can only get better, not worse. I usually monitor several internal quality attributes at build time, because that's a fairly flexible moment, where most tools allow you to insert custom steps.

2) Identify candidates for refactoring, mostly in large, pre-existing projects.
This requires, of course, a certain willingness to act on your knowledge, that is, to actually go ahead and refactor duplicated code.

Sometimes, when the codebase is large, resources are scarce, or the company's interest in software quality is mostly a marketing statement disconnected from reality, a commitment to refactor the code is never made, or never taken seriously, which is about the same.

Here comes the third use of code clones. It is quite obvious, and I should have considered it earlier, but for some reason I didn't. I guess I was somehow blinded by the idea that if you care about quality, you must get in there and refactor the damn code. Strong beliefs are always detrimental to creativity :-).

Now: clones are bad because (in most cases) you have to keep them in sync during maintenance. If you don't, something bad is gonna happen (and yes, if you do, you waste a lot of time anyway, so you might as well refactor; but this is that strong belief rearing its head again :-).
So, if you don't want to use a code clones list to start a refactoring campaign, what else can you do? Use it to make sure you didn't forget to update a clone!

Unfortunately, with the tools I know, a large part of this process can't be easily automated. You would have to run a clone detection tool and keep the log somewhere. Then, whenever you change some portion of code, you'll have to check (from the log) whether that portion is cloned elsewhere. You then port your change to the other clones (and test everything). The clones list must be periodically updated, also to account for changes coming from different programmers.
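The lookup step described above can be sketched in a few lines of Python. This is only an illustration: the `Fragment` type, the shape of the clone log (a list of groups, each group being the fragments a detector reported as copies of one another), and the file names are all assumptions of mine, not the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A contiguous range of lines in a file (hypothetical log entry)."""
    path: str
    first_line: int
    last_line: int

    def overlaps(self, other):
        # Two fragments overlap if they are in the same file and
        # their line ranges intersect.
        return (self.path == other.path
                and self.first_line <= other.last_line
                and other.first_line <= self.last_line)


def clones_to_update(clone_groups, changed):
    """Return the fragments that are clones of the changed region."""
    hits = []
    for group in clone_groups:
        if any(fragment.overlaps(changed) for fragment in group):
            # Every other member of the group may need the same change.
            hits.extend(f for f in group if not f.overlaps(changed))
    return hits


# A made-up clone log with two clone groups.
clone_log = [
    [Fragment("billing.py", 10, 30), Fragment("invoice.py", 55, 75)],
    [Fragment("report.py", 1, 20), Fragment("export.py", 40, 59)],
]

edit = Fragment("billing.py", 25, 27)
for fragment in clones_to_update(clone_log, edit):
    print(f"also check {fragment.path}:{fragment.first_line}-{fragment.last_line}")
```

The hard part is not the lookup but keeping the log fresh: line numbers drift with every edit, which is why the list has to be regenerated periodically.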

Better tools can be easily conceived. Ideally, this could be integrated in your IDE: as I suggested in Listen to Your Tools and Materials, editors could provide unobtrusive backtalk, highlighting the fact that you're changing a portion of code that has been cloned elsewhere. From there, you could jump into the other files, or ask the editor to apply the same change automatically. In the end, that would make clones more tolerable; while this is arguably bad, it's still much better than leaving them out of sync.

From that perspective, I would say that another interesting place in our toolchain where we would benefit from an open, customizable process is the version control system. Ideally, we may want to verify and enforce rules right at check-in time, without the need to delay checks until build time. Open source tools are an obvious opportunity to create a better breed of version control systems, which so far (leaving a few religious issues aside) have been more or less leveled in terms of available features.

Note: I've been writing this post on an Eee PC (the Linux version), and I kinda like it. Although I'm not really into tech toys, and although the Eee looks and feels :-) like a toy, it's just great to carry around while traveling. The tiny keyboard is a little awkward to use, but I'll get used to it...



Tuesday, May 13, 2008

 

Natural language

Some (most :-) of my clients are challenging. Sometimes the challenge comes from the difficult technical problems they face. That's the best kind of challenge.
Sometimes the challenge comes from people: that's the worst kind of challenge, and one that right now is better left alone.
Sometimes the challenge comes from the organization, which means it also comes from people, but with a different twist. Challenges coming from the organization are always tough, but overcoming those challenges can really make a difference.

One of my challenging clients is a rather large company in the financial domain. They are definitely old-school, and although upper management can perfectly see how software is permeating and enabling their business, middle management tends to see software as a liability. In their eternal search for lower costs, they moved most of the development offshore, keeping only a handful of designers and all the analysts in-house. Most often, design is done offshore as well, for lack of available designers on this side of the world.

Analysts have a tough job there. On one side, they have to face the rest of the company, which is not software-friendly. On the other side, they have to communicate clear requirements to the offshore team, especially to the designers, who tend to be very technology-oriented.
To make things more complicated, the analysts often find themselves working on unfamiliar sub-domains, with precise regulations but also with large gray areas that must be somehow understood and communicated.
Icing on the cake: some of those financial instruments do not even exist in the local culture of the offshore team, making communication as difficult as ever.

Given this overall picture, I've often recommended that analysts spend some time creating a good domain model (usually, a UML class diagram, occasionally complemented by some activity diagrams).
The model, with unambiguous associations, dependencies, multiplicities, and so on, will force them to ask the right questions, and will make it easier for the offshore designer to acquaint himself with the problem. Over time, this suggestion has been quite helpful.
However, as I said, the organization is challenging. Some of the analysts complained that their boss is not satisfied by a few diagrams. He wants a lengthy, wordy explanation, so he can read it over and see if they got it right (well, that's his theory anyway). The poor analyst can't possibly do everything in the allotted time.

Now, I always keep an eye on software engineering research. I've seen countless attempts to create UML diagrams from natural language specifications. The results are usually unimpressive.
In this case, however, I would need exactly the opposite: a tool to generate a precise, yet verbose domain description out of a formal domain model. The problem is much easier to solve, especially because analysts can help the tool, by using the appropriate wording.

Guess what, the problem must be considered unworthy, because there is a dearth of work in that area. In practice, the only relevant paper I've been able to find is "Generating Natural Language specifications from UML class diagrams" by Farid Meziane, Nikos Athanasakis and Sophia Ananiadou. There is also Nikos' thesis online, with a few more details.
The downside is that (as usual) the tool they describe does not seem to be generally available. I've yet to contact the authors: I just hope it doesn't turn out to be one of those Re$earch Tool$ that never get to be used.

From the paper above, I've also learnt about ModelExplainer, a similar tool from a commercial company. Again, the tool doesn't seem to be generally available, but I'll get in touch with people there and see.

Overall, the problem doesn't seem so hard, especially if we accept the idea that the analyst will help the tool, choosing appropriate wording. An XMI-to-NL (Natural Language) generator would make for a perfect open source project. Any takers? :-)
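To give a feeling for why the model-to-text direction looks tractable, here is a toy Python sketch. It is not the approach of the paper above and it does not read XMI: the `Association` type, the multiplicity wording, and the financial class names are all made up for illustration. The verb phrase is supplied by hand, which is exactly the kind of help the analyst would give the tool.

```python
from dataclasses import dataclass

# Wording for the most common UML multiplicities (an assumed mapping).
MULTIPLICITY_WORDS = {
    "1": "exactly one",
    "0..1": "at most one",
    "1..*": "one or more",
    "0..*": "zero or more",
}


@dataclass
class Association:
    """One directed association end from a class diagram (hypothetical)."""
    source: str
    verb: str                 # wording chosen by the analyst
    target: str
    target_multiplicity: str


def describe(assoc):
    """Turn an association into a single English sentence."""
    quantity = MULTIPLICITY_WORDS.get(assoc.target_multiplicity,
                                      assoc.target_multiplicity)
    plural = assoc.target_multiplicity not in ("1", "0..1")
    # Naive pluralization: good enough for a sketch, wrong for "Party".
    noun = (assoc.target + "s") if plural else assoc.target
    return f"Each {assoc.source} {assoc.verb} {quantity} {noun}."


model = [
    Association("Portfolio", "holds", "Position", "0..*"),
    Association("Position", "refers to", "Instrument", "1"),
]

for association in model:
    print(describe(association))
```

A real tool would also have to deal with attributes, generalization, constraints, and decent grammar, but the input is already unambiguous, which is what makes this direction so much easier than going from text to UML.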



Wednesday, March 19, 2008

 

(Simple) Metrics

I've been using metrics for a long time (certainly more than 10 years now). I've been using metrics to control project quality (including my own stuff, of course), to define acceptance criteria for outsourced code, to understand the way people work, to "smell" large projects before attempting a refactoring activity, to help make an informed refactor / rewrite decision, to pinpoint functions or classes in need of a careful review, to estimate residual bugs, and so on.

Of course, I use different metrics for different purposes. I also combine metrics to get the right picture. In fact, you can now find several tools to calculate (e.g.) code metrics. You can also find many papers discussing (often with contradictory results) the correlation between any given metric and (e.g.) bug density. In most cases, those papers are misguided, as they look for correlation between a single metric and the target (like bug density). Reality is not that simple; it can be simplified, but not to that point.

Consider good old cyclomatic complexity. You can use it as-is, and it can be useful to calculate the minimum reasonable number of test cases you need for a single function. It's also known that functions with higher cyclomatic complexity tend to have more bugs. But it's also well known that (on average) there is a strong, positive correlation between cyclomatic complexity (CC) and lines of code (LOC). That's really natural: long functions tend to have a complex control flow. Many people have therefore discounted CC, as you can just look at the highly correlated (and easier to calculate) LOC. Simple reasoning, except it's wrong :-).

The problem with that, again, is trying to use just one number to understand something that's too complex to be represented by a single number. A better way is to get both CC and LOC for any function (or method) and then use quadrants.

Here is a real-world example, albeit from a very small program: a smart client invoking a few web services and dealing with some large XML files on the client side. It has been written in C# using Visual Studio, therefore some methods are generated by the IDE. Also, the XML parser is generated from the corresponding XSD. Since I'm concerned with code which is under the programmer's control, I've excluded all the generated files, resulting in about 20 classes. For each method, I gathered the LOC and CC count (more on "how" later). I used Excel to get the following picture (a scatter chart, with CC on the horizontal axis and LOC on the vertical axis):


As you can see, every method is just a dot in the chart, and the chart has been split in 4 quadrants. I'll discuss the thresholds later, as it's more important to understand the meaning of each quadrant first.

The lower-left quadrant is home for low-LOC, low-CC methods. These are the best methods around: short and simple. Most code ought to be there (as it is in this case).

Moving clockwise, the next quadrant (top-left) is for high LOC, low CC methods. Although most coding standards tend to somehow restrict the maximum length of any given method, it's pretty obvious that a long method with a small CC is not that bad. It's "linear" code, likely doing some initialization / configuration. No big deal.

The next quadrant (top-right) is for high LOC, high CC methods. Although this might seem the worst quadrant, it is not. High LOC means an opportunity for simple refactoring (extract method, create class, stuff like that). The code would benefit from changes, but those changes may require relatively little effort (especially if you can use refactoring tools).

The lower-right quadrant is the worst: short functions with high CC (there are none in this case). These are the puzzling functions which can pack a lot of alternative paths into just a few lines. In most cases, it's better to leave them alone (if working) or rewrite them from scratch (if broken). When outsourcing, I usually ask that no code falls in this quadrant.

For the project at hand, 3 methods were in quadrant 3, and so were candidates for refactoring. I took a look, and guess what, it was pretty obvious that those methods were dealing with business concerns inside the GUI. There were clearly 3 domain classes crying to be born (1 shared by the three methods, 1 shared by 2, one used by the remaining). Extracting them led to better code, with little effort. This is a rather ordinary experience: quadrants pinpoint problematic code, then it's up to the programmer/designer to find the best way to fix it (or decide to leave it as it is).

A few words on the thresholds: 10 is a rather generous, but somewhat commonly accepted threshold for CC. The threshold for LOC depends heavily on the overall project quality. I've been accepting a threshold of 100 in quality-challenged projects. As the quality improves (through refactoring / rewriting) we usually lower the threshold. Since this was a new development, I adopted 20 LOC as a rather reasonable threshold.
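If you'd rather not go through Excel, the classification itself is trivial to script. Here is a minimal Python sketch using the thresholds above (CC 10, LOC 20). A few things are assumptions of mine: quadrants are numbered 1 to 4 clockwise starting from the lower-left, a value counts as "high" only when strictly above the threshold, and the method names and numbers are invented.

```python
CC_THRESHOLD = 10
LOC_THRESHOLD = 20


def quadrant(loc, cc, loc_threshold=LOC_THRESHOLD, cc_threshold=CC_THRESHOLD):
    """Classify a method by LOC and CC, numbering quadrants clockwise."""
    long_method = loc > loc_threshold
    complex_method = cc > cc_threshold
    if not long_method and not complex_method:
        return 1  # lower-left: short and simple
    if long_method and not complex_method:
        return 2  # top-left: long but linear
    if long_method and complex_method:
        return 3  # top-right: candidate for refactoring
    return 4      # lower-right: short but convoluted


# (method name, LOC, CC): made-up sample data.
methods = [
    ("GetName", 3, 1),
    ("LoadSettings", 35, 2),
    ("OnSubmitClick", 48, 14),
    ("ParseFlags", 12, 13),
]

for name, loc, cc in methods:
    print(f"{name}: quadrant {quadrant(loc, cc)}")
```

Passing the thresholds as parameters makes it easy to start with a lenient LOC limit on a quality-challenged project and tighten it over time.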

As I said, I use several different metrics. Some can be used in isolation (like code clones), but in most cases I combine them (for instance, code clones vs. code stability gives a better picture of the problem). Coupling and cohesion should also be considered as pairs, never as single numbers, and so on.
Quadrants are not necessarily the only tool: sometimes I also look at the distribution function of a single metric. This is way superior to what too many people tend to do (like looking at the "average CC", which is meaningless). As usual, a tool is useless if we can't use it effectively.
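To see why the average is misleading, consider this small Python sketch. The two sets of CC values are invented and the bucket boundaries are an arbitrary choice of mine; the point is only that both sets have the same average while telling very different stories.

```python
from collections import Counter
from statistics import mean

BUCKETS = ("1-5", "6-10", "11-20", "21+")


def bucket(cc):
    """Map a CC value to a coarse bucket (boundaries are arbitrary)."""
    if cc <= 5:
        return "1-5"
    if cc <= 10:
        return "6-10"
    if cc <= 20:
        return "11-20"
    return "21+"


def distribution(cc_values):
    """Count how many methods fall into each CC bucket."""
    counts = Counter(bucket(cc) for cc in cc_values)
    return {label: counts.get(label, 0) for label in BUCKETS}


tidy = [4, 5, 5, 6, 5, 5, 4, 6]      # every method is reasonably simple
skewed = [1, 1, 1, 1, 1, 1, 1, 33]   # seven trivial methods and one monster

for name, values in (("tidy", tidy), ("skewed", skewed)):
    print(name, "average CC:", mean(values), "distribution:", distribution(values))
```

Both samples average 5, yet only the distribution reveals the single method that deserves a careful review.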

Speaking of tools, the project above was in C#, so I used Source Monitor, a good free tool working directly on C# sources. Many .NET tools work on the MSIL instead, and while that may seem like a good idea, in practice it doesn't help much when you want a meaningful LOC count :-).

Source Monitor can export in CSV and XML. Unfortunately, the CSV didn't contain the detailed data I wanted, so I had to use the XML. I wrote a short XSLT file to extract the data I needed in CSV format (I suggest you use the "save as" feature, as unwanted spacing / carriage returns added by browsers may cripple the result). Use it freely: I didn't put a license statement inside, but all [my] source code in this blog can be considered under the BSD license unless otherwise stated.



Sunday, February 03, 2008

 

A few free tools

A few free tools I've found myself using and/or recommending in the last few days:

Refactor!™ for C++
A refactoring plug-in for Visual Studio. Even if you only use Extract Method, you'll like it. Downside: your IDE may get a bit sluggish.

CScout
A different refactoring/code mining tool. If you work on a large project, you'll love the ability to discover "redundant" #include directives in your header files. Disclaimer: I haven't tried that feature on really sick include files :-). If you do, let me know...

StarUML
(thanks to Fulvio Esposito who told me about it)
A Rational Rose lookalike written in Delphi (open source). One order of magnitude faster than the average Java behemoth. Limited to UML 1.4, but easy to use, reasonably stable, does a reasonable job at importing Rose files, and the "auto layout" feature is better than average.

VirtualBox
(thanks to Roberto Rossi who told me about it)
An open-source virtual machine for Windows / Linux. Reasonably stable, faster than average. I wish I had more time to learn about the internals and the architecture!

As usual, guys, this is not an endorsement or whatever else. Try that stuff at your own risk :-).



Sunday, December 23, 2007

 

Down to Earth

My recent posts have been leaning on the conceptual side, and I don't want to be mistaken for an Architecture Astronaut, so here is some down-to-earth experience from yesterday.

I've developed a few applications for mobile phones over the years, mostly as J2ME (Java 2 Micro Edition) midlets, or targeting Windows Mobile phones.
I never ventured into developing a Symbian application in C++. However, for several months I've entertained some ideas, and they require going down to the metal on a Nokia phone. Nokia provides only a C++ toolkit to access the data I need, so it was time for me to get back to good old C++, which I'm not using much lately.
I just needed some spare time to install all the necessary tools, which I somehow expected to be an unpleasant experience. It got worse :-).

It all started rather nicely. Nokia offers a Visual Studio 2005 plug-in (Carbide.vs 3.0), and they say it works on Windows Vista. Cool, as I have installed Vista on my new Core 2 Quad computer (ok, that might not have been a smart move :-).
So I download the thing, and I soon discover I have to download a bunch of other stuff as well (ActivePerl, a separate SDK, and whatever else). Well, at least I don't get any broken link during download, which is something :-).

I go ahead and install everything; well, I try to. Nokia seems to have tested Carbide.vs 3.0 on Vista, but not the other stuff. The installers fail, some with a nice error message (telling me to change the PATH manually), some silently, some (like the optional Microsoft PowerToys) with a cryptic error message. I discard the PowerToys, fix the PATH manually, and start up Visual Studio.

I get a dialog box requiring registration, fight a little with the slowest Nokia server on the planet, and I'm game. Thanks to the installed wizards, I get a barebone Symbian application in a few clicks. To my surprise :-), it builds and works fine in the emulator. So far, so good. I just need to build a release version and upload it into the phone.

Ok, there is no way to set a different target, just the emulator. Exploring the file system, I realise that the ARM toolchain has not been properly installed by the Symbian SDK installer. That is, the files have been created, but setup has not been started. I do it manually. I get a few problems with PATH again, and fix that manually too. Lucky for me, I kept all the default installation paths, although that means I've got files everywhere on the hard drive. I suspect they never tried to install anywhere else.

Well, now I can choose a release target in Visual Studio. When you try to build, however, you get a sequence of interesting errors. Fixing them was like entering hell's gates :-).

To make a long story very short, here are a few relevant links:

http://discussion.forum.nokia.com/forum/showthread.php?p=366765#post366765
http://discussion.forum.nokia.com/forum/showthread.php?t=107308
http://wiki.forum.nokia.com/index.php/Windows_Vista
http://wiki.forum.nokia.com/index.php/Moving_to_Windows_Vista#CSL_Arm_Toolchain

It's kinda fun to read that yeah, well, Carbide.vs 3.0 was tested on Vista, but the underlying SDK was not, and it does not work :-).

As usual, fixes you find in forums don't really apply verbatim, but in the end, after fixing paths, Perl scripts, makefiles, and stuff like that, you can make it compile and even link. Like others, I had to copy as.exe to make it link. I was naive enough to believe it was over.

Trying to upload the software to the phone, I got a certificate error. The phone was set up to accept unsigned applications (worked fine with unsigned midlets), but it refused the Symbian app anyway. Ok, time to sign the application. There is a time-limited developer certificate in the SDK, so I use that and get a signed app. Uploading fails again, with a "corrupted file" message from the phone.

Reading docs and dozens of internet posts did not help. The most frequent suggestion was to reset your phone, which I refused to do. Well, at least I realized there is an absurd signing policy (should I say police?) in Symbian, which requires developers to shell out some serious money if they intend to develop powerful apps. Quite dumb, but I won't talk about that, except to say it once again: DUMB :-).

In the end I tried to generate a brand new certificate, instead of using the one from the SDK. Guess what, it worked! (pure serendipity). Sure, I get all kinds of warnings when installing on the phone (because it's a self-generated certificate) but the application is working. Now it's time to try out my little idea. Well, as soon as I get some time to waste, that is :-).

Bottom line: we can blame it on Vista; after all, even Microsoft PowerToys had problems. But if you look at the structure of Carbide.vs (and I had to :-), you'll see it's just a bunch of Perl scripts creating makefiles on the fly, invoking GNU tools with crossed fingers, in the hope that somehow the whole thing works. Gee. We had better development environments in the 80s :-).
I don't know if Carbide.c++ is any better than Carbide.vs in terms of structure. Of course, since the SDK is malfunctioning on Vista, Carbide.c++ will share the same problems as Carbide.vs. Maybe it's better structured. I honestly hope so :-), because as far as Carbide.vs goes, there is only one word to define it: unprofessional.



Sunday, December 02, 2007

 

More synchronicity

While reading the same issue of IEEE Software that I mentioned in my previous post, I came across a paper from Ed Yourdon, "Celebrating Peopleware's 20th Anniversary". I mentioned Peopleware while answering a comment to Process as the company's homeostatic system, but so far, it's just a coincidence.

It got more interesting as I went further, as Ed says:
"my colleague Larry Constantine and I had borrowed an even earlier collection of Alexander's ideas from his 1964 book Notes on the Synthesis of Form as the basis for the structured-design concepts of coupling and cohesion."
Oh, look, Notes on the Synthesis of Form, that's another interesting coincidence :-).

Speaking of cohesion, I should note that the process described by Alexander (modeling relationships between requirements as a weighted graph) has strong resemblances to the process adopted in KAOS (which I've mentioned in several posts now). The purpose is different however, as Alexander aims to derive clusters of highly cohesive requirements mechanically, while KAOS is leaning more on the soft side, allowing people to "see" the interplay between requirements.

Funnily enough, in the same issue there is a paper by Simon Buckingham Shum ("There's Nothing Like a Good Argument..."), which describes a tool (Compendium) "providing a flexible visual interface for managing the connections between information and ideas" (from the Compendium website).
I haven't tried it out yet, but from the screenshots, it seems to embody everything we need to apply a KAOS-like technique to requirements analysis, and also keep track of major design decisions.

Gee, everything seems so interconnected these days :-)


