Wednesday, March 19, 2008 

(Simple) Metrics

I've been using metrics for a long time (certainly more than 10 years now). I've been using metrics to control project quality (including my own stuff, of course), to define acceptance criteria for outsourced code, to understand the way people work, to "smell" large projects before attempting a refactoring activity, to help make an informed refactor / rewrite decision, to pinpoint functions or classes in need of a careful review, to estimate residual bugs, and so on.

Of course, I use different metrics for different purposes, and I combine metrics to get the right picture. You can now find several tools to calculate code metrics, and many papers discussing (often with contradictory results) the correlation between a given metric and, say, bug density. In most cases, those papers are misguided, as they look for a correlation between a single metric and the target. Reality is not that simple; it can be simplified, but not to that point.

Consider good old cyclomatic complexity. You can use it as-is, and it can be useful to calculate the minimum reasonable number of test cases you need for a single function. It's also known that functions with higher cyclomatic complexity tend to have more bugs. But it's also well known that (on average) there is a strong, positive correlation between cyclomatic complexity (CC) and lines of code (LOC). That's really natural: long functions tend to have a complex control flow. Many people have therefore discounted CC, as you can just look at the highly correlated (and easier to calculate) LOC. Simple reasoning, except it's wrong :-).
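
Just to make the test-case point concrete, here is a tiny example of mine (in Java, but the idea is language-neutral, and it's not taken from any project discussed here). One loop plus one condition gives CC = 3, hence three test cases as a reasonable minimum:

  // CC = 3: one 'for', one 'if', plus 1.
  // Reasonable minimum: three test cases, one per independent path
  // (empty array, a positive value, a non-positive value).
  public static int sumOfPositives(int[] values)
  {
    int sum = 0;
    for (int v : values)
      if (v > 0)
        sum += v;
    return sum;
  }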

The problem with that, again, is trying to use just one number to understand something that's too complex to be represented by a single number. A better way is to get both CC and LOC for any function (or method) and then use quadrants.

Here is a real-world example, albeit from a very small program: a smart client invoking a few web services and dealing with some large XML files on the client side. It has been written in C# using Visual Studio, therefore some methods are generated by the IDE. Also, the XML parser is generated from the corresponding XSD. Since I'm concerned with code which is under the programmer's control, I've excluded all the generated files, resulting in about 20 classes. For each method, I gathered the LOC and CC count (more on "how" later). I used Excel to get the following picture:

[Chart: LOC vs. CC scatter plot, one dot per method, split into four quadrants]

As you can see, every method is just a dot in the chart, and the chart has been split in 4 quadrants. I'll discuss the thresholds later, as it's more important to understand the meaning of each quadrant first.

The lower-left quadrant is home for low-LOC, low-CC methods. These are the best methods around: short and simple. Most code ought to be there (as it is in this case).

Moving clockwise, the next quadrant (top-left) is for high-LOC, low-CC methods. Although most coding standards tend to somehow restrict the maximum length of any given method, it's pretty obvious that a long method with a small CC is not that bad. It's "linear" code, likely doing some initialization / configuration. No big deal.

The next quadrant (top-right) is for high-LOC, high-CC methods. Although this might seem the worst quadrant, it is not. High LOC means an opportunity for simple refactoring (extract method, create class, stuff like that). The code would benefit from changes, but those changes may require relatively little effort (especially if you can use refactoring tools).

The lower-right quadrant is the worst: short functions with high CC (there are none in this case). These are the puzzling functions which can pack a lot of alternative paths into just a few lines. In most cases, it's better to leave them alone (if working) or rewrite them from scratch (if broken). When outsourcing, I usually ask that no code falls in this quadrant.

For the project at hand, 3 classes had methods in the top-right quadrant, making them candidates for refactoring. I took a look, and guess what, it was pretty obvious that those methods were dealing with business concerns inside the GUI. There were clearly 3 domain classes crying to be born (1 shared by all three methods, 1 shared by 2, 1 used by the remaining one). Extracting them led to better code, with little effort. This is a rather ordinary experience: quadrants pinpoint problematic code, then it's up to the programmer/designer to find the best way to fix it (or decide to leave it as it is).

A few words on the thresholds: 10 is a rather generous, but commonly accepted, threshold for CC. The threshold for LOC depends heavily on the overall project quality. I've been accepting a threshold of 100 in quality-challenged projects; as the quality improves (through refactoring / rewriting) we usually lower the threshold. Since this was new development, I adopted a rather reasonable threshold of 20 LOC.
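
If you prefer to automate the quadrant assignment instead of eyeballing the chart, a handful of lines will do. This is just a sketch of mine (in Java, even though the project above was in C#), assuming you already have the per-method LOC and CC values:

  // Sketch: map a method to its quadrant, given LOC and CC.
  // Thresholds are the ones discussed above; adjust to your project.
  public static String quadrant(int loc, int cc)
  {
    final int LOC_THRESHOLD = 20;
    final int CC_THRESHOLD = 10;
    if (loc <= LOC_THRESHOLD && cc <= CC_THRESHOLD)
      return "lower-left: short and simple, fine";
    if (loc > LOC_THRESHOLD && cc <= CC_THRESHOLD)
      return "top-left: long but linear, no big deal";
    if (loc > LOC_THRESHOLD && cc > CC_THRESHOLD)
      return "top-right: long and complex, refactoring candidate";
    return "lower-right: short but intricate, the worst place to be";
  }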

As I said, I use several different metrics. Some can be used in isolation (like code clones), but in most cases I combine them (for instance, code clones vs. code stability gives a better picture of the problem). Coupling and cohesion should also be considered as pairs, never as single numbers, and so on.
Quadrants are not necessarily the only tool: sometimes I also look at the distribution function of a single metric. This is way superior to what too many people tend to do (like looking at the "average CC", which is meaningless). As usual, a tool is useless if we can't use it effectively.
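
As a small, made-up illustration of why the average is misleading: a handful of simple methods plus one monster still give a respectable-looking average, while a percentile view immediately exposes the tail. The numbers below are invented, just to show the idea:

  import java.util.Arrays;

  public class CcDistribution
  {
    // p-th percentile (0 < p <= 1) of the CC values, nearest-rank method
    static int percentile(int[] cc, double p)
    {
      int[] sorted = cc.clone();
      Arrays.sort(sorted);
      int index = (int) Math.ceil(p * sorted.length) - 1;
      return sorted[Math.max(0, index)];
    }

    public static void main(String[] args)
    {
      int[] cc = { 1, 1, 1, 2, 2, 3, 3, 5, 8, 25 };  // made-up values
      // the average is 5.1: it hides both the typical method (median 2)
      // and the single monster (25)
      System.out.println("median CC: " + percentile(cc, 0.50));
      System.out.println("90th percentile CC: " + percentile(cc, 0.90));
    }
  }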

Speaking of tools, the project above was in C#, so I used Source Monitor, a good free tool working directly on C# sources. Many .NET tools work on the MSIL instead, and while that may seem like a good idea, in practice it doesn't help much when you want a meaningful LOC count :-).

Source Monitor can export in CSV and XML. Unfortunately, the CSV didn't contain the detailed data I wanted, so I had to use the XML. I wrote a short XSLT file to extract the data I needed in CSV format (I suggest you use the "save as" feature, as unwanted spacing / carriage returns added by browsers may cripple the result). Use it freely: I didn't put a license statement inside, but all [my] source code in this blog can be considered under the BSD license unless otherwise stated.


Saturday, July 14, 2007 

Get the ball rolling, part 2 (of 4, most likely)

I hope you had some time to read the XP episode paper and ponder a little on what they managed to get in the end. As I've anticipated, I do have a few issues with the results. It's not a matter of correctness: they pass all the tests. It's a matter of quality.
The design is questionable, and so is the code. No big deal: after all, it's little more than a toy example. It gets slightly worse when you consider all this stuff in perspective (at the end of the XP game, code is all you get; I'll get back to this later). Right now, here are the main flaws I see:

- overabundance of numerical literals
This is programming 101: literals are bad (although they keep your code a few lines shorter). Go over the code: you'll see numbers like 10 just about everywhere. Unfortunately, in bowling you get 10 pins and also 10 frames, so you always have to read carefully to understand which is which. Now go back to the Wikipedia page on bowling. See the mention of 5-pin bowling, 9-pin bowling, etc.? Just go ahead and change the code to use 9 pins and 10 frames (9-pin bowling ain't that simple, but anyway). Guess what, none of the test cases will help, and it's not a one-line change as it ought to be. There is even a 21 in that code, being 10 * 2 + 1. I'll let you figure out what that 10, 2, and 1 are :-).
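
Just to show what I mean (a sketch of mine, not the authors' code): named constants make the "which 10 is which" question disappear, and turn the switch to 9-pin bowling into the one-line change it ought to be.

  // Sketch, not the authors' code: name each literal once.
  class BowlingRules
  {
    static final int PINS_PER_FRAME = 10;   // the one line to change for 9-pin bowling
    static final int FRAMES_PER_GAME = 10;
    static final int BALLS_PER_FRAME = 2;
    // presumably what that mysterious 21 stands for: the maximum
    // number of throws in a game (one extra ball in the last frame)
    static final int MAX_THROWS =
      FRAMES_PER_GAME * BALLS_PER_FRAME + 1;
  }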

- programming against an implementation, not against the problem
Some portions of the code are indeed quite good:

 public int scoreForFrame(int theFrame)
  {
    ball = 0;
    int score=0;
    for (int currentFrame = 0; 
         currentFrame < theFrame; 
         currentFrame++)
    {
      if (strike())
        score += 10 + nextTwoBalls();
      else if (spare())
        score += 10 + nextBall();
      else
        score += twoBallsInFrame();
    }
    return score;
  }

You can basically read it in plain English. Unfortunately, most of the rest is not of the same quality, for instance:

 public void add(int pins)
  {
    itsScorer.addThrow(pins);
    adjustCurrentFrame(pins);
  }
  private void adjustCurrentFrame(int pins)
  {
    if (firstThrowInFrame == true)
    {
      if (adjustFrameForStrike(pins) == false)
        firstThrowInFrame = false;
    }
    else
    {
      firstThrowInFrame=true;
      advanceFrame();
    }
  }
...
  private void advanceFrame()
  {
    itsCurrentFrame = Math.min(10, itsCurrentFrame + 1);
  }

"Adjust" a frame? Couldn't we get it right on the first place :-)? Even a Math.min call?? C'mon. There isn't any resemblance with the problem domain. It's just programming against the implementation, until things seems to work. Now, I've seen a lot of code like this over the years; it's the typical code people get after years of tweaking, except here it didn't take years, just hours.
Now, tweaking code till it works is already bad per se, but it's much worse when you consider that in XP, at the end of the game, the code is the design, and the test cases are the requirements. Again, everything is fine when the domain is trivial, largely known, well documented elsewhere (like in this case). But if we tackle a difficult new problem in an uncharted domain, and all we leave behind is code like that, I pity the souls that will join the team.

- Feature Envy: Game over Frame
According to Fowler, a method (or worse, a class) is exhibiting Feature Envy
when it "seems more interested in a class other than the one it actually is in". A variation of Feature Envy is when the other class does not even exist, but it is continually talked about in a method (or another class). Take Game. Excluding empty lines. Game is 45 lines long; if you exclude brace-only lines, it's 25 lines long. Of those, 16 contain the word "Frame". Is that suggesting you a missing abstraction?

- Feature Envy: Scorer over Ball
Just like above, Scorer is 57 lines long excluding empty lines (35 if you also exclude braces). Of those, 15 contain the word Ball (and 6, the word Frame). C'mon. We're talking about bad smells here; you agile guys are supposed to fix this stuff.

(note: the two previous issues may come from hindsight, as in my design, I do have a Frame and a Ball class).

- "Scorer"?
Good ol' Peter Coad once talked about the "er-er Principle": Challenge any class name that ends in "-er" (e.g. Manager or Controller). If it has no parts, change the name of the class to what each object is managing. If it has parts, put as much work in the parts that the parts know enough to do themselves.
That's OOP 101: if you need a manager, it's because you've got passive objects, and objects are about behaviour, not just data. You need a scorer because you haven't got the right classes.
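
Just to make the point concrete, here is a rough sketch (mine, and not the design I'll show next time): once Frame and Ball exist as classes, scoring behaviour has an obvious home, and no "Scorer" is needed at all.

  import java.util.ArrayList;
  import java.util.List;

  // Rough sketch, not my actual design: behaviour lives where the data is.
  class Ball
  {
    private final int pins;
    Ball(int pins) { this.pins = pins; }
    int pins() { return pins; }
  }

  class Frame
  {
    static final int PIN_COUNT = 10;
    private final List<Ball> balls = new ArrayList<Ball>();

    void add(Ball ball) { balls.add(ball); }
    boolean isStrike() { return balls.size() == 1 && pinCount() == PIN_COUNT; }
    boolean isSpare()  { return balls.size() == 2 && pinCount() == PIN_COUNT; }

    int pinCount()
    {
      int total = 0;
      for (Ball b : balls)
        total += b.pins();
      return total;
    }

    // bonus balls (for strikes and spares) belong to the following
    // frames; Game would pass them in
    int score(List<Ball> nextBalls)
    {
      int bonusBalls = isStrike() ? 2 : isSpare() ? 1 : 0;
      int bonus = 0;
      for (int i = 0; i < bonusBalls && i < nextBalls.size(); i++)
        bonus += nextBalls.get(i).pins();
      return pinCount() + bonus;
    }
  }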

Overall, this is not code you may want to maintain. Wanna give it a try? Ok, try to change the rules to, e.g., 3-6-9 bowling or low-ball bowling. Or make up your own random rule, like "the 7th frame is only allowed one ball". See how easy it is. Hey, don't give me that YAGNI look :-), you are supposed to embrace change :-)).
It's worth repeating that in a real case, you'd be much worse off: you won't have a lengthy transcript of the session, telling you how they came to put together that code. You'll have just the code; except that, in real-world projects, the code will be 3 or 4 orders of magnitude bigger. You'll have to figure out everything else just from the code. If that's the average quality, well, good luck.

Of course, having only code to maintain means cheaper maintenance. Well, more exactly, it means that the mechanical act of maintenance (modifying stuff that needs to be changed) is cheaper. Of course, maintenance is not just about the mechanics: it is (mainly) about understanding the existing, understanding the new, understanding how to get from here to there.
Some redundancy (between wordy requirements, high-level design, and code) can help. Indeed, when I decided to design & implement my own version of the above, I found it more convenient to read the Wikipedia page than to reverse engineer requirements from code. Of course, I also used the test cases they wrote to corroborate my understanding of the rules. Redundancy can be helpful: just ask your favorite aerospace engineer :-)).
This is why I always teach people to look for the bright and the dark side of everything, redundancy included (which, of course, has well-known shortcomings as well). That's why I don't like "extreme" approaches that pretend there is no downside in making "extreme" choices, only benefits.

So, on the bright side :-)), the code above is short (excluding blank lines and test code, it totals 102 lines). Indeed, in a Usenet post RCM reports that people who have "generated code" from the devilish overdesigned :-) UML diagram and added behavior got a whopping 400 lines.
Big deal. That was (in the authors' intention) a sketchy domain model, not a sensible design model. And code generation is usually a scam. That has nothing to do with diagrammatic reasoning (quite the opposite, I would say: it's diagrammatic non-reasoning; more on this next time).
Back to the bright side, the code above uses only 2 classes, which could be good if you don't like diagrams (which would help you understand the collaboration between multiple classes). Appallingly enough, considering that the two authors together have something like 70 years of programming experience, it looks like beginners' code. Which could be good in some environments. I'll get back to this another time.

Ok, enough bad cop for today. I'll show you my stuff in a few days (meanwhile, you can grind your teeth :-). I'll also describe my "process" and then draw some conclusions.
