Wednesday, September 26, 2007
Voyage in the Agile Memeplex
I must confess that while reading the "Decontextualization" and "Agilese" paragraphs, I couldn't help thinking "hey, I can't remember when I wrote this article" :-).
Indeed, context has been a constant theme in my recent postings. To get a quick recap, I did a little digging and here are my most relevant context-aware :-)) posts, with short excerpts.
Overloading + Template = Reuse (August 09, 2005)
"Using scary formulas without a context"
Quadrants (September 26, 2006)
"Following recipes (that includes TDD, Pair Programming, and whatever else if not carefully considered in context) just doesn't look agile (or plain old sensible ;-) to me."
Slip Charts (November 14, 2006)
"but information without action is useless. Now, action is always context-dependent. For the real project above [...]"
Improving performances in C++ code (November 24, 2006)
"Decoupling is a good design principle. However, design is always contextual. Blindly following non-contextual design principles is a recipe for failure. "
"Following rigid guidelines, outside the problem context, makes a lot of critical thinking disappear, which may be good for your cognitive load :-) but not for the ultimate result"
Design for Outsourcing (April 11, 2007)
"countless pages have been spent in debates over the up-front Vs. as-you-go design approaches. What is often missing in most debates is context: quite often, there is some truth in both sides, that is, given the proper context, a given approach might be better suited. It's usually the (faulty) assumption that some approach can always be successfully adopted that makes so many debates futile."
"Design should be therefore discussed in-context. Context includes the technological issues, the market issues, the organizational issues, the human issues, and so on"
"In that particular context, what was missing offshore was domain knowledge, not programming knowledge"
"but as I said, design is contextual, we don't make choices in a void (this, again, is why I don't like frameworks that have made too many choices for me)."
GUI [Anti]Patterns and Safety-Critical Systems (April 24, 2007)
"there are quite a few interesting point in that paper on safety-critical systems. You will recognize the centrality of contextual design, as safety-critical systems may have to compromise on other important aspects (e.g. user productivity)."
Get the ball rolling, part 3 (of 4, pretty sure) (July 17, 2007)
"In the last few months I've been repeating a mantra (also in this blog): design is always highly contextual"
Get the ball rolling, part 4 (of 4, told ya :-) (August 04, 2007)
"In this context, the wise thing to do is write code first."
"Context is the key. Just as ignoring gravity ain't safe, ignoring context ain't agile."
Non-linearity, Modeling and Correctness by Design (September 04, 2007)
"Context-unaware suggestions like this are always short-sighted."
"Let's consider each one in context, as true agility cannot be achieved by applying cookbook recipes."
As you can see, the frequency is increasing among my 2007 posts, mostly as a reaction to the total lack of context in too much literature. So, I'm glad Philippe took the time to write so vehemently about it.
Of course, although he focuses quite a bit on agility, lack of context has plagued most of the pre-agile literature as well. Indeed, most of the pre-agile methodologists never spent too much time discussing their values, beliefs, and so on (their memeplex). A few old-timers (like Tom DeMarco, Michael Jackson, etc) always kept context in mind. Many others didn't.
However, to paraphrase Niklaus Wirth in my 1997 interview, in our maturing field, more attention to context should be considered not a dispensable luxury, but a simple necessity.
Labels: article reference
Tuesday, September 04, 2007
Non-linearity, Modeling and Correctness by Design
Early detection of decision points, where an innocent-looking choice can significantly alter the development schedule (or the probability of getting the software right) is even more fascinating. Even if the final decision rules in favor of the additional complexity (hopefully in exchange for greater benefits), it is important to understand the consequences of our choices.
Let's see a real-world example, although on a small scale.
A few weeks ago I was working on a top secret :-)) application, basically a rich client invoking web services on a remote server.
The users must authenticate before they can use the rich client, and it was decided that they will use their email address as the user id (instantly removing collisions), and a PIN instead of a user-chosen password.
The PIN is assigned by a server-side procedure when the user is registered, and users cannot change their PIN. This simplifies a few administrative issues, and also prevents the widely spread problem of password reuse (see, for instance, "The domino effect of password reuse" by Ives, Walsh, and Schneider, Communications of the ACM, April 2004).
At this point, the login screen could look somewhat like this:

Of course, assuming there is some support on the server side, this is so trivial to implement on the client side that you don't need to design anything. There is just no benefit; you can simply code your way through it. Any half-decent widget library natively supports a "password style" edit box which won't show what you're typing, so we're talking about a few lines of code here.
Now, the problem with a machine-generated PIN (everything has a downside!) is that it's harder to remember than a password chosen by the user. However, if they can't remember the PIN, users will write it down, compromising security again. We could argue that, instead of having people write it down, we could as well have the client computer write it down. The "remember me on this computer" checkmark of fame was then proposed (to be used only on computers considered "safe"):

All right, no need to design anything yet. It's just a checkmark, and when it's on, you save the email and PIN. Just remember to clear the storage if it's turned off. Well, ok, there is some chance to put a bug in here, but it's still quite simple, and most people, I guess, will just add some logic inside the form (or "login component", if they feel fancy about reuse), try a few test cases, and declare success.
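To make the "it's just a checkmark" version concrete, here is a minimal sketch of that naive logic. The function name, the dict-like `storage` object and the key names are assumptions made for illustration; the post does not show the original code.

```python
def on_login_success(storage, email, pin, remember_me):
    """Persist or clear the credentials after a successful login.

    `storage` is any dict-like object standing in for local storage
    (hypothetical; a real client would use a settings file or similar).
    """
    if remember_me:
        storage["email"] = email
        storage["pin"] = pin  # plain text: the naive first version
    else:
        # Easy to forget: turning the checkmark off must erase old data.
        storage.pop("email", None)
        storage.pop("pin", None)


storage = {}
on_login_success(storage, "user@example.com", "4711", remember_me=True)
on_login_success(storage, "user@example.com", "4711", remember_me=False)
print(storage)  # {}
```

At this stage the whole behavior fits in one branch, which is why nobody feels the need for a model yet.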
Of course, storing a human-readable PIN on the client computer is quite lame. It's not much better than a post-it on the user's desk. Can we make the computer a little more helpful? Sure :-). We can store an encrypted PIN on the client computer. The problem is, the encryption key must be inside the rich-client, and this makes it vulnerable. Indeed, if the client computer can decode the encrypted PIN, it's just slightly harder (even for a naive hacker) to get hold of the decrypted PIN itself. Since the same PIN was used for authentication purposes on a related web site, that was deemed too risky.
Technology often comes to the rescue: we decided to use asymmetric encryption. The public key will be hard-coded inside the rich-client, but only the private key (deployed in our secure server), can decrypt the PIN. Good. That means we have to send the encrypted PIN to the server, whether we get it from the edit box (originally unencrypted) or from the local storage (already encrypted).
Now, this may seem like a tiny change compared with storing the unencrypted PIN, especially when public key encryption is already available in standard libraries, so it seems that we just need to add a few function calls, and that would be it.
Well, if you think so, think twice :-), as you should catch a glimpse of a non-linear increase in complexity. Let's see why.
When the program opens the authentication screen it must now check if you have the email and [encrypted] PIN on local storage. If you do, it will display the email and... well, it can't just set the encrypted PIN in the corresponding text box, for several reasons, like:
- the encrypted PIN has a different length than the unencrypted PIN
- you usually get the unencrypted PIN from the textbox and encrypt it, and you don't want to encrypt it twice.
So you probably want to set a dummy string there.
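A rough sketch of that start-up step, under stated assumptions: `storage` is a dict-like stand-in for local storage, and the key names and the placeholder text are made up for illustration.

```python
PIN_PLACEHOLDER = "********"  # arbitrary dummy text, never sent anywhere


def initial_form_state(storage):
    """Return (email_text, pin_text, remember_me) for the login form."""
    if "email" in storage and "encrypted_pin" in storage:
        # Show the dummy string, not the ciphertext: the ciphertext has a
        # different length and must not be encrypted a second time.
        return storage["email"], PIN_PLACEHOLDER, True
    return "", "", False


print(initial_form_state({}))  # ('', '', False)
print(initial_form_state({"email": "user@example.com",
                          "encrypted_pin": b"\x01\x02"}))
# ('user@example.com', '********', True)
```

The catch is that the form must now remember whether the PIN box still holds the placeholder or something the user actually typed.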
Now, the user may (for instance) turn the "remember me" checkmark off and back on, at which point you still have to use the encrypted PIN on file. But if the user changes the PIN, while leaving the checkmark on (possibly turning it on and off any number of times), you have to read an unencrypted PIN from the textbox, encrypt it, authenticate, and save the new email + encrypted PIN in local storage. If the checkmark is just turned off, but it was initially on, you have to use the encrypted PIN on file, but clear the local storage. But not if it was initially off. And so on: there are quite a few different scenarios. Which leads to the second part of this post: modeling and correctness by design.
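One way to get a feel for how the cases multiply is to flatten the scenarios just listed into a small decision function and enumerate its inputs. This is only my reading of the prose above, not the state machine from the post: it looks at the final state of the form, ignores email edits and failed logins, and all names are hypothetical.

```python
from itertools import product


def expected_behaviour(on_file, pin_edited, remember):
    """Return (PIN source, storage action) for one combination.

    on_file:    an encrypted PIN was in local storage at start-up
    pin_edited: the user typed something in the PIN box
    remember:   final state of the "remember me" checkmark
    """
    if not on_file and not pin_edited:
        return None  # nothing to authenticate with
    source = "encrypt textbox" if pin_edited else "use PIN on file"
    if remember:
        action = "save" if pin_edited else "keep"
    else:
        action = "clear" if on_file else "nothing"
    return source, action


for combo in product([False, True], repeat=3):
    print(combo, "->", expected_behaviour(*combo))
```

Three yes/no inputs already give eight combinations, and that is before considering the order of events, which is what a state diagram captures and a flat table like this does not.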
When faced with a sudden increase in complexity, you can basically follow three paths:
1) [recommended] Negotiate this complexity away. In this case, we didn't want to, as it provided a significant benefit to the user.
2) [suicidal] Ignore it: just work as you would do if that additional complexity wasn't there. In many cases, that means: just code your way through it.
3) [professional] Deal with it. Make it manageable, remove any accidental complexity (as Brooks would say), leave only the essential complexity on the table, shape and represent your problem in a way that is amenable to reasoning.
Now, I know from experience that, faced with the above, quite a few people would go for (2). Maybe they don't see the complexity surge. Maybe they think code is the only important artifact. Maybe they just don't know any better. But they will just add code inside that form, test a little, find bugs, patch it, find more bugs, and so on. In the end, they will release a faulty program, because they didn't try all the scenarios, and by focusing on single cases they missed the big picture.
I'm really curious about how the TDD guys would approach this problem (more on this later), but I can tell you how I did it. There is obviously a state machine behind the problem. So my best shot was to define the precise behavior of that state machine. It's not a trivial on-off machine, so I needed a representation which helps me think. I could use a state/event table, or I could use a diagram. I chose a diagram, for reasons I'll discuss shortly.
Now, I must admit that it took me longer than I expected to get it right on paper. I can blame my tool (the CASE tool we use on this project is based on UML 1.4, and we had to tweak it more than a bit to adopt some useful UML 2 modeling concepts) but it was largely a learning process. When I started, I felt that in this specific case superstates were the key to keeping the diagram simple, so I sketched a model based on superstates. It didn't look nice (read: obviously correct, easy to understand and to implement).
I started moving some concepts from superstates to concurrent (orthogonal) states. In the end I killed the superstates, paying a small price in increased transitions, but the final model was reasonably simple. At the very least, it shows the real complexity of the problem. Here it is: overall, it took me something like two and a half hours to nail it down in this form.

You may want to think a little about the problem and take a look at the diagram (hey, you might even find a bug :-). You'll see that the model is relatively abstract - it doesn't deal with the GUI side at all, and it shouldn't. There is no attempt to capture keystrokes or mouse clicks. What is important at this abstraction level is that the PIN or email has been changed, or that the checkmark has been turned on or off. Everything else belongs elsewhere (in the GUI components, in the authentication form). Creating a model at the right abstraction level is fundamental, as it allows you to concentrate on the important issues, while at the same time shielding the model itself from irrelevant changes (more on this later).
Part of those 2.5 hours went into shaping the diagram. Shaping is not the same as drawing. In a sense, shaping is to drawing as architecture is to structure. Shaping is an intentional activity, directed at revealing the fundamental nature of what we're modeling. Indeed, the biggest difference between a visual model and a state/transition table is that a table won't give you any opportunity for shaping.
Now, look at that diagram again. Can you see the symmetry in shape and the antisymmetry in meaning/behavior? Can you relate that to the problem we're trying to solve? Does the crossing of transitions 6 and 7 further reinforce your understanding of the strict relationship between antisymmetry and behavior? Do you see how information is lost as you move left-to-right in transitions 6 and 7, so that you cannot get back, while it is preserved in transitions 4-5 and 8-9, so you can always get back? Also, see that anomaly on the bottom left side? The shape is no longer symmetric there. Why? Is that right? Can this be avoided? At what cost?
And so on. There is a lot of reasoning that can be suggested and simplified by the right shape. In the end, when the shape of the model closely resembles the shape of the problem, you can basically see that the model is right.
Remember: code is a model too, with the extremely useful property of being executable. However, you just can't apply the same kind of reasoning to code. It is not the right model to think about the global picture.
Ok, time to move to coding. I must confess I usually don't code simple stuff like this. I would usually leave this to someone else, someone who has been working with me on the model. However, having spent so much time on the model, I wanted to see first-hand if my feeling was right: coding should now take very little time, yield relatively few lines, and no bugs, as it was designed for correctness. So I implemented this thing myself.
Now, this may surprise you, but the problem didn't suggest any fancy implementation to me. No State pattern, no table-driven state machine implementation. I used only one class, which I named Credentials (no, AuthenticationManager is not a good name :-), so no class diagram this time.
Internally, Credentials uses three enum types to model the orthogonal states. I simply coded the state-dependent methods with an infamous switch/case.
Why? Quite simple! I do not expect the kind of fine-grained changes that more sophisticated implementations can handle gracefully. The State pattern is about extensibility of states. A table-driven implementation makes event re-routing easy. But I do not foresee any need for this.
My estimate is that this code will stay untouched with a 99% probability. The remaining 1% represents a potential for disruptive change (like, we use some authentication device), at which point, in this case, no sophisticated implementation would shield this code from scrapping. Anyway, it's a very small class, so no big deal (excluding braces, empty lines and a few comments, it took me 50 lines of code, including the hard-coded public key and a call to the encryption library). More on this later.
Being so short, it took very little to implement. I didn't keep track, but I would say less than half an hour, including some initial testing. I still managed to put a bug in it. Well, actually not in the state machine itself, but in the glue code inside the form: I called an event when I should have called another (ok ok I'm pretty dumb). Of course, the bug wasn't subtle, and I caught it immediately. No more bugs. Game over.
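For readers who want to picture the general shape of such a class, here is a rough sketch. The original code is not shown in this post, so everything here is an assumption: the three enum types, the event names, the dict-like `storage`, and the `encrypt` callable standing in for the public-key encryption call. Plain if/elif chains play the role of the switch/case.

```python
from enum import Enum, auto


class Remember(Enum):
    OFF = auto()
    ON = auto()


class PinSource(Enum):
    NONE = auto()     # nothing typed, nothing on file
    ON_FILE = auto()  # encrypted PIN loaded from local storage
    TYPED = auto()    # plain PIN typed into the text box


class Stored(Enum):
    EMPTY = auto()
    FILLED = auto()


class Credentials:
    def __init__(self, storage, encrypt):
        self._storage = storage   # dict-like stand-in for local storage
        self._encrypt = encrypt   # stand-in for the public-key call
        self._email = storage.get("email", "")
        self._file_pin = storage.get("encrypted_pin")
        self._typed_pin = ""
        if self._file_pin is not None:
            self._stored = Stored.FILLED
            self._pin = PinSource.ON_FILE
            self._remember = Remember.ON
        else:
            self._stored = Stored.EMPTY
            self._pin = PinSource.NONE
            self._remember = Remember.OFF

    # Events: the form reports what changed, not keystrokes or clicks.
    def email_changed(self, text):
        self._email = text

    def pin_changed(self, text):
        self._typed_pin = text
        self._pin = PinSource.TYPED

    def remember_toggled(self, on):
        self._remember = Remember.ON if on else Remember.OFF

    def pin_to_send(self):
        """Encrypted PIN to send to the server."""
        if self._pin is PinSource.TYPED:
            return self._encrypt(self._typed_pin)
        elif self._pin is PinSource.ON_FILE:
            return self._file_pin
        raise ValueError("no PIN available")

    def authenticated(self):
        """Call after the server has accepted the credentials."""
        if self._remember is Remember.ON:
            self._storage["email"] = self._email
            self._storage["encrypted_pin"] = self.pin_to_send()
            self._stored = Stored.FILLED
        elif self._stored is Stored.FILLED:
            self._storage.pop("email", None)
            self._storage.pop("encrypted_pin", None)
            self._stored = Stored.EMPTY


def fake_encrypt(pin):
    # NOT encryption: a visible marker so the example output is readable.
    return b"enc:" + pin.encode()


disk = {}
first_run = Credentials(disk, fake_encrypt)
first_run.email_changed("user@example.com")
first_run.pin_changed("4711")
first_run.remember_toggled(True)
first_run.authenticated()
print(disk)  # {'email': 'user@example.com', 'encrypted_pin': b'enc:4711'}
```

Note that the class knows nothing about widgets, which is what would make it testable in isolation if one wanted to.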
Now, take a look at the state machine again. All those numbers in red are not strictly part of UML, but they're extremely useful to define scenarios and test cases. For instance, a scenario where the user starts up the application for the first time, types the email and pin, sets the "remember me" checkmark and authenticates successfully can be simply represented as "0-2-1-9-15-22". If, next time, he turns the checkmark off and then back on, but doesn't change a thing and still authenticates successfully, the scenario would be "0-3-5-4-10-20", and so on.
Which leads us to the next subject: testing.
How many interesting test cases do we have here? Quite a few, and you can quickly derive a list of significant cases by "navigating" the diagram. Remember to include cases where the user makes some mistake, fails to authenticate, and then corrects himself (or cancels). That is, something like "0-2-13-14-13-23" to get a short one. If you look carefully, you'll find at least a couple of dozen interesting scenarios, possibly more, depending on how much testing you think is enough.
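The "navigation" can even be mechanized once the numbered transitions are written down as data. The sketch below uses a deliberately tiny, invented transition table: its states and numbers are NOT those of the diagram discussed here, and the function names are hypothetical. It only shows how a dash-separated scenario can be checked and how candidate scenarios can be enumerated.

```python
# Toy transition table: number -> (source state, target state).
# Invented for illustration; not the transitions of the real diagram.
TRANSITIONS = {
    0: ("start", "empty"),
    1: ("empty", "typed"),
    2: ("typed", "failed"),
    3: ("failed", "typed"),
    4: ("typed", "done"),
}


def parse_scenario(text):
    """Turn a string such as '0-1-4' into a list of transition numbers."""
    return [int(n) for n in text.split("-")]


def is_valid_scenario(text, transitions=TRANSITIONS):
    """True if each transition starts where the previous one ended."""
    steps = parse_scenario(text)
    if any(n not in transitions for n in steps):
        return False
    return all(transitions[a][1] == transitions[b][0]
               for a, b in zip(steps, steps[1:]))


def enumerate_scenarios(start, end, max_steps, transitions=TRANSITIONS):
    """List every path from start to end using at most max_steps steps."""
    found = []

    def walk(state, path):
        if state == end:
            found.append("-".join(map(str, path)))
            return
        if len(path) == max_steps:
            return
        for number, (src, dst) in sorted(transitions.items()):
            if src == state:
                walk(dst, path + [number])

    walk(start, [])
    return found


print(is_valid_scenario("0-1-4"))                # True
print(is_valid_scenario("0-4"))                  # False
print(enumerate_scenarios("start", "done", 5))   # ['0-1-2-3-4', '0-1-4']
```

A tester's judgment is still needed to pick the interesting paths; the enumeration only guarantees that none is overlooked up to a given length.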
Now, what is the TDD guy supposed to do? Simple: shun the diagram, think about a scenario, make it fail, make it work. Think about the next scenario, and so on. All code-centered. Chance to get it right and clean? Sorry, I think it's quite small. See my previous posts on the XP episode for further evidence.
Time for another confession: I didn't test all the scenarios myself. We have professional testers on this project. Some time ago, I trained them on testing state-based software. They can derive a sensible test plan from the state diagram (plus their domain knowledge and understanding of the problem, of course), execute it, report bugs. Sure, it is my responsibility to give them working code, which I did :-), as they didn't find any problem.
We didn't write automatic tests. It could be done - especially because I have a Credentials class decoupled from GUI code. But again, there would be little value in those automatic tests, for the very same reasons that brought me to choose a simple switch/case implementation. Which leads to the final point: change management.
A common tenet of code-centric practices is that if you really want to sketch a diagram you're free to do it (thanks guys :-), but once you get your code working, you must scrap the diagram, as maintenance costs would rise if you don't. Context-unaware suggestions like this are always short-sighted.
You basically touch your code for one of these 3 reasons:
1) you have a bug to fix
2) design (but not requirements) change - let's simplify this to refactoring.
3) requirements change
Let's consider each one in context, as true agility cannot be achieved by applying cookbook recipes.
Say you have a bug to fix. Some scenario is not behaving as intended. Would you rather:
a) play the faulty scenario on the diagram above, understand if it's truly a defect, understand if it is a design defect (the diagram is wrong) or a coding defect (the diagram is right but the code is not behaving properly), at which point you probably know exactly which transition or state is misbehaving, go there and fix it, without losing sight of the global picture.
b) start your debugger and just sift through the code, fix it locally, and rely on a hopefully large set of test cases to guarantee you haven't broken anything.
Dunno about you, but I'll choose (a) in this case - I've seen way too many "simple" systems like this fail for lack of global understanding.
Note that if you expect bugs, you may want to invest in a test suite. I didn't, so I didn't.
Say you want to refactor your code. Maybe you decided, for some perverse :-) reason, to adopt the State pattern. Fine. Guess what, I chose to draw only a state (not class) diagram exactly because the structural side was too simple to benefit from modeling. Refactor all you want, the state chart above will stay valid. You could even squeeze a little more value out of it, by using it to reason about the new structure you're going to impose on your software - how will you handle concurrent states, and so on. No need to scrap the diagram.
Say requirements change. Good. The change can be of a rather large magnitude, like you're no longer using a PIN but an authentication device. The diagram is now largely useless - although you might want to take some inspiration as you draw a new one. Truth is, considering the abstraction level of this diagram, if the diagram gets useless, you ain't gonna save a single line of code either. So if you scrap the diagram, you scrap the code as well.
But maybe the change is on a smaller scale, like: "when the checkmark is on, you cannot edit fields. You have to turn it off before you can change the email or PIN" (this might even make sense, to prevent accidental modifications). Again, what would you rather do:
a) understand the impact of the change on the diagram; change it to follow the new requirements. Understand correctness on the diagram, as I suggested above. See which states and transitions need to change. Go to your code and change it. Test again.
b) open your editor, look at your test code, change it to follow new requirements, hope you got enough test cases (and removed the ones no longer valid), start patching the old code, refactor as you go to keep it in a decent shape.
Again, dunno about you, but I'll choose (a) in this case - if you benefit from diagrammatic reasoning when you first think about the system, you'll benefit from diagrammatic reasoning when any significant change arises.
Ok, this is a very long post, so I'll cut it short. You can create software expecting bugs, or expecting correctness. Just be careful about the self-fulfilling expectation you choose :-).
Labels: article reference, design





