Monday, January 25, 2010 

[Long] Quote of the Day

"I learned this, at least, by my experiment: that if one advances confidently in the direction of his dreams, and endeavours to live the life which he has imagined, he will meet with a success unexpected in common hours. He will put some things behind, will pass an invisible boundary; new, universal, and more liberal laws will begin to establish themselves around and within him; or the old laws be expanded, and interpreted in his favour in a more liberal sense, and he will live with the license of a higher order of beings. In proportion as he simplifies his life, the laws of the universe will appear less complex, and solitude will not be solitude, nor poverty poverty, nor weakness weakness. If you have built castles in the air, your work need not be lost; that is where they should be. Now put the foundations under them."

Henry David Thoreau

Labels:

Sunday, January 10, 2010 

Delaying Decisions

Since microblogging is not my thing, I decided to start 2010 by writing my longer post ever :-). It will start with a light review of a well-known principle and end up with a new design concept. Fasten your seatbelt :-).

The Last Responsible Moment
When we develop a software product, we make decisions. We decide about individual features, we make design decisions, we make coding decisions, we even decide which bugs we really want to fix before going public. Some decisions are taken on the fly; some, at least in the old school, are somewhat planned.

A key principle of Lean Development is to delay decisions, so that:
a) decisions can be based on (yet-to-discover) facts, not on speculation
b) you exercise the wait option (more on this below) and avoid early commitment

The principle is often spelled as "Delay decisions until the last responsible moment", but a quick look at Mary Poppendieck's website (Mary co-created the Lean Development approach) shows a more interesting nuance: "Schedule Irreversible Decisions at the Last Responsible Moment".

Defining "Irreversible" and "Last Responsible" is not trivial. In a sense, there is nothing in software that is truly irreversible, because you can always start over. I haven't found a good definition for "irreversible decision" in literature, but I would define it as follows: if you make an irreversible decision at time T, undoing the decision at a later time will entail a complete (or almost complete) waste of everything that has been created after time T.

There are some documented definitions for "last responsible moment". A popular one is "The point when failing to decide eliminates an important option", which I found rather unsatisfactory. I've also seen some attempts to quantify that better, as in this funny story, except that in the real world you never have a problem which is that simple (very few ramifications in the decision graph) and that detailed (you know the schedule beforehand). I would probably define the Last Responsible Moment as follows: time T is the last responsible moment to make a decision D if, by postponing D, the probability of completing on schedule/budget (even when you factor-in the hypothetical learning effect of postponing) decreases below an acceptable threshold. That, of course, allows us to scrap everything and restart, if schedule and budget allows for it, and in this sense it's kinda coupled with the definition of irreversible.

Now, irreversibility is bad. We don't want to make irreversible decisions. We certainly don't want to make them too soon. Is there anything we can do? I've got a few important things to say about modularity vs. irreversibility and passive vs. proactive option thinking, but right now, it's useful to recap the major decision areas within a software project, so that we can clearly understand what we can actually delay, and what is usually suggested that we delay.

Major Decision Areas
I'll skip on a few very-high-level, strategic decisions here (scope, strategy, business model, etc). It's not that they can't be postponed, but I need to give some focus to this post :-). So I'll get down to the more ordinarily taken decisions.

People
Choosing the right people for the project is a well-known ingredient for success.

Approach/Process
Are we going XP, Waterfall, something in between? :-).

Feature Set
Are we going to include this feature or not?

Design
What is the internal shape (form) of our product?

Coding
Much like design, at a finer granularity level.

Now, "design" is an overly general concept. Too general to be useful. Therefore, I'll split it into a few major decisions.

Architectural Style
Is this going to be an embedded application, a rich client, a web application? This is a rather irreversible decision.

Platform
Goes somewhat in pair with Architectural Style. Are we going with an embedded application burnt into an FPGA? Do you want to target a PIC? Perhaps an embedded PC? Is the client a Windows machine, or you want to support Mac/Linux? A .NET server side, or maybe Java? It's all rather irreversible, although not completely irreversible.

3rd-Party Libraries/Components/Etc
Are we going to use some existing component (of various scale)? Unless you plan on wrapping everything (which may not even be possible), this often end up being an irreversible decision. For instance, once you commit yourself to using Hibernate for persistence, it's not trivial to move away.

Programming Language
This is the quintessential irreversible decision, unless you want to play with language converters. Note that this is not a coding decisions: coding decisions are made after the language has been chosen.

Structure / Shape / Form
This is what we usually call "design": the shape we want to impose to our material (or, if you live in the "emergent design" side, the shape that our material will take as the final result of several incremental decisions).

So, what are we going to delay? We can't delay all decisions, or we'll be stuck. Sure, we can delay something in each and every area, but truth is, every popular method has been focusing on just a few of them. Of course, different methods tried to delay different choices.

A Little Historical Perspective
Experience brings perspective; at least, true experience does :-). Perspective allows to look at something and see more than it's usually seen. For instance, perspective allows to look at the old, outdated, obsolete waterfall approach and see that it (too) was meant to delay decisions, just different decisions.

Waterfall was meant to delay people decisions, design decisions (which include platform, library, component decisions) and coding decisions. People decision was delayed by specialization: you only have to pick the analyst first, everyone else can be chosen later, when you know what you gotta do (it even makes sense -)). Design decision was delayed because platform, including languages, OS, etc, were way more balkanized than today. Also, architectural styles and patterns were much less understood, and it made sense to look at a larger picture before committing to an overall architecture.
Although this may seem rather ridiculous from the perspective of a 2010 programmer working on Java corporate web applications, most of this stuff is still relevant for (e.g.) mass-produced embedded systems, where choosing the right platform may radically change the total development and production cost, yet choosing the wrong platform may over-constrain the feature set.

Indeed, open systems (another legacy term from late '80s - early '90s) were born exactly to lighten up that choice. Choose the *nix world, and forget about it. Of course, the decision was still irreversible, but granted you some latitude in choosing the exact hw/sw. The entire multi-platform industry (from multi-OS libraries to Java) is basically built on the same foundations. Well, that's the bright side, of course :-).

Looking beyond platform independence, the entire concept of "standard" allows to delay some decision. TCP/IP, for instance, allows me to choose modularly (a concept I'll elaborate later). I can choose TCP/IP as the transport mechanism, and then delay the choice of (e.g.) the client side, and focus on the server side. Of course, a choice is still made (the client must have TCP/IP support), so let's say that widely adopted standards allow for some modularity in the decision process, and therefore to delay some decision, mostly design decisions, but perhaps some other as well (like people).

It's already going to be a long post, so I won't look at each and every method/principle/tool ever conceived, but if you do your homework, you'll find that a lot of what has been proposed in the last 40 years or so (from code generators to MDA, from spiral development to XP, from stepwise refinement to OOP) includes some magic ingredient that allows us to postpone some kind of decision.

It's 2010, guys
So, if you ain't agile, you are clumsy :-)) and c'mon, you don't wanna be clumsy :-). So, seriously, which kind of decisions are usually delayed in (e.g.) XP?

People? I must say I haven't seen much on this. Most literature on XP seems based on the concept that team members are mostly programmers with a wide set of skills, so there should be no particular reason to delay decision about who's gonna work on what. I may have missed some particularly relevant work, however.

Feature Set? Sure. Every incremental approach allows us to delay decisions about features. This can be very advantageous if we can play the learning game, which includes rapid/frequent delivery, or we won't learn enough to actually steer the feature set.
Of course, delaying some decisions on feature set can make some design options viable now, and totally bogus later. Here is where you really have to understand the concept of irreversible and last responsible moment. Of course, if you work on a settled platform, things get simpler, which is one more reason why people get religiously attached to a platform.

Design? Sure, but let's take a deeper look.

Architectural Style: not much. Quoting Booch, "agile projects often start out assuming a given platform and environmental context together with a set of proven design patterns for that domain, all of which represent architectural decisions in a very real sense". See my post Architecture as Tradition in the Unselfconscious Process for more.
Seriously, nobody ever expected to start with a monolithic client and end up with a three-tier web application built around a MVC pattern just by coding and refactoring. The architectural style is pretty much a given in many contemporary projects.

Platform: sorry guys, but if you want to start coding now, you gotta choose your platform now. Another irreversible decision made right at the beginning.

3rd-Party Libraries/Components/Etc: some delay is possible for modularized decisions. If you wanna use hibernate, you gotta choose pretty soon. If you wanna use Seam, you gotta choose pretty soon. Pervasive libraries are so entangled with architectural styles that it's relatively hard to delay some decisions here. Modularized components (e.g. the choice of a PDF rendering library) are simple to delay, and can be proactively delayed (see later).

Programming Language: no way guys, you have to choose right here, right now.

Structure / Shape / Form: of course!!! Here we are. This is it :-). You can delay a lot of detailed design choices. Of course, we always postpone some design decision, even when we design before coding. But let's say that this is where I see a lot of suggestions to delay decisions in the agile literature, often using the dreaded Big Upfront Design as a straw man argument. Of course, the emergent design (or accidental architecture) may or may not be good. If I had to compare the design and code coming out of the XP Episode with my own, I would say that a little upfront design can do wonders, but hey, you know me :-).

Practicing
OK guys, what follows may sound a little odd, but in the end it will prove useful. Have faith :-).
You can get better at everything by doing anything :-), so why not getting better at delaying decisions by playing Windows Solitaire? All you have to do is set the options in the hardest possible way:

now, play a little, until you have to make some decision, like here:

I could move the 9 of spades or the 9 of clubs over the 10 of hearts. It's an irreversible decision (well, not if you use the undo, but that's lame :-). There are some ramifications for both choices.
If I move the 9 of clubs, I can later move the king of clubs and uncover a new card. After that, it's all unknown, and no further speculation is possible. Here, learning requires an irreversible decision; this is very common in real-world projects, but seldom discussed in literature.
If I move the 9 of spades, I uncover the 6 of clubs, which I can move over the 7 of aces. Then, it's kinda unknown, meaning: if you're a serious player (I'm not) you'll remember the previous cards, which would allow you to speculate a little better. Otherwise, it's just as above, you have to make an irreversible decision to learn the outcome.

But wait: what about the last responsible moment? Maybe we can delay this decision! Now, if you delay the decision by clicking on the deck and moving further, you're not delaying the decision: you're wasting a chance. In order to delay this decision, there must be something else you can do.
Well, indeed, there is something you can do. You can move the 8 of aces above the 9 of clubs. This will uncover a new card (learning) without wasting any present opportunity (it could still waste a future opportunity; life it tough). Maybe you'll get a 10 of aces under that 8, at which point there won't be any choice to be made about the 9. Or you might get a black 7, at which point you'll have a different way to move the king of clubs, so moving the 9 of spades would be a more attractive option. So, delay the 9 and move the 8 :-). Add some luck, and it works:

and you get some money too (total at decision time Vs. total at the end)


Novice solitaire players are also known to make irreversible decision without necessity. For instance, in similar cases:

I've seen people eagerly moving the 6 of aces (actually, whatever they got) over the 7 of spades, because "that will free up a slot". Which is true, but irrelevant. This is a decision you can easily delay. Actually, it's a decision you must delay, because:
- if you happen to uncover a king, you can always move the 6. It's not the last responsible moment yet: if you do nothing now, nothing bad will happen.
- you may uncover a 6 of hearts before you uncover a king. And moving that 6 might be more advantageous than moving the 6 of aces. So, don't do it :-). If you want to look good, quote Option Theory, call this a Deferral Option and write a paper about it :-).

Proactive Option Thinking
I've recently read an interesting paper in IEEE TSE ("An Integrative Economic Optimization Approach to Systems Development Risk Management", by Michel Benaroch and James Goldstein). Although the real meat starts in chapter 4, chapters 1-3 are probably more interesting for the casual reader (including myself).
There, authors recap some literature about Real Options in Software Engineering, including the popular argument that delaying decisions is akin to a deferral option. They also make important distinctions, like the one between passive learning through deferral of decisions, and proactive learning, but also between responsiveness to change (a central theme in agility literature) and manipulation of change (relatively less explored), and so on. There is a a lot of food for thought in those 3 chapters, so if you can get a copy, I suggest that you spend a little time pondering over it.
Now, I'm a strong supporter of Proactive Option Thinking. Waiting for opportunities (and then react quickly) is not enough. I believe that options should be "implanted" in our project, and that can be done by applying the right design techniques. How? Keep reading : ).

The Invariant Decision
If you look back at those pictures of Solitaire, you'll see that I wasn't really delaying irreversible decisions. All decisions in solitaire are irreversible (real men don't use CTRL-Z). Many decisions in software development are irreversible as well, especially when you are in a tight budget/schedule, so starting over is not an option. Therefore, irreversibility can't really be the key here. Indeed, I was trying to delay Invariant Decisions. Decisions that I can take now, or I can take later, with little or no impact on the outcomes. The concept itself may seem like a minor change from "irreversible", but it allows me to do some magic:
- I can get rid of the "last responsible moment" part, which is poorly defined anyway. I can just say: delay invariant decisions. Period. You can delay them as much as you want, provided they are still invariant. No ambiguity here. That's much better.
- I can proactively make some decisions invariant. This is so important I'll have to say it again, this time in bold: I can proactively make some decisions invariant.

Invariance, Design, Modularity
If you go back to the Historical Perspective paragraph, you can now read it under a different... perspective :-). Several tools, techniques, methods can be adopted not just to delay some decision, but to create the option to delay the decision. How? Through careful design, of course!

Consider the strong modularity you get from service-oriented architecture, and the platform independence that comes through (well-designed) web services. This is a powerful weapon to delay a lot of decisions on one side or another (client or server).

Consider standard protocols: they are a way to make some decision invariant, and to modularize the impact of some choices.

Consider encapsulation, abstraction and interfaces: they allow you to delay quite a few low-level decisions, and to modularize the impact of change as well. If your choice turn out to be wrong, but it's highly localized (modularized) you may afford undoing your decision, therefore turning irreversible into reversible. A barebone example can be found in my old post (2005!) Builder [pattern] as an option.

Consider a very old OOA/OOD principle, now somehow resurrected under the "ubiquitous language" umbrella. It states that you should try to reflect the real-world entities that you're dealing with in your design, and then in your code. That includes avoiding primitive types like integer, and create meaningful classes instead. Of course, you have to understand what you're doing (that is, you gotta be a good designer) to avoid useless overengineering. See part 4 of my digression on the XP Episode for a discussion about adding a seemingly useless Ball class (that is: implanting a low cost - high premium option).
Names alter the forcefield. A named concept stands apart. My next post on the forcefield theme, by the way, will explore this issue in depth :-).

And so on. I could go on forever, but the point is: you can make many (but not all, of course!) decisions invariant, if you apply the right design techniques. Most of those techniques will also modularize the cost of rework if you make the wrong decision. And sure, you can try to do this on the fly as you code. Or you may want to to some upfront design. You know what I'm thinking.

OK guys, it took quite a while, but now we have a new concept to play with, so more on this will follow, randomly as usual. Stay tuned.

Labels: , , , , ,

Friday, January 01, 2010 

Inspirational reading

The Self as a Center of Narrative Gravity

By the way guys, happy new year : )

Labels: ,

Tuesday, December 15, 2009 

A little more on DSM and Gravity

In a recent paper ("The Golden Age of Software Architecture" Revisited, IEEE Software, July/August 2009), Paul Clements and Mary Shaw conclude by talking about Conformance Checking. Indeed, although many would say that the real design/architecture is represented by code, a few :-) of us still think that code should reflect design, and that conformance of code to design should be automatically checked when possible (not necessarily in any given project; not all projects are equal).
Conformance checking is not always simple; quoting Clements and Shaw: "Many architectural patterns, fundamental to the system’s design taken forward into code, are undetectable once programmed. Layers, for instance, usually compile right out of existence."

The good news is that layers can be easily encoded in a DSM. While doing so, I would use an extension of the traditional yes/no DSM, as I've anticipated in a comment to the previous post. While the traditional DSM is basically binary (yes/no), in many cases we are better off with a ternary DSM. That way, we can encode three different decisions:
Yes-now: there is a dependency, and it's here, right now.
Not-now: there is no dependency right now, but it wouldn't be wrong to have one.
Never: adding this dependency would violate a fundamental design rule.

A strong layered system requires some kind of isolation between layers. Remember gravity: new things are naturally attracted to existing things.
Attraction is stronger in the direction of simplicity and lack of effort: if no effort is required to violate architectural integrity, sooner or later it will be violated. Sure, conformance checking may help, but it would be better to set up the gravitational field so that things are naturally attracted to the right place.

The real issue, therefore, is the granularity of the DSM for a layered system. Given the fractal nature of software, a DSM can be applied at any granularity level: between functions, classes, "logical" components, "physical" components. Unless your system is quite small, you probably want to apply the DSM at the component level, which also means your layers should appear at the component level.

Note the distinction between logical and physical component. If you're working in a modern language/environment (like .NET or Java), creating a physical component is just a snap. Older languages, like C++, never got the idea of component into the standard, for a number of reasons; in fact, today this is one of the most limiting factors when working on large C++ system. In that case, I've often seen designer/programmers creating "logical" components out of namespaces and discipline. I've done that myself too, and it kinda works.

Here is the catch: binary separation between physical components is stronger than the logical separation granted from using different namespaces, which in turn is stronger than the separation between two classes in the same namespace, which is much stronger than the separation between two members of the same class.
More exactly, as we'll see in a forthcoming post, a binary component may act as a better shield and provide stronger isolation.

If a binary component A uses binary component B, and B uses binary component C, but does not reveal so in its interface (that is, public/protected members of public classes in B do not mention types defined in C) A knows precious nothing about C.
Using C from A requires that you discover C existence, then the existence of some useful class inside C. Most likely, to do so, you have to look inside B. At that point, adding a new service inside B might just be more convenient. This is especially true if your environment does not provide you with free indirect references (that is, importing B does not inject a reference to C "for free").
Here is again the interplay between good software design and properly designed languages: a better understanding of software forces could eventually help to design better languages as well, where violating a design rule should be harder than following the rule.

Now, if A and B are logical components (inside a larger, physical component D), then B won't usually act as a shield, mostly because the real (physical) dependency will be placed between D and C, not between B and D. Whatever B can access, A can access as well, without any additional effort. The gravitational field of B is weaker, and some code might be attracted to A, which is not what the designer wanted.

Therefore, inasmuch as your language allows you to, a physical component is always the preferred way to truly isolate one system from another.

OK, this was quite simple :-). Next time, I'll go back to the concept of frequency and then move to isolation!

Labels: , , , ,

Wednesday, October 21, 2009 

The Dependency Structure Matrix

Design is about making decisions; diagrams encode some of those decisions. Consider this simple component diagram:



We have 3 "physical" components (e.g. DLLs) X, C, D. X is further partitioned in 2 logical components: in this real-world case, the designer used namespaces to identify separate logical components inside a single physical component. The designers is also telling us that A and B depends on D, B depends on C, C depends on D. So far, so good.

UML diagrams, however, cannot easily convey some part of the reasoning. In a sense, to fully grasp the designer's intention, we have to understand not only what is in the diagram, but also what is not in the diagram. This may seem unusual, but is easily explained. Consider the picture above again. There is no dependency between A and C. Now, maybe A doesn't currently need to access C (and therefore there is no dependency) but if we need to access C from A tomorrow, it's just fine to add a dependency. Or maybe the designer's intent was to shield A from C, possibly using B as a man-in-the-middle.
That's not obvious from the diagram, and there is no place in the diagram to say that (not with a formal, standard UML syntax). Of course, good names may help. Replacing B with something more meaningful, maybe mentioning a bridge or proxy pattern, may suggest that A is not supposed to interact with C.

Is there a better way? Maybe something that can be actually checked against code? Checking code compliance with diagrams may seem so passe' or even plain absurd, given the current trend of discarding diagrams and/or reverse-engineering diagrams from code. Still, here is a real-world story:
The design above (which is, of course, largely simplified) was handed out from the original designers-implementers to a larger (offshore) team. They explained some of the design rationale (informally), and after a while, they left the company. Months later, the offshore team needed a new service from C inside A, so they did the simplest thing that can possibly work: they called C from A. After all, A and B are inside the same physical component. Whatever B can do, A can do too.
Unfortunately, a cornerstone of the original design was that A should never talk to C. The dependency was not in the diagram, because it was not supposed to exist, ever.

The team manager knew that, but given the size of the real X (about 500 KLOC) she couldn't possibly review all the changes from the offshore team. Of course, at least someone in the offshore team didn't fully grasp the designer's intent.

So, back to the original question: is there a better way? I could say "a forcefield diagram" :-), but in this specific case, there is also a well-known engineering tool: the Dependency Structure Matrix (also known as the Design Structure Matrix). A DSM encodes dependencies between "things". Not just dependencies, but also forbidden dependencies. See the following picture:



The 5 green "Y" cells correspond to the 5 existing dependencies; the "N" cells correspond to the "missing" dependencies, but they say something more: that those dependencies are forbidden. Now, this is a useful piece of information, something that can be easily checked against code. That does not mean that we can't change the design: it simply means we don't want to change the design inadvertently, just by typing in some code that was not supposed to be there. Checking code against the abstract design should just prompt a review; the design could be wrong, in which case, it should be changed (along with the DSM).

There is some interesting literature about DSM in software, most from Baldwin and Clark of "Design Rules" fame, but also from others (like one I mentioned back in 2005). There are also quite a few tools to reverse-engineer a DSM from code, which makes checking code against the designed DSM relatively trivial (the bad side is that some languages, like C++, are notably hard to reverse engineer, so tools are lacking; Java and C# have both free and commercial tools available). I'm not aware of any UML tool that can generate a DSM from the diagrams, but that's theoretically trivial, and could even be built as a plug-in for some CASE tools.

As usual, there is more to say about the DSM, gravity, and the forcefield. I'll save that for my next post!

Labels: ,

Monday, October 05, 2009 

A ForceField Diagram

The Design Rationale Diagram I discussed in my previous post is hardly complete, and it could be vastly improved by asking slightly different questions, leading to different decision paths. Still, it's a reasonable first-cut attempt to model the decision process. It can be used to communicate the reasoning behind a specific decision, in a specific context.

That, however, is not the way I really think. Sure, I can rationalize things that way, but it's not the way I store, recall, organize information inside my head. It's not the way I see the decision space.
In the end, software design is about things going together and things staying apart, at all the granularity levels (see also my post on partitioning).
As I progress in my understanding of forces, I tend to form clusters. Clusters are born out of attraction and rejection inside the decision space. I've found that thinking this way helps me reach a better understanding of my design instinct, and to communicate my thoughts more clearly.

Now, although I've been thinking about this for long while (not full-time, lucky me :-), I can't say I have found the perfect representation. The decision space in inherently multi-dimensional, and I always end up needing more dimensions that I can fit either in 2D or 3D. Over time, I tried several notations, inventing things from scratch or borrowing from other domains. Most were dead ends. In the end, I've chosen (so far :-) a very simple representation, based on just 3 concepts (possibly 4 or 5).

- nodes
Nodes represent information, which is our material. Information has fractal nature, and I don't bother if I'm mixing up levels. Therefore, a node may represent a business goal, or the adoption of a tool or library, or a nonfunctional requirement, or a specific component, class, function. While most methods are based on a strict separation of concepts, I find that very limiting.

- an attraction relationship
Nodes can attract each other. For instance, a node labeled "reliable" may attract a node labeled "redundant" when reasoning about the large display problem. I just connect the two nodes using a thick line with little "hands" on the ends. I place attracted nodes close to each other.

- a rejection relationship
Nodes can reject each other. For instance, stateful most clearly reject stateless :-). Some technology might be at odd with another. A subsystem must not depend on another. And so on. Nodes that reject each other are placed at some distance.

It's all very simple and unsophisticated. Here is an example based on the large display problem, inspired by the discussion on design rationale:



and here are two diagrams I've used in real-world projects recently, scaled down to protect the innocent:





The relationship between a node, a cluster, and an Alexandrian center is better left for another time. Still, a node in one diagram may represent an entire cluster, or an entire diagram. Right now I'm tempted to use a slightly different symbol (which would be the fourth) to represent "expandable" nodes, although I'm really trying to keep symbols to a bare minimum. I'm also using colors, but so far in a very informal way.

As simple as it is, I've found this diagram to be very effective as a reasoning device, while too many diagrams end up being mere documentation devices. I can sit in front of my (large :-) screen, think and draw, and the drawing helps me focus. I can draw this on a whiteboard in a meeting, and everyone get up to speed very quickly on what I'm doing.

This, however, is just half the story. We can surely work with informal concepts and diagrams, and that's fine, but what I'm trying to do is to add precision to the diagram. Precision is often confused with details, like "a class diagram is more precise if you show all the parameters and types". I'm not looking for that kind of "precision". Actually, I don't want this diagram to be redundant with code at all; we already have many code-like diagrams, and they all get down the same roads (generate code from diagrams or generate diagrams from code). I want a reasoning device: when I want to code, I'm comfortable with code :-).

I mostly want to add precision about relationships. Why, for instance, is there an attraction between Slow Client and Stateful? Informally, because if we have a stateful system, the slow client can poll on its own terms, or alternatively, because the client may use a sophisticated subscription based on the previous state. Those options, by the way, could be represented on the forcefield diagram itself (adding more nodes, or a nested diagram); but that's still the "informal" reasoning. Can we make it any more formal, precise, grounded on sound principles?

This is where the ongoing work on concepts like gravity, frequency, and so on kicks in. Slow Client and Stateful are attracted because on a finer granularity (another, perhaps better, diagram) "Slow Client" means a publisher and a subscriber operating at different frequencies, and a stateful repository is a well-known strategy (a pattern!) to provide Isolation between systems operating at different frequencies (together with synchronization or transactions).

Now, I haven't introduced the concept of Isolation yet (though I mentioned something on my Facebook page :-), so this is sort of a spoiler :-)), but in the end I hope to come up with a simple reasoning system, where you can start with informal concepts and refine nodes and forces until you reach the "universal", fractal forces I'm discussing in the "Notes on Software Design" posts. That would give a solid ground to the entire diagram.

A final note on the forcefield diagram: at this stage, I'm just using Visio, or more exactly, I'm abusing some stencils in the Visio library. I wanted something relatively organic, mindmap-like. Maybe one day I'll move back to some 3D ideas (molecular structures come to mind), but I've yet to see how this scales to newer concepts, larger problems, and so on. If you want to play with it, I can send you the VSS file with the stencils.

Ok, I'll get back to Frequency (and Interference and Isolation and more :-) soon. Before that, however, I'd like to take a diversion on the Dependency Structure Matrix. See ya!

Labels: ,

Monday, August 31, 2009 

Representing Design Rationale Inside Activity Diagrams

Design is about making choices. We often do so on the fly, leaning on experience and intuition, by talking about the problem with colleagues, or borrowing from literature (e.g. patterns). We also make some choice by habit, which is a different form of experience, one that has higher risk of becoming disconnected with the real problem.
Most of this process is tacit, and even when we discuss choices openly, it doesn't get recorded. Sometimes, a list of pros/cons is made when there is some disagreement about the best option.

This all works well when the problem is simple, but sometimes even experienced designers feel like they're not grasping the essential issues, that something has not yet been found, named, disentangled. This is when having yet one more tool can prove useful.
Now, I don't usually go through the effort to model and transcribe the rationale behind each and every design choice I make. It could be interesting, also from a pedagogical point of view, but it would take a lot of time and would probably disrupt my thought processes. However, when the issues are particularly thorny/unclear, or when there is a large disagreement on the best choice (or even on the goals and criteria), I've found that getting design rationale out of our individual heads and talk on a shared representation can move things a little forward.

Over the years, I've tried out a number of tools, approaches, and so on; lately, I've tried using Activity Diagrams in a rather unorthodox way, to represent my reasoning about design, not design itself. The idea is not to encode your decisions ex-post, but ex-ante, while you're thinking (that is, while they're still options, not decisions). Also, the diagram must be considered quite fluid, as it shows our current understanding, and we're building the diagram to improve our understanding.

Enough talk, let's see a realistic example. I'll refer to the Large Display problem I discussed a few months ago. Actually, I'll just cover the initial choice between using a real-time database or an IPC/messaging system. It's gonna be quite a mouthful anyway!

To start, I'll have to draw a line between a messaging system and a RTDB, and that in itself is not easy. I'll go for a very simple distinction, because my goal here is not really to talk about RTDBs, but about design rationale (the usual "look at the moon, not at the finger" concept).
So, consider a control system that reads some data from the field and then needs to publish those data for other processes. It could just send data through a messaging (publish/subscribe) system. Here I define a messaging system as stateless, meaning it simply keeps track of subscriptions, and sends everything that is published to the subscribers (according to some criteria, like message type or tag). It does not keep an history, or a snapshot of what has been last sent. Therefore, it cannot apply some filters, like "notify me only if the difference between the previous value and the current value is above a threshold" because the previous value is just not stored. Also, when a subscriber is started, it cannot get the current snapshot of the system, because it is not there: it will have to wait for messages to come, incrementally. Shortly stated, a RTDB will keep a snapshot of the system, and well, you can figure out the difference. Of course, a RTDB is also more complex.

So, how do we choose between a messaging system and a RTDB? We may write down a long list of pro/cons, but that's really unstructured, and that's not the way our brain works. To provide more structure, I use an Activity Diagram with orthogonal swimlanes (all the following pictures are taken from Star UML, a free tool that is rather fast and unobtrusive).
The vertical swimlanes are flexible: they represent the main concerns. The horizontal swimlanes are fixed: they provide structure.


For the Large Display problems, we could start with a few main concerns like performance, reliability, cost, and so on. We just drop the names on the vertical swimlanes. My template is then partitioned in 3 horizontal bands: the root question, the reasoning, the outcome. Everything inside is dynamic, and changes as we understand more: even the root questions may change, as we discover larger or smaller, independent problems. Sometimes, even the main concerns change, as we discover options or issues we didn't consider before.

We can focus on just one concern right now, let's say performance (don't we all like performance? :-). A first-cut, interesting top-level question could be: is the published data rate high or low? If the rate is low and we have no persistent state, when you turn on the large display you see nothing: you have to wait till some data gets published. On the other hand, if the rate is high, it may even overwhelm the display system: there is little need to refresh a value a thousand times per second. That actually depends on the display: if it's a real-time plot, you may want a high refresh rate too.

Ok, we could start modeling this part of our reasoning using the familiar activity diagram symbols. Actually, since most of the nodes here would be decision nodes, I just omit the diamond and use an activity node with multiple outgoing paths to show choices.


Note: The empty boxes are just placeholders for some later reasoning. It's just laziness on my side :-) and they wouldn't appear in a real diagram.

Now, this seems just like a decision tree, but it's slightly different. First, it's a decision graph: common choices between paths are shared, and this is a precious information because it shows crucial choices (more on this later). Second, it's a multifaceted graph: every vertical swimlane shows a facet of a more complex reasoning; for instance, what is good for performance might not be good for reliability or cost.

Let's try to move ahead a little. When the incoming data rate is higher than what [most] clients need, we have basically two choices:
1) smarter subscriptions; they could still be rather dumb, like "no more than 3 times per second" or much smarter like "when relative change is higher than 5%, but no more than 5 times per second". Note that the latter is more suited to a RTDB than to a stateless messaging system.
2) change paradigm and move to client-initiated polling. The clients will ask for data with their own timing. Of course, at this point we give up the possibility of not asking for data if the value has not changed. Anyway, this again requires some kind of stateful middleware; a messaging system won't do.
When data rate is low, but high startup time for clients is not an option, we can't wait for data to come: we have to poll, at least at startup. So, polling can solve two problems, of course at expense of bandwidth if it is the only available option.



While drawing this, we may come to the conclusion that we need to ask better questions: are we building a publisher-driven or a client-driven system? If it's client-driven, it cannot be stateless! What do we really know about clients? How many there will be? What about publishers? What is the typical data rate and configuration? What are we aiming for? Do we need to narrow the expectations? This might change the top question (client Vs. publisher driven) or even some concern. That's fine, it means the technique is working :-) and that it's helping us thinking.

Now, it would take quite a lot of time to explore all the facets of even a simple system like this. Actually, most people won't even do it in real life: they will fall in love with one idea, spend most of their time preaching and rationalizing about the virtues of their idea, and never really take the time to go through this kind of process. Still, trying to work out the "Reliability" swimlane would prove interesting. For instance, a common technique to achieve reliability is redundancy. Redundancy is much easier for a stateless system. Redundancy is easier when clients don't have to subscribe at all, but can simply poll. And so on. If you have some spare time, you may want to give it a try.

The notation I use is quite informal. I could improve that easily: UML is fairly flexible; so far I didn't, because people can grasp it anyway, even when I drop in the < < or > > to represent options or when I have just one arrow coming out, meaning that I've just decomposed a choice and a consequence. It's just a reasoning workflow, and I haven't felt the need to make it any more precise than that.

Back to the forcefield: the rationale is not the forcefield. The rationale, however, is talking about forces and centers. Outcomes (messaging and RTDB) are centers. Main choices, like "client driven" or "stateless", are again centers. Those centers are attracting or rejecting each other. This is the forcefield. This is closer to the way I think in the back of my mind, how I "see" the system, how I keep options open. Now, I just need a way to show this. That's for my next post :-).

Labels: ,