Monday, April 30, 2007
Metaphors and Design
This careful scrutiny of works and visions is something we seldom do in the software realm. Many software engineers are too concerned with the latest newfangled technology to give a damn about what some guy said 50 years ago (or 5, or 1). That has rather dramatic consequences on the overall maturity of the field. Good design requires vision. Vision (almost by definition :-) won't come from blindly following a trendy technology.
Unfortunately, even if we do want to learn from great software thinkers, there is a dearth of valuable material. I've often suggested getting acquainted with the works of David Parnas, whose influence on modern programming paradigms is undisputed. I can also recommend reading some of the Edsger W. Dijkstra manuscripts.
A nice start could be My recollections of operating system design. Sure, you won't find anything about the latest trends, but hey, here he's talking about inventing the concept of the interrupt, the difficulty of separation of concerns, early concurrency, and most importantly, about a discipline of thinking.
An interesting passage: "[...] created a host of new problems, so new that we didn't really know how to think and talk about them and what metaphors to use".
Abstraction is a cornerstone of software development (once you outgrow small programs, where programming on the metal is often the best thing). Yet abstraction is not enough: we need to find the right metaphors; poor metaphors are poor thinking tools.
It is interesting to see how people often try to sidestep metaphors, looking for a more machine-oriented way of thinking. This can be useful when you are trying to get familiar with some new concepts: for many years, people have found it useful to understand the mechanics behind inheritance, as a way to learn some OOP concepts. Of course, you can't effectively think at that level all the time: sooner or later, the whole inheritance+polymorphism metaphor must kick in, freeing you from low-level concerns.
This concept is also central in a short paper by Gregor Kiczales (It's Not Metaprogramming) about a "direct semantics" for AOP.
Kiczales is right in claiming that although both metaprogramming and AOP might be seen as ways to manipulate code (through code generation or by method interception), this is a low-level view, and we should move toward more abstract thinking to design AO systems.
Unfortunately, it seems to me that the usual AOP concepts are still somewhat too low-level to engage in abstract thinking and modeling, and that some progress is still needed to find the appropriate metaphors. Maybe they'll emerge as AOP patterns, as the AOP dictionary seems quite established by now.
Curiously enough, Kiczales talks about virtual machines as a metaprogramming concept, while I see them as valuable AOD metaphors (for some problems - it's not a universal metaphor). I think that the main distinction is that he's talking about building a virtual machine at the extra-linguistic level, while I'm talking about building a virtual machine (so to speak) at the programming language level. I'll soon elaborate on these concepts in a few posts.
Labels: AOP, article reference, design
Tuesday, April 24, 2007
GUI [Anti]Patterns and Safety-Critical Systems
Testing and debugging the DLL involved running both clients, which under Visual Studio is not immediate: you can run one (the startup project) with the usual start button, but you have to manually select the other project in solution explorer, right-click, choose "debug" and then "start new instance". No big deal, except that I kept clicking on the wrong menu item ("Add" instead of "Debug"). I had to pay a lot of attention to avoid this mistake, but anyway, in a couple of hours I made something like 10 mistakes or so (to the general amusement :-).
As I said, no big deal, because Visual Studio is hardly a safety-critical application. However, in the past few days, I've been exchanging a few ideas on how to improve the GUI of a safety-critical application, and that episode came back to my mind. Why was I choosing the wrong item? Was there anything to be learned? There are at least two reasonable explanations:
1) I'm rather dumb, and I must be kept away from safety-critical applications.
2) Something in the design of that pop-up menu was misleading.
Now, (1) is a serious possibility :-). To quote Edsger Dijkstra in Programming Considered as a Human Activity, "I have only a very small head and must live with it". But (2) is also a possibility worth exploring.
So, here is the suspect menu, in its compact and expanded form:


Now, I'm not the first to think about GUI and safety-critical systems. There is quite some literature about it, including some (small) pattern language. An interesting paper is Patterns for Designing Safety-Critical Interactive Systems by Mahemoff and Hussey (I'm providing a link to citeseer cache, as the original link seems to be broken).
Within that paper, the pattern that seems most relevant for the situation at hand is "Intended Action":
Problem: How can we enhance assurance that a user's action matches their intention?
Just what we need! Unfortunately the solution part is not that helpful:
Solution: Arrange the user interface so that affordances are provided which reduce the likelihood that an error will occur when the user executes a task.
The examples provided are quite good (especially if you're familiar with Donald Norman's works), but they do not immediately apply to a simple pop-up menu.
However, given a pattern we can easily form an anti-pattern. In this case, the anti-pattern would be:
Arrange the user interface so that affordances are provided which increase the likelihood that an error will occur when the user executes a task.
Is there anything like that in the menu above? Well, sure there is. Note how many items have a distinct visual clue on the left, relating the item to a familiar (and distinct) icon in the menu bar. This is a good application of the pattern above.
Also, notice how both Add and Debug provide an identical visual clue on the right, to indicate that they can be expanded. My small head was obviously confused by the two identical clues, and I was clicking on the wrong item (probably because I was visually scanning for a picture, not for text; that's the "rather dumb" part, I suppose :-). This is an instance of the antipattern.
Interestingly, the icons on the left and on the right seem to have the same purpose: they indicate what is going to happen if you click on the item. The icon on the left represents the action that will be started. The icon on the right indicates that a sub-menu will be opened.
Note, however, that at a conceptual level they serve a remarkably different purpose: the icon on the left is concerned with application-level functionality, the icon on the right is hinting at navigational-level functionality. That's why the icons on the left are different, while the icons on the right (if present) are identical (an arrow), and may lead to confusion (I want to stress the idea that it's no big deal for Visual Studio: I'm just using a real-world example to talk about issues that would be relevant in the design of a safety-critical GUI).
Now, can we fix that? Well, sure. A simple fix would be to keep the arrow on the right, but still add an icon to the left (e.g. the icon of the first item in the sub-menu, if any). This will give a clue on what's below, and I'd surely click on the entry that looks like the button I've just clicked on the toolbar.
An alternative idea would be to change the indication on the right (e.g. by placing the icon of the first sub-menu item right before or after the arrow), although this would conflict with the common Windows paradigm (which wasn't designed for safety-critical applications anyway).
Yet another could be changing the arrow to a somewhat less eye-catching shape. Probably, you can come up with a few more strategies. As usual, in real-world safety-critical systems some (well-designed) experiments would be needed to determine the magnitude of the problem and the efficacy of every candidate solution.
A final note: there are quite a few interesting points in that paper on safety-critical systems. You will recognize the centrality of contextual design, as safety-critical systems may have to compromise on other important aspects (e.g. user productivity). Bainbridge's ironies of automation are also extremely interesting, and you may want to spend a few minutes pondering them :-).
Labels: article reference, design, HCI
Sunday, April 22, 2007
AOP and Layered Virtual Machines
Let's start looking at the issue from a different angle. Unless you're working on a small embedded system, chances are you're already running your application on a virtual machine. I don't mean something like the JVM or the .NET CLR (although that might be the case). The operating system itself is presenting you with a virtual machine.
In many cases (virtual memory, I/O virtualization, etc.) the virtual machine is intercepting your code at a very small granularity, doing some pre/post processing or routing your call to some piece of code. In those cases, your code is totally oblivious (caller obliviousness). Since the OS is a general-purpose virtual machine, we also have callee obliviousness (see also my post, Some notes on AOP for a little more on obliviousness).
Let's stay with this concept a little more. The OS is also offering you a set of services which must be explicitly invoked: they can't be hidden through mere virtualization. If I want to open a file and write something, I have to open the file and write something :-). The OS can abstract away a lot of concerns (the file system implementation, the disk driver implementation, etc.) but this is not an "ility" that can be somehow injected - it's a functional concern.
Or maybe it is. Maybe it's just a matter of abstraction. Maybe you don't really want to open a file and write something. Maybe you just want to persist an object model. Lo and behold, persistence can be easily modeled as a hidden service, but in a different virtual machine: one that is not working at the OS level, but at the object model level (like the ubiquitous JVM).
In practice, we always have a hierarchy of virtual machines, at different layers and with different granularity. What we often miss is the ability to modify the behaviour of existing VMs (which was the idea behind the Meta Object Protocol concept) or to easily create new (application-level) VMs.
When you look at AOP from this perspective, what a set of pointcuts and advices often does is to create a specialized virtual machine at the application level. Recently, this perspective helped me get a better grip on the design heuristics I'm using in mixed OOP-AOP environments. I'm not completely satisfied, and I've no claims of universality, but it was worth sharing.
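To make the idea a little more concrete, here is a minimal sketch in Python (not the AOP tooling I actually use). Everything in it is hypothetical and made up for illustration: the `weave` helper, the `Account` class, the `set_*` naming convention used as a pointcut, and the in-memory `journal` that stands in for a real persistence store. The point is only that a pointcut plus an advice behaves like a small application-level virtual machine: the class never mentions persistence, yet every matching call is intercepted.

```python
import fnmatch
import functools


def weave(cls, pointcut, advice):
    """Wrap every public method of cls whose name matches the glob pattern."""
    for name, member in list(vars(cls).items()):
        if (callable(member) and not name.startswith("_")
                and fnmatch.fnmatchcase(name, pointcut)):
            setattr(cls, name, _advise(member, advice))
    return cls


def _advise(method, advice):
    """Return a wrapper that runs the original method, then the advice."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        advice(self, method.__name__)  # "after" advice
        return result
    return wrapper


class Account:
    """Plain domain class: it knows nothing about persistence."""

    def __init__(self, balance=0):
        self.balance = balance

    def set_balance(self, value):
        self.balance = value

    def get_balance(self):
        return self.balance


journal = []  # hypothetical stand-in for a real persistence store


def persist_after_change(obj, method_name):
    """Advice: record a snapshot of the object after a state change."""
    journal.append((method_name, dict(vars(obj))))


# The "virtual machine": every set_* call on Account is now intercepted.
weave(Account, "set_*", persist_after_change)

account = Account()
account.set_balance(42)   # intercepted, snapshot recorded
account.get_balance()     # not matched by the pointcut, runs untouched
```

The caller is oblivious (it just calls `set_balance`) and so is the callee (`Account` was not modified by hand), which is the same double obliviousness the OS-level virtual machine gives you.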
On a side note, I did some research to check if the idea had already been explored. I didn't find much, with one notable exception: Concerned about separation by Mili, Sahraoui, Lounis (FASE/ETAPS, 2006). Although the focus is different, the authors elaborate on the idea that some non-functional concerns at the application level can be seen as a functional concern at the virtual machine level. I like this idea quite a bit.
Labels: AOP, article reference, design
Saturday, April 14, 2007
More .NET / STA madness...
I was just writing a piece of code to coordinate two cooperating processes under Windows, and that involved signaling a kernel object and waiting on another.
Now, the Win32 API has a nice function to do just that: SignalObjectAndWait. I was writing the code in C#, but hey, the WaitHandle happens to have a similar function (SignalAndWait). A WaitHandle isn't much more than a wrapper over kernel objects, so that wasn't really surprising: that function ought to be just a wrapper over the API.
What was surprising was the exception I got trying to use it: apparently, you cannot call that function from an STA thread. By the way, my thread happened to be in an STA just because of code Visual Studio itself had generated. I'm not using COM and I shouldn't be bothered with this stuff.
Again, COM is raising its ugly head behind all the .NET stuff. Again, some framework implementer thought it was wise to protect some unaware user from getting his message pump stuck. Which is kind of ridiculous, as you can just as easily get your message pump stuck by decomposing the forbidden call into a signal and a wait. Oh well...
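Just to show what "a signal and a wait" means, here is a rough analogy in Python, using `threading.Event` instead of kernel objects. This is not the Win32 or .NET API, and the `signal_and_wait` helper and the event names are made up for the sketch. Note that the two steps are not atomic (unlike the combined call), and that the calling thread blocks in the wait exactly as it would in the combined call.

```python
import threading


def signal_and_wait(to_signal, to_wait_on, timeout=None):
    """Non-atomic stand-in: set one event, then block on the other.

    Returns True if the second event was set before the timeout expired.
    """
    to_signal.set()
    return to_wait_on.wait(timeout)


request_ready = threading.Event()  # signaled by the main thread
reply_ready = threading.Event()    # signaled by the worker
log = []


def worker():
    request_ready.wait()           # wait for the main thread's signal
    log.append("worker: got request")
    reply_ready.set()              # wake up the main thread


thread = threading.Thread(target=worker)
thread.start()

log.append("main: sending request")
# The calling thread is blocked here until the worker replies (or 5 s pass).
got_reply = signal_and_wait(request_ready, reply_ready, timeout=5.0)
thread.join()
```

If the calling thread were the one running a message pump, it would be stuck during the wait either way, which is why forbidding only the combined call buys very little.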
Wednesday, April 11, 2007
Design for Outsourcing
What is often missing in most debates is context: quite often, there is some truth on both sides, that is, given the proper context, a given approach might be better suited. It's usually the (faulty) assumption that some approach can always be successfully adopted that makes so many debates futile.
Design should therefore be discussed in context. Context includes the technological issues, the market issues, the organizational issues, the human issues, and so on. For instance, a recurring problem among many companies is:
- they have more work to do than they can possibly do.
- they are reluctant to hire new developers; even if they do, they believe it will take a significant time to get the new hires up to speed.
- they are reluctant to outsource some developments. The usual complaint is that just explaining the problem, following progress, training some external personnel on the business issues is more effort than just doing the damn thing.
We all know how it ends:
- considerable friction between management and (disgruntled) developers.
- delayed or canceled projects, possibly some lost market opportunity.
- little or nothing is learnt from the experience, so next time it's the same game all around.
Now, this is not a technical issue. It's an organizational issue, and as such, it can't be completely solved at the technical level. However, it's a relatively common context, and as designers, we should take this into serious consideration.
In an old post, I discussed how to use quadrants to divide activities, to find some tasks better suited to offshoring (in a particular context, where the offshore team didn't have much domain expertise). The basic idea was that some tasks (technology oriented, stable requirements) were better suited to offshoring.
Of course, offshoring and outsourcing are quite different matters. However, there are many similarities that might be worth exploring. Indeed, for the sake of brevity, that post didn't mention two important issues:
- technology vs. domain is just a simplified view of strongly vs. loosely coupled.
- tasks are a consequence of design; we can change the design to move some critical mass of tasks into a different quadrant.
Let's review the two concepts:
- In that particular context, what was missing offshore was domain knowledge, not programming knowledge. Transferring domain knowledge would have bogged us down for quite some time. However, this is just a (real-world) example. In other cases, transferring knowledge about a huge database schema would slow you down. Or about a particular middleware you're using. Or about a specific framework (which is why, by the way, I don't like invasive frameworks). And so on. Knowledge must be transferred because it is not isolated (not enough information hiding, not enough separation of concerns, and so on) or because it has not yet been encoded into an executable form (that is, it's just in your head, but not yet in code). The tasks better suited to outsourcing (or to offshoring) are obviously those with the minimum coupling, therefore with the minimum need for knowledge transfer.
- Tasks are a consequence of design. Design is under our control: we decide where the effort must go (extensibility, reusability, or... outsourceability :-). We can change the structure, the approach, even twist some requirements (c'mon: we always do) to increase the number of loosely-coupled components. Of course, this is not going to be free: you'll have to compromise elsewhere (performance, observability, etc.), but as I said, design is contextual, we don't make choices in a void (this, again, is why I don't like frameworks that have made too many choices for me).
Note that I'm talking about outsourcing tasks (which, most likely, translates into outsourcing components) as opposed to outsourcing applications. There would be a lot to say about this, under the perspective of risk management, but I'll save that for another post. Suffice to say that there are certainly applications that can't be economically outsourced, but which have significant components that can be economically outsourced.
Bottom line: we can gradually break the loop above, by designing software in a different way, so as to move more tasks into the outsourcing sector (loosely coupled, stable requirements). This requires, in my opinion, some degree of up-front design. Of course, I'm aware that up-front design has been given a rough time by more than a few agilists. But it's also quite obvious that most of the critics made an implicit assumption: up-front = big, extensibility- and reuse-oriented. Which is quite a narrow view of design.
So, next time you find yourself saying "nobody can help us with this", try a different angle: can we change the structure so that somebody can help us on this? This may be all you need to get out of an otherwise deadly self-fulfilling expectation.
P.S.
for the paranoids :-) out there: no, I'm not in the outsourcing business :-)). And yes, I've helped quite a few people to reshape their design to make outsourcing easier :-).
Labels: design, project management
Tuesday, April 10, 2007
COM, Apartments, and Windows Presentation Foundation
As it turned out, however, COM made its way into Windows Presentation Foundation, and guess what, in its worst form.
One of the ugliest concepts in COM is the threading model. A component must declare a threading model (Single, Apartment, Free, Both, Neutral). Each thread must also initialize the COM engine by calling either CoInitialize (mostly legacy stuff) or CoInitializeEx, specifying the threading model it's gonna use.
When calls are made among incompatible threading models, some proxy stuff kicks in, and the call is executed in a different thread (if you ever wondered why your single-threaded application, using COM, happens to create half a dozen threads behind your back, that's why).
So, for instance, in the following trivial case:

if Main calls CoInitializeEx specifying Apartment Threading, any direct call from Main to C1 will be executed in a different thread (yes, free threaded components cannot be called directly by apartment-threaded threads; you need "both" or "neutral" for that). Any direct call from Main to C2 will be executed in the same thread. If you change the parameter in CoInitializeEx to free threaded, of course, the opposite will happen: any call to C2 will be executed in a different thread. Whatever you do, any call from C1 to C2 will execute in a different thread, as the two threading models are not compatible. You can easily experiment with a little C++/ATL code, by simply logging the thread ID inside function calls.
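The rules above can be summarized as a toy lookup table. This is a Python sketch of the rules as described in this post, not a COM API; it assumes, as the paragraph above implies, that C1 is free threaded and C2 is apartment threaded. The names `DIRECT` and `runs_in_caller_thread` are made up for the sketch, and the table deliberately ignores the legacy single-threaded case and the subtle differences between "both" and "neutral".

```python
# Threading models a caller thread can pick when it initializes COM.
APARTMENT = "apartment"
FREE = "free"

# For each caller model, the component models that can be called directly,
# that is, without a proxy and therefore in the caller's own thread.
# Simplified table, built only from the rules described in the text.
DIRECT = {
    APARTMENT: {"apartment", "both", "neutral"},
    FREE: {"free", "both", "neutral"},
}


def runs_in_caller_thread(caller_model, component_model):
    """True if the call is direct, False if it hops to a different thread."""
    return component_model in DIRECT[caller_model]


# Assumed from the text: C1 is free threaded, C2 is apartment threaded.
C1 = "free"
C2 = "apartment"

for caller in (APARTMENT, FREE):
    for name, model in (("C1", C1), ("C2", C2)):
        where = "same" if runs_in_caller_thread(caller, model) else "different"
        print(f"Main ({caller}) -> {name} ({model}): {where} thread")

# C1 runs in a free-threaded context, so its calls into C2 always hop threads.
print("C1 -> C2 same thread?", runs_in_caller_thread(FREE, C2))
```

It is no substitute for logging real thread IDs from C++/ATL, but it makes it easy to see that no single choice in Main keeps every call on one thread.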
I won't go into further detail about the subtle differences between "free", "both", and "neutral": this stuff is well-documented, although too many developers never had the time or the will to learn it with the necessary depth. Luckily, we might say, this is old stuff, .NET has come of age and we can finally forget about apartments.
Turns out things are not that simple. Many companies have old, unmanaged C++ applications, and want to extend them with .NET components. This works fine, especially when you use C++/CLI as the glue language. Windows Forms components are (as far as I've seen) threading-model agnostic: indeed, you can take a Windows Forms application, change the threading model attribute from STA to MTA, and (as far as I've seen) everything works fine. Not so with Windows Presentation Foundation. If you try to use WPF from an MTA thread, you get an error message, stating that the only supported threading model is STA. That's absurd, to say the least, but I guess it's what you get when you involve an IUnknown worshipper (and proudly so) in the design of some new technology. Dear Microsoft, any chance you're gonna fix this?





