Tuesday, May 13, 2008

 

Natural language

Some (most :-) of my clients are challenging. Sometimes the challenge comes from the difficult technical problems they face. That's the best kind of challenge.
Sometimes the challenge comes from people: that's the worst kind of challenge, and one that right now is better left alone.
Sometimes the challenge comes from the organization, which means it also comes from people, but with a different twist. Challenges coming from the organization are always tough, but overcoming those challenges can really make a difference.

One of my challenging clients is a rather large company in the financial domain. They are definitely old-school, and although upper management can perfectly see how software is permeating and enabling their business, middle management tend to see software as a liability. In their eternal search for lower costs, they moved most of the development offshore, keeping only an handful of designers and all the analysts in-house. Most often, design is done offshore as well, for lack of available designers on this side of the world.

Analysts have a tough job there. On one side, they have to face the rest of the company, which is not software-friendly. On the other side, they have to communicate clear requirements to the offshore team, especially to the designers, who tend to be very technology-oriented.
To make things more complicated, the analysts often find themselves working on unfamiliar sub-domains, with precise regulations but also with large gray areas that must be somehow understood and communicated.
Icing on the cake: some of those financial instruments do not even exist in the local culture of the offshore team, making communication as difficult as ever.

Given this overall picture, I've often recommended analysts to spend some time creating a good domain model (usually, a UML class diagram, occasionally complemented by some activity diagrams).
The model, with unambiguous associations, dependencies, multiplicities, and so on, will force them to ask the right questions, and will make it easier for the offshore designer to acquaint himself with the problem. Over time, this suggestion has been quite helpful.
However, as I said, the organization is challenging. Some of the analysts complained that their boss is not satisfied by a few diagrams. He wants a lengthy, wordy explanation, so he can read it over and see if they got it right (well, that's his theory anyway). The poor analyst can't possibly do everything in the allotted time.

Now, I always keep an eye on software engineering research. I've seen countless attempts to create UML diagrams from natural language specifications. The results are usually unimpressive.
In this case, however, I would need exactly the opposite: a tool to generate a precise, yet verbose domain description out of a formal domain model. The problem is much easier to solve, especially because analysts can help the tool, by using the appropriate wording.

Guess what, the problem must be considered unworthy, because there is a dearth of works in that area. In practice, the only relevant paper I've been able to find is Generating Natural Language specifications from UML class diagrams by Farid Meziane, Nikos Athanasakis and Sophia Ananiadou. There is also Nikos' thesis online, with a few more details.
The downside is that (as usual) the tool they describe does not seem to be generally available. I've yet to contact the authors: I just hope it doesn't turn out to be one of those Re$earch Tool$ that never get to be used.

From the paper above, I've also learnt about ModelExplainer , a similar tool from a commercial company. Again, the tool doesn't seem to be generally available, but I'll get in touch with people there and see.

Overall, the problem doesn't seem so hard, especially if we accept the idea that the analyst will help the tool, choosing appropriate wording. An XMI-to-NL (Natural Language) would make for a perfect open source project. Any takers? :-)

Labels: , , , ,


Friday, April 25, 2008

 

Can AOP inform OOP (toward SOA, too? :-) [part 2]

Aspect-oriented programming is still largely code-centric. This is not surprising, as OOP went through the same process: early emphasis was on coding, and it took quite a few years before OOD was ready for prime time. The truth about OOA is that it never really got its share (guess use cases just killed it).

This is not to say that nobody is thinking about the so-called early aspects. A notable work is the Theme Approach (there is also a good book about Theme). Please stay away from the depressing idea that use cases are aspects; as I said a long time ago, it's just too lame.

My personal view on early aspects is quite simple: right now, I mostly look for cross-cutting business rules as candidate aspects. I guess it's quite obvious that the whole "friend gift" concept is a business rule cutting through User and Subscription, and therefore a candidate aspect. Although I'm not saying that all early aspects are cross-cutting business rules (or vice-versa), so far this simple guideline has served me well in a number of cases.

It is interesting to see how early aspects tend to be cross-cutting (that is, they hook into more than one class) but not pervasive. An example of pervasive concern is the ubiquitous logging. Early aspects tend to cut through a few selected classes, and tend to be non-reusable (while a logging aspect can be made highly reusable).
This seems at odd with the idea that "AOP is not for singleton", but I've already expressed my doubt on the validity of this suggestion a long time ago. It seems to me that AOP is still in its infancy when it comes to good principles.

Which brings me to obliviousness. Obliviousness is an interesting concept, but just as it happened with inheritance in the early OOP days, people tend to get carried over.
Remember when white-box inheritance was applied without understanding (for instance) the fragile base class problem?
People may view inheritance as a way to "patch" a base class and change its behaviour in unexpected ways. But truth is, a base class must be designed to be extended, and extension can take place only through well-defined extensions hot-points. It is not a rare occurrence to refactor a base class to make extension safer.

Aspects are not really different. People may view aspects as a way to "patch" existing code and change its behaviour in unexpected ways. But truth is, when you move outside the safe realm of spectators (see my post above for more), your code needs to be designed for interception.
Consider, for instance, the initial idea of patching the User class through aspects, adding a data member, and adding a corresponding data into the database. Can your persistence logic be patched through an aspect? Well, it depends!
Existing AOP languages can't advise any given line: there is a fixed grammar for pointcuts, like method call, data member access, and so on. So if your persistence code was (trivially)

class User
{
void Save()
{
// open a transaction
// do your SQL stuff
// close the transaction
}
}
there would be no way to participate to the transaction from an aspect. You would have to refactor your code, e.g. by moving the SQL part in a separate method, taking the transaction as a parameter. Can we still call this obliviousness? That's highly debatable! I may not know the details of the advice, but I damn sure know I'm being advised, as I refactored my code to be pointcut-friendly.

Is this really different from exposing a CreateSubscription event? Yeah, well, it's more code to write. But in many data-oriented applications, a well-administered dose of events in the CRUD methods can take you a long way toward a more flexible architecture.

A closing remark on the SOA part. SOA is still much of a buzzword, and many good [design] ideas are still in the decontextualized stage, where people are expected to blindly follow some rule without understanding the impact of what they're doing.

In my view, a crucial step toward SOA is modularity. Modularity has to take place at all levels, even (this will sound like an heresy to some) at the database level. Ideally (this is not a constraint to be forced, but a force to be considered) every service will own its own tables. No more huge SQL statements traversing every tidbit in the database.

Therefore, if you consider the "friend gift" as a separate service, it is only natural to avoid tangling the User class, the Subscription class, and the User table with information that just doesn't belong there. In a nutshell, separating a cross-cutting business rule into an aspect-like class will bring you to a more modular architecture, and modularity is one of the keys to true SOA.

Labels: , , ,


Thursday, April 24, 2008

 

Can AOP inform OOP (toward SOA, too? :-) [part 1]

Note: I began tinkering with this idea several months ago, inspired by some design work I was doing on a real-world project. Initially, I thought I could pimp up this stuff into a full-fledged article. Months came by and I didn't, so (in the spirit of my old concept of Blogging as Destructuring) I thought I could just as well say something here instead.

Consider a web site offering some kind of service to users. Users have to register (yeah :-), and they can then buy a subscription to several services. A simple class diagram can model this easily:



In most cases, the underlying database schema wouldn't be much different.

In a real case, there might be different kind of services, each requiring a derived class, but we can ignore this issue right now. Also, there might be several kind of subscription (time-based, pay-per-use, and so on), but again, let's ignore that issue right now, and just concentrate on time-base subscriptions.

A subscription can be renewed at any time. In the object model, this would translate into a Renew method in class Subscription. Renew could take a parameter, like the extension of the renewal. Most likely, it would add a new record to the Subscription table (to keep track of the whole subscription history for that user), and possibly create a new Subscription object. So far, so good.

Now the marketing guys come up with a nice idea: whenever you register, you can provide the email address of a friend who has already registered. If you do, everytime you renew a subscription your friend will get some kind of gift, like a free extension or whatever. This may generate a few more leads, meaning a little more business.

From a purely OO mindset, this may lead us to perform a little maintenance on the existing model. We can add a relationship from User to User to model the "friend" relationship, or we could just add a field like friendEmail (possibly left empty). At the database level, adding a field is probably easier.
We can also modify the Renew method to check for the presence of a friend in the subscribing user, and if so, invoke some Gift logic. I won't draw a modified diagram for this scenario: I guess it's just too obvious.

Now, this approach obviously works. However, you may recognize that the whole "friend gift" concept is a cross-cutting concern: it cuts through User (requiring new data) and through Subscription (requiring new logic). More on detecting cross-cutting concerns during analysis and design (and on the difference between cross-cutting and pervasive concerns) next time.

In the AOP world, we could approach the problem differently. We could define a FriendGift aspect. The aspect may add a new data member to the User class (the friendEmail), and intercept the persistence logic to save / load that data from the database. The aspect may also intercept Renew and perform the Gift logic if required.
Actually the aspect doesn't have to modify User (class and table); indeed, it would be better not to, as the persistence logic might not be easy to intercept (more on this next time). The aspect could just use a different database table to store the friend email.

Using an UML-like notation, we could model this approach as:



Interestingly, the database is probably different from the previous scenario: Friend would map to its own table.

What if we are not using an AOP-enabled language? For instance, we might be using plain old C#. Can we still borrow some ideas from the above? I believe so. In the same sense as OO thinking can inform traditional structured programming, AOP thinking can inform traditional object oriented programming.

If we don't have pointcuts, join points and aspects, we may still have events (in C#/.NET, or callback in other languages, and so on). Sure, we have to forego obliviousness (which might not be so bad: more on this next time). But we can come up with a more modular and decoupled design by reasoning along the AOP lines:



No rocket science here :-). Just a different form, same function. Different database, a more general Subscription class, unaffected by a business rule which may change at any time. Also the User class and table are left unaffected.

More on this, and a few comments on the SOA part, in just a few days (I hope :-).

Labels: ,


Tuesday, April 15, 2008

 

Old Problems, new Solutions

In a comment to my previous post, I mentioned an half-baked idea about a (truly) modern architecture for GUI applications.

In fact, I've been experimenting with different approaches to GUI construction for "classic" Windows applications, Web applications, Smart Clients calling web services, and even Web applications calling web services for quite a while now.

Some were dead ends. Others have been a successful intermediate step toward something better. I usually pass on the good ideas to my clients, and meanwhile, I keep trying different approaches on small scale, low-risk projects. Because the "traditional" way of building GUI applications sucks big time. So does the traditional thinking about "good" GUI constructions, that is, that you need a tightly integrated language / ide / library / operating system to do anything decent.

I also don't like the idea that I should buy-in a monolithic framework for GUI construction. Different problems are better solved in different ways: we should be free to mix and match different approaches in different forms (within the same application), or even within the single form, or component.

Over time, a few people told me not to bother: Microsoft, Sun, Apple (not to mention a thousand smart guys working for smaller companies) have been working at that for years. It's an old problem, and it's unlikely that anybody can come up with a radically different solution.

It's very poor thinking. There is no limit to human creativity. Take a look at How To Fold A T-Shirt In 2 Seconds. The problem is much older than GUI construction. Uncountable people have been faced with the same problem and they didn't come up with anything similar so far. Sure, there are pros and cons in this technique (as usual), but it's a radically different (and much faster) approach.

You may want to go through the explained version, as I did. I actually tried that a few times, and it works :-).

Labels: , ,


Sunday, April 13, 2008

 

Quote of the day past

Life is either a daring adventure or nothing (Helen Keller).
Good luck Ma, you've been strong and brave beyond my imagination.
Have fun!

Friday, April 04, 2008

 

Asymmetry

I'm working on an interesting project, trying to squeeze all the available information from sampled data and make that information useful for non-technical users. I can't provide details, but in the end it boils down to reading a log file from a device (amounting to about 1 hour of sampled data from multiple channels), do the usual statistics, noise filtering, whatever :-), calculate some pretty useful stuff, and create a report that makes all that accessible to a business expert.

The log file is (guess what :-) in XML format, meaning it's really huge. However, thanks to modern technology, we just generated a bunch of classes from the XSD and let .NET do the rest. Parsing is actually pretty fast, and took basically no time to write.
In the end, we just get a huge collection of SamplingPoint objects. Each Sampling point is basically a structure-like class, synthesized from the XSD:

class SamplingPoint
{
public DateTime Timestamp { // get, set }
public double V1 { // get, set }
// ...
public double Vn { // get, set }
}

each value (V1...Vn) is coming from a different channel and may have a different unit of measurement. They're synchronously sampled, so it made sense for whoever developed the data acquisition module to group them together and dump them together in a single SamplingPoint tag.

We extract many interesting facts from those data, but for each Vi (i=1...N) we also show some "traditional" statistics, like average, standard deviation and so on.
Reasoning about average and standard deviation is not for everyone: I usually consider an histogram of the distribution much easier to understand (and to compare with other histograms):



Here we see the distribution of V1 over time: for instance, V1 had a value between 8 and 9 for about 6% of the time. Histograms are easy to read, and users quickly asked to see histograms for each V1..Vn over time. Actually, since one of the Vj is monotonically increasing with time, they also asked to see the histogram of the remaining Vi against Vj too. So far, so good.

Now, sometimes I hate writing code :-). It usually happens when my language doesn't allow me to write beautiful code. Writing a function to calculate the histogram of (e.g.) V1 against time is trivial: you end up with a short piece of code taking an array of SamplingPoints and using the V1 and Timestamp properties to calculate the histogram. No big deal.

However, that function is not reusable, exactly because it's using V1 and Timestamp. You can deal with this in at least 3 unpleasant :-) ways:

1) you don't care: you just copy/paste the whole thing over and over. If N = 10, you get 19 almost-identical functions (10 for time, 9 for Vj).

2) you restructure your data before processing. Grouping all the sampled data at a given time in a single SamplingPoint structure makes a lot of sense from a producer point of view, but it's not very handy from a consumer point of view. Having a structure of arrays (of double) instead of an array of structures would make everything so much simpler.

3) you write an "accessor" interface and N "accessors" classes, one for each Vi. You write your algorithms using accessors. Passing the right accessors (e.g. for time and V1) will get you the right histogram.

All these options have some pros and cons. In the end, I guess most people would go with (2), because that brings us into the familiar realm of array-based algorithms.

However, stated like this, it seems more like a "data impedance" problem between two subsystems than a language problem. Why did I say it's the language fault? Because the language is forcing me to access data members with compile-time names, and does not (immediately) allow me to access data members using run-time names.

Don't get me wrong: I like static typing, and I like compiled languages. I know from experience that I tend to make little stupid mistakes, like typing the wrong variable name and stuff like that. Static typing and compiled languages catch most of those stupid mistakes, and that makes my life easier.

Still, the fact that I like something doesn't mean I want to use that thing all the time. I want to have options. Especially when those options would be so simple to provide.

In a heavily reflective environment like .NET, every class can be easily considered an associative container, from the property/data member names to property/data member values. So I shold be able to write (if I wanted):

SamplingPoint sp = ... ;
double d1 = sp[ "V1" ] ;

which should be equivalent to

double d1 = sp.V1 ;

Of course, that would make my histogram code instantly reusable: I'll just pass the run-time names of the two axes. You can consider this equivalent to built-in accessors.

Now, I could implement something like that on my own, using reflection. It's not really difficult: you just have to gracefully handle collections, nested objects, and so on. Unfortunately, C# (.NET) do not allow a nice implementation of the concept, mostly for a bunch or constraints they added to conversion operators: no generic conversion operators (unlike C++), no conversion to/from Object, and so on. In the end you may need a few more casts that you'd like to, but it can be done.

I'll also have to evaluate the performance implications for this kind of application, but I know it would make my life much easier in other applications (like binding smart widgets to a variety of classes, removing the need for braindead "controller" classes). It's just a pity that we don't have this as built-in language feature: it would be much easier to get this right (and efficient) at the language level, not at the library level (at least, given C# as it is now).

Which brings me to the concept of symmetry. A few months ago I stumbled upon a paper by Jim Coplien and Zhao Liping (Understanding Symmetry in Object-Oriented Languages, published in Journal of Object Technology, an interesting, free publications that's filling the void left by the demise of JOOP). Given my interest on the concept of form in software, the paper caught my attention, but I postponed further thinking till I could read more on the subject (there are several papers on symmetry in Cope's bibliography, but I need a little more time than I have). A week ago or so, I've also found (and read) another paper from Zhao in Communications of ACM, March 2008: Patterns, Symmetry, and Symmetry breaking.

Some of their concepts sound odd to me. The concept of symmetry is just fine, and I think it may help to unravel some issues in language design.
However, right now the idea that patterns are a way to break symmetry doesn't feel so good. I would say exactly the opposite, but I really have to read their more mathematically-inclined papers before I say more, because words can be misleading, while theorems usually are not :-).

Still, the inability to have built-in, natural access to fields and properties through run-time names struck me as a lack of symmetry in the language. In this sense, the Accessor would simply be a way to deal with that lack of symmetry. Therefore it seems to me that patterns are a way to bring back symmetry, not to break symmetry. In fact, I can think of many cases where patterns "expose" some semantic symmetry that was not accessible because of (merely) syntactic asymmetry.

More on this as I dig deeper :-).

Labels: , , , ,


Thursday, March 27, 2008

 

Zen and the Art of Fixing

A little known fact about me is that I like fixing things. Mostly TV sets, but over the years I've tried to repair (with various degrees of success :-) any electrical, electronic or mechanical device that needed some fixing. Indeed, when I was about 14 I made a small business out of rescuing old vacuum-tube radios, fixing and then selling them to collectors.
Even today, I occasionally talk about diagnosing and repairing a defective TV when teaching about modular design, modular reasoning, design for testability, and so on.

In fact, fixing a device has a lot in common with fixing a buggy program. Some devices (like old washing machines) are relatively simple. You just need to know some basic concepts, and apply some common sense. Sure, it may take some creativity to fix even the simplest device (especially if you can't find the right replacement parts), but overall, the problem is often self-evident, and you "only" need to resort to your ability, often just manual ability.

Electronic devices, however, can fail in complex, even puzzling ways. You need a better understanding of what's going on under the hood. You may need special tools (you can't get much far repairing a TV if you ain't got an oscilloscope) but sometimes it's just plain old intuition (or sheer luck :-). You need to know some tricks of the trade (like using a light bulb to discriminate problems in the power supply vs. horizontal deflection), but overall, what you need more is rational system thinking.

The same is true when reasoning about complex software failures. We often have a large numbers of parts (hardware, firmware, drivers, OS, libraries, our own code) which can fail for a large number of reasons. Your best allied is rational system thinking. Your worst enemy is the all-so-common uncircumstantiated certainty that the fault must lie exactly somewhere (usually outside our own code :-). Overall, I would say my experience fixing stuff has made me a better troubleshooter, helping me to find obscure bugs in systems I knew very little about.

Still, I don't like to fix computers. I guess I'm already spending too much time around computers, and besides, today the integration scale is so high that you can hardly fix a broken motherboard. However, there is always an exception, and this one is interesting enough to be worth telling.

A couple of years ago (yeah :-) I rescued a notebook before it was thrown away. After a fall, it wouldn't even turn on. The CD-ROM was visibly damaged, but the screen was intact. I took it home and left it alone for a few months, till I had some time to kill.

I'm usually lucky, and in fact, I discovered that by pushing the power plug a little heavier than usual, the notebook would indeed power on. That's usually just a broken joint; it took forever :-) to disassemble the notebook and expose the PCB, but after that, it was quite easy to fix. Now the computer would turn on, start booting XP, and die a few seconds later. I suspected the HD was damaged too, and replaced it with a spare one (with W2K installed). Similar story: it would start booting the OS, then dump a blue screen reporting that “the boot device was not accessible”.

That's usually due to a broken IDE controller, or a faulty RAM chip. I tried replacing the RAM, but still got the same problem. However, the IDE controller was somehow working, as the computer would indeed start booting. Weird :-).

I didn't give up and got another HD, with good old MS-DOS installed. The notebook booted like a charm, and all the applications seemed to work. Weird again: the IDE controller seemed fine. Again, it could be a faulty RAM chip, since MS-DOS only uses the first 640KB.

I connected a USB CD-ROM and tried an old version of Knoppix on CD: knoppix uses all the available RAM, so that was like a definitive RAM test. It worked fine, but as soon as I tried using the HD, it would simply lock up. So, RAM was fine, the IDE controller was fine under MS-DOS but failed under other operating systems.
My diagnosis was that DMA was at fault. I tried to disable DMA at the BIOS level with no luck. I also disabled DMA in that W2K HD (using another PC, of course), but it would still lock, just like Knoppix.

At that point, I contemplated using an external HD (via USB), and perhaps installing a tweaked version of XP which can boot from a USB device (details on the BartPE page). I could even rewire one of the USB ports and install the whole thing internally, since the missing CD-ROM left a lot of space. But it looked like a damn lot of work :-), so I did nothing and left the notebook around for more than one year.

A few days ago, having a little time to kill again, I tried a different experiment. I got a 1GB USB stick, downloaded the latest version of Knoppix, and put it on the stick (instead of a CD-ROM). I followed this tutorial to speed things up.
It worked fine, and booting from the USB stick was very quick. However, at that point I noticed (from the boot log) that this newer version of Knoppix ran with DMA disabled (I guess the old one I tried before didn't). Time to test my DMA controller theory!

I put an HD inside, booted Knoppix from the USB stick, and guess what, no hang-up! I could access the HD with not problem whatsoever. At that point, it was hard to resist: I had to try installing knoppix on the HD itself. Again, I followed a tutorial to speed things up: this one is about a previous version, but still valid. The proof of the pudding is in the eating: I removed the USB stick, tried booting and Lo!, it worked. The world is still somehow deterministic :-).
By the way: Knoppix takes forever to boot from the HD. Overall, it would be better to keep the USB stick, maybe rewiring a USB port to keep everything inside.

Now, in a perfect world, I would have found a way to really "fix" the notebook, that is, to repair the DMA controller. I suspect it's just another loose joint from the fall. Most likely, some pin of some surface-mounted chip suffered some mechanical stress. This world, however, ain’t perfect, so I did nothing else, leaving it as a working Knoppix notebook. Fun is over, time to get back to work :-)

Labels: ,


This page is powered by Blogger. Isn't yours?