Tuesday, May 13, 2008 

Natural language

Some (most :-) of my clients are challenging. Sometimes the challenge comes from the difficult technical problems they face. That's the best kind of challenge.
Sometimes the challenge comes from people: that's the worst kind of challenge, and one that right now is better left alone.
Sometimes the challenge comes from the organization, which means it also comes from people, but with a different twist. Challenges coming from the organization are always tough, but overcoming those challenges can really make a difference.

One of my challenging clients is a rather large company in the financial domain. They are definitely old-school, and although upper management can perfectly see how software is permeating and enabling their business, middle management tends to see software as a liability. In their eternal search for lower costs, they moved most of the development offshore, keeping only a handful of designers and all the analysts in-house. Most often, design is done offshore as well, for lack of available designers on this side of the world.

Analysts have a tough job there. On one side, they have to face the rest of the company, which is not software-friendly. On the other side, they have to communicate clear requirements to the offshore team, especially to the designers, who tend to be very technology-oriented.
To make things more complicated, the analysts often find themselves working on unfamiliar sub-domains, with precise regulations but also with large gray areas that must be somehow understood and communicated.
Icing on the cake: some of those financial instruments do not even exist in the local culture of the offshore team, making communication as difficult as ever.

Given this overall picture, I've often recommended that analysts spend some time creating a good domain model (usually a UML class diagram, occasionally complemented by some activity diagrams).
The model, with unambiguous associations, dependencies, multiplicities, and so on, will force them to ask the right questions, and will make it easier for the offshore designer to acquaint himself with the problem. Over time, this suggestion has been quite helpful.
However, as I said, the organization is challenging. Some of the analysts complained that their boss is not satisfied with a few diagrams. He wants a lengthy, wordy explanation, so he can read it over and see if they got it right (well, that's his theory anyway). The poor analyst can't possibly do everything in the allotted time.

Now, I always keep an eye on software engineering research. I've seen countless attempts to create UML diagrams from natural language specifications. The results are usually unimpressive.
In this case, however, I would need exactly the opposite: a tool to generate a precise, yet verbose domain description out of a formal domain model. The problem is much easier to solve, especially because analysts can help the tool, by using the appropriate wording.

Guess what, the problem must be considered unworthy, because there is a dearth of work in that area. In practice, the only relevant paper I've been able to find is "Generating Natural Language specifications from UML class diagrams" by Farid Meziane, Nikos Athanasakis and Sophia Ananiadou. There is also Nikos' thesis online, with a few more details.
The downside is that (as usual) the tool they describe does not seem to be generally available. I've yet to contact the authors: I just hope it doesn't turn out to be one of those Re$earch Tool$ that never get to be used.

From the paper above, I've also learnt about ModelExplainer, a similar tool from a commercial company. Again, the tool doesn't seem to be generally available, but I'll get in touch with people there and see.

Overall, the problem doesn't seem so hard, especially if we accept the idea that the analyst will help the tool by choosing appropriate wording. An XMI-to-NL (Natural Language) translator would make for a perfect open source project. Any takers? :-)


As far as I know, the need to distill an (almost) natural language description from a class model was born quite early in OO technology: a seminal idea is already present in OOSC 2nd Ed. by (my beloved :)) B. Meyer.

Documenting the static picture of an OO system (i.e. simply the class structure) should be relatively easy (even I can think of a couple of trivial algorithms); at the very least, it should be possible to build on the well-known entity-relationship modelling theory to perform the task.

Anyway, my trivial implementation:

- given a class structure

1) for any class in the model, list its dependencies on other classes.
Of course, at this stage the natural language description is quite vague: "this class is related to..."

1.1) any dependency can be at the very least described as inheritance or aggregation (in that case, we can also get the arity of the relationship from the UML model).
At this stage, we can refine our verbose description a bit: "this class ACTS_AS..." (sorry, I tend to be suspicious of the classic "IS_A"), "this class OWNS..." (of course, interpreting a reference as ownership can be misleading, but we can be more confident if we take into account the arity and the acyclic (rather than cyclic) structure among a group of classes)

2) as a further stage, I suppose we can enrich our description of the system by analysing groups of related classes: we can isolate cyclic structures or highlight deep or recursive dependencies (I am thinking about your notes on the granularity of reusable groups of classes from your old series on SysOOD published on CP)

3) maybe (I am not really confident about this particular point) it is worth investigating whether some class clusters as defined above exhibit similar topological structures: this could point out intriguing analogies (i.e. similar patterns which could reflect some properties of the underlying model)
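
Step 2 above can be tried out with very little code. What follows is only a minimal sketch: it assumes the class dependencies have already been extracted from the model into a plain dictionary, and it uses Tarjan's strongly connected components algorithm to isolate the groups of classes that depend on each other cyclically. The function name `cyclic_clusters` and the class names (Portfolio, Position, ...) are made up for illustration.

```python
from collections import defaultdict


def cyclic_clusters(dependencies):
    """Return the groups of classes that take part in a dependency cycle.

    `dependencies` maps a class name to the names of the classes it
    depends on (an assumed, simplified view of the class diagram).
    """
    graph = defaultdict(list)
    for source, targets in dependencies.items():
        graph[source].extend(targets)
        for target in targets:
            graph[target]  # touch it, so every class becomes a node

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    clusters = []

    def visit(node):
        # Tarjan's algorithm: depth-first visit with a stack of open nodes
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for neighbour in graph[node]:
            if neighbour not in index:
                visit(neighbour)
                lowlink[node] = min(lowlink[node], lowlink[neighbour])
            elif neighbour in on_stack:
                lowlink[node] = min(lowlink[node], index[neighbour])
        if lowlink[node] == index[node]:
            # node is the root of a strongly connected component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            # keep only real cycles: several classes, or a self-reference
            if len(component) > 1 or node in graph[node]:
                clusters.append(sorted(component))

    for node in list(graph):
        if node not in index:
            visit(node)
    return sorted(clusters)


# Hypothetical model: Portfolio and Position reference each other
model = {
    "Portfolio": ["Position"],
    "Position": ["Instrument", "Portfolio"],
    "Instrument": ["Issuer"],
    "Issuer": [],
}
for cluster in cyclic_clusters(model):
    print(" and ".join(cluster) + " depend on each other.")
```

The recursive visit is fine for the model sizes discussed here; a very large model would need an iterative version to stay clear of Python's recursion limit.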

Of course, generating a human-readable text from that point looks viable: it won't read like Proust, but it can be useful for describing a model to someone without showing him boxes and arrows!
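
To make steps 1 and 1.1 above a bit more concrete, here is a small sketch of how they could look. It is an illustration under stated assumptions, not a real tool: the `Relation` record, the `ARITY_WORDS` table and the example classes are all hypothetical, and the relationships are assumed to have been read from the UML model already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    kind: str        # "inheritance", "aggregation" or plain "dependency"
    target: str
    arity: str = ""  # e.g. "1", "0..1", "1..*", "*"; used for aggregation only


# Assumed wording for the most common arities
ARITY_WORDS = {
    "1": "exactly one",
    "0..1": "at most one",
    "1..*": "one or more",
    "*": "any number of",
}


def describe(relation):
    """Turn one relationship into one (rather stiff) sentence."""
    if relation.kind == "inheritance":
        return f"{relation.source} ACTS_AS {relation.target}."
    if relation.kind == "aggregation":
        sentence = f"{relation.source} OWNS {relation.target}"
        if relation.arity:
            # fall back to the raw arity when no wording is known
            how_many = ARITY_WORDS.get(relation.arity, relation.arity)
            sentence += f" ({how_many})"
        return sentence + "."
    # step 1: nothing more specific is known about this dependency
    return f"{relation.source} is related to {relation.target}."


def describe_model(relations):
    """Describe every relationship, grouped by the class it starts from."""
    ordered = sorted(relations, key=lambda r: (r.source, r.target))
    return "\n".join(describe(r) for r in ordered)


# Hypothetical fragment of a financial domain model
relations = [
    Relation("SavingsAccount", "inheritance", "Account"),
    Relation("Customer", "aggregation", "Account", "1..*"),
    Relation("Account", "dependency", "Currency"),
]
print(describe_model(relations))
```

The arity is printed in parentheses after the class name so that the sketch does not have to deal with English plurals.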

Guido Marongiu
And how about a nice class diagram to spell out how to build "XMI-to-NL"?

And what if it were not an open source project, given that the tool is mostly for bosses / managers?

Romano: that's a subtle, but very important point. The tool is not for the managers. They want the report, not the tool. The tool is to help the analyst, who unfortunately has no budget.

The "open source" part is mostly because it's something that can be built piecemeal, with new ideas being added and tried out by a community of people. Requirements are also "immediately obvious" to an experienced software developer, which doesn't hurt. I don't think that each and every project is a good candidate for OS. This one seems to be.

Paradoxically enough (or maybe not) a class diagram wouldn't be the best way to start thinking about this project. This is a translator, and as such, grammars, production rules, rewriting rules, abstract interpretations and so on would probably be a better approach to get started.
Guido: in most cases, I wouldn't let the tool "guess" the OWNS relationship. I would help the tool through stereotypes, role names, relationship names.
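
As a rough sketch of what "helping the tool" could mean, the fragment below builds each sentence from a relationship name chosen by the analyst and from the multiplicity, and mentions ownership only when an explicit stereotype says so. Everything here is an assumption made for illustration: the `Association` record, the `QUANTIFIERS` table, the "owns" stereotype, the wording of the sentences, and the deliberately naive `pluralize` helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    source: str
    name: str          # verb phrase picked by the analyst, e.g. "holds"
    target: str
    multiplicity: str  # UML-style, at the target end: "1", "0..1", "1..*", "*"
    stereotype: str = ""  # e.g. "owns"; set by the analyst, never guessed


# multiplicity -> (wording, whether the target noun goes plural)
QUANTIFIERS = {
    "1": ("exactly one", False),
    "0..1": ("at most one", False),
    "1..*": ("one or more", True),
    "0..*": ("any number of", True),
    "*": ("any number of", True),
}


def pluralize(noun):
    """Naive English plural; good enough for a sketch, wrong for 'Index'."""
    lower = noun.lower()
    if lower.endswith(("s", "x", "ch", "sh")):
        return noun + "es"
    if len(lower) > 1 and lower.endswith("y") and lower[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def to_sentence(assoc):
    """Apply one fixed production rule to one association."""
    # unknown multiplicities are printed as-is and treated as plural
    quantifier, plural = QUANTIFIERS.get(assoc.multiplicity, (assoc.multiplicity, True))
    target = pluralize(assoc.target) if plural else assoc.target
    sentence = f"Each {assoc.source} {assoc.name} {quantifier} {target}."
    if assoc.stereotype == "owns":
        sentence += " This is an ownership relationship."
    return sentence


# Hypothetical associations, worded by the analyst
print(to_sentence(Association("Customer", "holds", "Account", "1..*", "owns")))
print(to_sentence(Association("Account", "is denominated in", "Currency", "1")))
```

The quality of the output depends almost entirely on the names the analyst puts on the diagram, which is the point: the tool stays trivial and the analyst stays in control of the wording.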

In most cases, a very bare-bones approach could be enough. It is true, however, that for extensive models we should also organize the document based on some kind of clustered graph of classes. I would leave that for version 2 :-).
Quick update:

- I tried to get in touch with the company behind ModelExplainer, but their email consistently failed with a "user unknown" error.

- I tried to contact the author of the paper and thesis I mentioned. No failure, but no answer either.

Maybe linking the Re$earch Tool$ post was kinda prophetic... :-)
Hi,

The software tool is not available. It was written for academic purposes.
