Tuesday, December 15, 2009 

A little more on DSM and Gravity

In a recent paper ("The Golden Age of Software Architecture" Revisited, IEEE Software, July/August 2009), Paul Clements and Mary Shaw conclude by talking about Conformance Checking. Indeed, although many would say that the real design/architecture is represented by code, a few :-) of us still think that code should reflect design, and that conformance of code to design should be automatically checked when possible (not necessarily in any given project; not all projects are equal).
Conformance checking is not always simple; quoting Clements and Shaw: "Many architectural patterns, fundamental to the system’s design taken forward into code, are undetectable once programmed. Layers, for instance, usually compile right out of existence."

The good news is that layers can be easily encoded in a DSM. In doing so, I would use an extension of the traditional yes/no DSM, as I mentioned in a comment on the previous post. While the traditional DSM is basically binary (yes/no), in many cases we are better off with a ternary DSM. That way, we can encode three different decisions (a small sketch follows the list):
Yes-now: there is a dependency, and it's here, right now.
Not-now: there is no dependency right now, but it wouldn't be wrong to have one.
Never: adding this dependency would violate a fundamental design rule.
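
To make this concrete, here is a minimal sketch of a ternary DSM, assuming three hypothetical layers (UI, Logic, Data) and strict layering:

enum Dep { YesNow, NotNow, Never }

static class LayerDsm
{
    // Rows and columns follow the same order; Cells[i, j] answers the
    // question "may layer i depend on layer j?".
    static readonly string[] Layers = { "UI", "Logic", "Data" };

    static readonly Dep[,] Cells =
    {
        //             UI          Logic       Data
        /* UI    */  { Dep.YesNow, Dep.YesNow, Dep.Never  },
        /* Logic */  { Dep.Never,  Dep.YesNow, Dep.YesNow },
        /* Data  */  { Dep.Never,  Dep.Never,  Dep.YesNow },
    };

    // Conformance checking: an actual dependency found in code is a
    // violation only when the corresponding cell says Never.
    public static bool IsViolation(int from, int to)
    {
        return Cells[from, to] == Dep.Never;
    }
}

Note how a single cell captures the difference between layering styles: in a relaxed layered system, the UI/Data cell would be NotNow rather than Never.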

A strong layered system requires some kind of isolation between layers. Remember gravity: new things are naturally attracted to existing things.
Attraction is stronger in the direction of simplicity and lack of effort: if no effort is required to violate architectural integrity, sooner or later it will be violated. Sure, conformance checking may help, but it would be better to set up the gravitational field so that things are naturally attracted to the right place.

The real issue, therefore, is the granularity of the DSM for a layered system. Given the fractal nature of software, a DSM can be applied at any granularity level: between functions, classes, "logical" components, "physical" components. Unless your system is quite small, you probably want to apply the DSM at the component level, which also means your layers should appear at the component level.

Note the distinction between logical and physical components. If you're working in a modern language/environment (like .NET or Java), creating a physical component is a snap. Older languages, like C++, never got the idea of a component into the standard, for a number of reasons; in fact, today this is one of the most limiting factors when working on large C++ systems. In that case, I've often seen designers/programmers creating "logical" components out of namespaces and discipline. I've done that myself too, and it kinda works.

Here is the catch: the binary separation between physical components is stronger than the logical separation granted by different namespaces, which in turn is stronger than the separation between two classes in the same namespace, which is much stronger than the separation between two members of the same class.
More exactly, as we'll see in a forthcoming post, a binary component may act as a better shield and provide stronger isolation.

If a binary component A uses binary component B, and B uses binary component C but does not reveal so in its interface (that is, public/protected members of public classes in B do not mention types defined in C), then A knows precious little about C.
Using C from A requires that you discover C's existence, and then the existence of some useful class inside C. Most likely, to do so, you have to look inside B. At that point, adding a new service inside B might just be more convenient. This is especially true if your environment does not provide you with free indirect references (that is, importing B does not inject a reference to C "for free").
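
To make the shielding effect concrete, here is a minimal sketch (hypothetical names; each component would be a separate assembly):

// Assembly C
namespace C
{
    public class Storage
    {
        public static string Fetch(string key) { return "value of " + key; }
    }
}

// Assembly B: references C, but its public interface never mentions C's types.
namespace B
{
    public class Service
    {
        public string Lookup(string key)
        {
            return C.Storage.Fetch(key);  // C is a hidden implementation detail
        }
    }
}

// Assembly A: references B only. Nothing in B's interface reveals that C
// exists, and without an explicit reference to C, code in A mentioning
// C.Storage won't even compile.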
Here is again the interplay between good software design and properly designed languages: a better understanding of software forces could eventually help to design better languages as well, where violating a design rule should be harder than following the rule.

Now, if A and B are logical components (inside a larger, physical component D), then B won't usually act as a shield, mostly because the real (physical) dependency will be placed between D and C, not between B and C. Whatever B can access, A can access as well, without any additional effort. The gravitational field of B is weaker, and some code might be attracted to A, which is not what the designer wanted.

Therefore, inasmuch as your language allows you to, a physical component is always the preferred way to truly isolate one system from another.

OK, this was quite simple :-). Next time, I'll go back to the concept of frequency and then move to isolation!


Friday, April 04, 2008 

Asymmetry

I'm working on an interesting project, trying to squeeze all the available information from sampled data and make that information useful for non-technical users. I can't provide details, but in the end it boils down to reading a log file from a device (amounting to about 1 hour of sampled data from multiple channels), doing the usual statistics, noise filtering, whatever :-), calculating some pretty useful stuff, and creating a report that makes all that accessible to a business expert.

The log file is (guess what :-) in XML format, meaning it's really huge. However, thanks to modern technology, we just generated a bunch of classes from the XSD and let .NET do the rest. Parsing is actually pretty fast, and took basically no time to write.
In the end, we just get a huge collection of SamplingPoint objects. Each SamplingPoint is basically a structure-like class, synthesized from the XSD:

class SamplingPoint
{
    public DateTime Timestamp { get; set; }
    public double V1 { get; set; }
    // ...
    public double Vn { get; set; }
}

Each value (V1...Vn) comes from a different channel and may have a different unit of measurement. They're synchronously sampled, so it made sense for whoever developed the data acquisition module to group them together and dump them together in a single SamplingPoint tag.

We extract many interesting facts from those data, but for each Vi (i=1...N) we also show some "traditional" statistics, like average, standard deviation and so on.
Reasoning about average and standard deviation is not for everyone: I usually consider a histogram of the distribution much easier to understand (and to compare with other histograms):

[Figure: histogram of the distribution of V1 over time]

Here we see the distribution of V1 over time: for instance, V1 had a value between 8 and 9 for about 6% of the time. Histograms are easy to read, and users quickly asked to see histograms for each V1..Vn over time. Actually, since one of the Vj is monotonically increasing with time, they also asked to see the histogram of the remaining Vi against Vj too. So far, so good.

Now, sometimes I hate writing code :-). It usually happens when my language doesn't allow me to write beautiful code. Writing a function to calculate the histogram of (e.g.) V1 against time is trivial: you end up with a short piece of code taking an array of SamplingPoints and using the V1 and Timestamp properties to calculate the histogram. No big deal.
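
Something like this, give or take (a sketch; the bin parameters are hypothetical):

static class Statistics
{
    // Trivial, but hardwired to V1 and Timestamp.
    public static double[] HistogramOfV1(
        SamplingPoint[] points, double min, double binWidth, int bins)
    {
        double[] histogram = new double[bins];
        double total = 0;
        for (int i = 0; i < points.Length - 1; ++i)
        {
            // Weight each sample by the time elapsed until the next one.
            double dt = (points[i + 1].Timestamp - points[i].Timestamp).TotalSeconds;
            int bin = (int)((points[i].V1 - min) / binWidth);
            if (bin >= 0 && bin < bins)
                histogram[bin] += dt;
            total += dt;
        }
        // Convert to percentage of the total sampling time.
        for (int i = 0; i < bins; ++i)
            histogram[i] = 100.0 * histogram[i] / total;
        return histogram;
    }
}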

However, that function is not reusable, exactly because it's using V1 and Timestamp. You can deal with this in at least 3 unpleasant :-) ways:

1) you don't care: you just copy/paste the whole thing over and over. If N = 10, you get 19 almost-identical functions (10 for time, 9 for Vj).

2) you restructure your data before processing. Grouping all the sampled data at a given time in a single SamplingPoint structure makes a lot of sense from a producer point of view, but it's not very handy from a consumer point of view. Having a structure of arrays (of double) instead of an array of structures would make everything so much simpler.

3) you write an "accessor" interface and N "accessor" classes, one for each Vi. You write your algorithms using accessors. Passing the right accessors (e.g. for time and V1) will get you the right histogram.

All these options have some pros and cons. In the end, I guess most people would go with (2), because that brings us into the familiar realm of array-based algorithms.
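
For the record, option (3) could look like this (a sketch; all names are hypothetical):

interface IAccessor
{
    double Get(SamplingPoint p);
}

class TimeAccessor : IAccessor
{
    // Hypothetical: seconds from an arbitrary origin.
    public double Get(SamplingPoint p) { return p.Timestamp.Ticks / 1e7; }
}

class V1Accessor : IAccessor
{
    public double Get(SamplingPoint p) { return p.V1; }
}

// ... N more accessor classes; the histogram function takes an IAccessor
// per axis, so passing (new TimeAccessor(), new V1Accessor()) or any other
// pair reuses the same algorithm.

In C# 3, a Func<SamplingPoint, double> delegate would do the same with less ceremony, but the shape of the solution wouldn't change.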

However, stated like this, it seems more like a "data impedance" problem between two subsystems than a language problem. Why did I say it's the language's fault? Because the language is forcing me to access data members with compile-time names, and does not (immediately) allow me to access data members using run-time names.

Don't get me wrong: I like static typing, and I like compiled languages. I know from experience that I tend to make little stupid mistakes, like typing the wrong variable name and stuff like that. Static typing and compiled languages catch most of those stupid mistakes, and that makes my life easier.

Still, the fact that I like something doesn't mean I want to use that thing all the time. I want to have options. Especially when those options would be so simple to provide.

In a heavily reflective environment like .NET, every class can easily be considered an associative container, mapping property/data member names to property/data member values. So I should be able to write (if I wanted):

SamplingPoint sp = ... ;
double d1 = sp[ "V1" ] ;

which should be equivalent to

double d1 = sp.V1 ;

Of course, that would make my histogram code instantly reusable: I'll just pass the run-time names of the two axes. You can consider this equivalent to built-in accessors.

Now, I could implement something like that on my own, using reflection. It's not really difficult: you just have to gracefully handle collections, nested objects, and so on. Unfortunately, C# (.NET) does not allow a nice implementation of the concept, mostly because of a bunch of constraints they added to conversion operators: no generic conversion operators (unlike C++), no conversion to/from Object, and so on. In the end you may need a few more casts than you'd like, but it can be done.
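
Just to give the idea, a minimal sketch (assuming the XSD-generated class is partial, as xsd.exe makes it; collections, nested objects, and PropertyInfo caching are left out):

using System;
using System.Reflection;

public partial class SamplingPoint
{
    public object this[string name]
    {
        get
        {
            PropertyInfo p = GetType().GetProperty(name);
            if (p == null)
                throw new ArgumentException("unknown property: " + name);
            return p.GetValue(this, null);
        }
    }
}

// Usage (note the extra cast, the price of the missing conversion operators):
//   double d1 = (double)sp["V1"];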

I'll also have to evaluate the performance implications for this kind of application, but I know it would make my life much easier in other applications (like binding smart widgets to a variety of classes, removing the need for braindead "controller" classes). It's just a pity that we don't have this as a built-in language feature: it would be much easier to get this right (and efficient) at the language level, not at the library level (at least, given C# as it is now).

Which brings me to the concept of symmetry. A few months ago I stumbled upon a paper by Jim Coplien and Liping Zhao (Understanding Symmetry in Object-Oriented Languages, published in the Journal of Object Technology, an interesting, free publication that's filling the void left by the demise of JOOP). Given my interest in the concept of form in software, the paper caught my attention, but I postponed further thinking till I could read more on the subject (there are several papers on symmetry in Cope's bibliography, but I need a little more time than I have). A week ago or so, I also found (and read) another paper from Zhao in Communications of the ACM, March 2008: Patterns, Symmetry, and Symmetry Breaking.

Some of their concepts sound odd to me. The concept of symmetry is just fine, and I think it may help to unravel some issues in language design.
However, right now the idea that patterns are a way to break symmetry doesn't feel so good. I would say exactly the opposite, but I really have to read their more mathematically-inclined papers before I say more, because words can be misleading, while theorems usually are not :-).

Still, the inability to have built-in, natural access to fields and properties through run-time names struck me as a lack of symmetry in the language. In this sense, the Accessor would simply be a way to deal with that lack of symmetry. Therefore it seems to me that patterns are a way to bring back symmetry, not to break symmetry. In fact, I can think of many cases where patterns "expose" some semantic symmetry that was not accessible because of (merely) syntactic asymmetry.

More on this as I dig deeper :-).


Sunday, May 27, 2007 

Metaprogramming, OOP, AOP (Part 2)

The Object Oriented approach is certainly familiar to most of my readers - more than a few were practically born with it. OOP is about allocating behaviour to classes, and connecting classes into more structured collaborations. We could largely say that when we design the OO way, we're building a type system where some high-level task can be "naturally" carried out. When confronted with designing a solution for string localization, the most natural reaction is probably to reuse the existing hierarchy/lattice of GUI components (assuming one already exists, which makes a lot of sense).

Assuming we're still free to modify the existing library classes (I'll get back to this later), we could easily approach the problem as follows:
- identify a root class, e.g. Control, and add an abstract Localize responsibility.
- identify a derived class (let's call it ContainerControl), that is, a control which is mainly used as a container for other controls. Here Localize could provide a default implementation, which is basically to iterate over all the inner controls and forward the Localize call.
- make sure Localize is called at the appropriate time, e.g. before the control is drawn, or when the current language changes. There are several performance issues here, quite interesting in practice, but largely irrelevant to the discussion.
- Localize must have access to some kind of Dictionary class which keeps track of translations for different languages. For the sake of simplicity, we could assume Dictionary to be a singleton, so that we don't have to worry about passing parameters around.

In a sense, that could pretty much be it. Each concrete control (e.g. Label, Button, ListView, etc) will implement its own localization logic. Label and Button will get their own text from the dictionary; ListView may forward the call to each ColumnHeader (to keep a design similar to the .NET version). And so on.
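
In code, the skeleton would look more or less like this (a minimal sketch with hypothetical names, ignoring the timing and performance issues above):

using System.Collections.Generic;

class Dictionary
{
    public static readonly Dictionary Instance = new Dictionary();
    public int LanguageVersion;   // bumped when the current language changes

    public string Translate(string text)
    {
        return text;              // real lookup goes here
    }
}

abstract class Control
{
    public abstract void Localize();
}

class ContainerControl : Control
{
    protected List<Control> controls = new List<Control>();

    // Default implementation: forward Localize to every inner control.
    public override void Localize()
    {
        foreach (Control c in controls)
            c.Localize();
    }
}

class Label : Control
{
    private string unlocalizedText = "";   // set at design time
    public string Text;                    // what is actually displayed

    public override void Localize()
    {
        Text = Dictionary.Instance.Translate(unlocalizedText);
    }
}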

There are obviously a few issues with this simple (almost simplistic) design:
- there is not much reuse between controls; Label and Button will implement Localize in basically the same way, in two different places. We could fix this by implementing the common part in Control, but still we're not pushing reuse much further.
- controls have no idea of their context (that is, the meaningful nesting of ContainerControls from their parent up to the main window) during localization. This could be useful to share localization strings and/or to localize strings while considering context. However, this can be fixed in various ways, e.g. by pushing contextual information from a ContainerControl to contained controls, in the Localize call, or by walking up to the parent, and so on.
- there is no separation of concerns. Actually, localization has just become a cross-cutting concern, which is being implemented in many different classes (although all of them belong to the same hierarchy).
- (this is by far the worst problem) if you don't own the type system, and the owner didn't put in the Localize responsibility, you're out of luck, unless your language allows you to add polymorphic behaviour to an existing hierarchy.

Let's play with the last issue for a while. An unfortunate consequence of OO thinking is that it's always hard to consider a class "complete". For instance, your favorite String class may not have that regular-expression-based Split function you love so much. In most cases, you can put that function into another class, usually with a sad name like StringUtilities :-), and live with that. If your language allows extension methods a-la LINQ, you can add methods to existing classes without breaking the natural syntax for method invocation. However, you need true open classes if you want to add polymorphic behaviour to an existing hierarchy. Extension methods won't cut it. Mixins, or mixin layers, could be used to some advantage.
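
For instance, the regex-based Split above could be retrofitted as a C# 3 extension method (a sketch; RegexSplit is a made-up name):

using System.Text.RegularExpressions;

static class StringExtensions
{
    // Usable as "a1b2c".RegexSplit(@"\d"), as if String had it built in.
    public static string[] RegexSplit(this string s, string pattern)
    {
        return Regex.Split(s, pattern);
    }
}

The call site looks like a genuine method, but dispatch is static: no polymorphic behaviour has been added to any hierarchy, which is exactly the limitation above.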

Having said that, is there an OO way to deal with all the problems above, while staying within the realm of a "restrictive" language, without open classes, and without having to change the library source code?
Turns out you can push some responsibility outside the control and inside a newborn class. In fact, if you look at the implementation of your average control, you might find that it has to keep the unlocalized text somewhere (e.g. in a string that you can also set at design time). That string is then passed to the dictionary as a key at run-time (possibly with some context information) to get a translation in the current language. Now, if you've internalized OO thinking, you'll recognize that the control is managing the strings. That's because the string doesn't know any better.

Now, suppose you replace all instances of string inside controls (all those which require localization, anyway) with a smarter string, let's say LocalizedString. A LocalizedString would store the unlocalized version but return the localized version when you read it. It will take care of Dictionary lookup, caching, watching for changes in the current language, and so on, all in a single place. No more duplications. Better separation of concerns. The programmer will declaratively say (by using string or LocalizedString) if he/she needs a language-independent or language-dependent string. That would be it.
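
A minimal sketch, reusing the hypothetical Dictionary singleton from the sketch above (its LanguageVersion counter stands in for real change notification):

class LocalizedString
{
    private readonly string unlocalized;
    private string cached;
    private int cachedVersion = -1;

    public LocalizedString(string unlocalized)
    {
        this.unlocalized = unlocalized;
    }

    // Reading a LocalizedString transparently yields the translation;
    // the conversion operators let client code ignore the whole issue.
    public static implicit operator string(LocalizedString s)
    {
        if (s.cachedVersion != Dictionary.Instance.LanguageVersion)
        {
            s.cached = Dictionary.Instance.Translate(s.unlocalized);
            s.cachedVersion = Dictionary.Instance.LanguageVersion;
        }
        return s.cached;
    }

    public static implicit operator LocalizedString(string s)
    {
        return new LocalizedString(s);
    }
}

A control would then declare its text as a LocalizedString instead of a string, and lookup, caching, and change tracking would all live in a single place.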

Again, there are a few issues with this design:

- it's much harder for the string to find out about its context. While controls can usually navigate to their parent (another Control), it would be quite hard for a string to navigate to whatever object is storing it. While a Control may have a Localize method which takes parameters, our LocalizedString should be able to return the localized text through a parameterless read (ideally, a conversion operator, so that we can blissfully ignore the whole localization issue).
We could approach this problem from a few angles, and for a simple problem like string localization, we could even succeed. Unfortunately, the technique doesn't work so well for other problems, like persistence. You may resort to reflection, but I'll get to this later.
- just like before, if you don't own the code (or can't afford to change it anyway) you're in big trouble: adopting this design is a pervasive change inside the whole Control hierarchy. What is worse, open classes won't help you a bit.

What we're witnessing is a major issue with OOP. Thinking in objects is like creating a foundation. If you get it wrong, you're done. If it's your own code, this might not be a big deal - refactoring is always possible (not necessarily cheap, but possible). But if you're dealing with third-party code, you often suffer from an impedance mismatch between their design and your needs. I do not consider open source a cure - even if (e.g.) the .NET framework were open source, I would not venture into any massive change, because that's maintenance hell.

Given this major issue with OOP, I find it rather myopic that some features, like open classes and structural conformance to interfaces, are not more widely adopted by language designers. For languages like Java and C#, it's also rather myopic that you can't declaratively say that whenever an object of type T1 is requested, an object of type T2 (usually derived from T1) must be created instead. Not building these features into an OOP language means ignoring real problems, which could be solved without compromising type safety or efficiency. There are also real-world languages (like Objective-C) which have shown how to deal with some of these issues in practice. Too bad we, as a community, don't seem to learn enough from the past :-)

Back to our problem: so far, we tried to allocate behaviour inside the existing principal decomposition, or to push it inside a finer-grained class (which should then be used pervasively inside the principal decomposition). Is there a third approach? I didn't play the interface card yet, but that seems quite at odds with the problem. Coming up with a useful ILocalizable (bleah :-) interface, one that could be implemented without pervasive, cross-cutting changes inside the principal decomposition, seems quite hard, to say the least.

There is, of course, the reflective/introspective card yet to play. However, this brings us so close to the AOP way of thinking, that I'll discuss this option while looking at my toy problem from the AOP perspective.
