Friday, April 04, 2008 


I'm working on an interesting project, trying to squeeze all the available information from sampled data and make that information useful for non-technical users. I can't provide details, but in the end it boils down to reading a log file from a device (amounting to about 1 hour of sampled data from multiple channels), doing the usual statistics, noise filtering, whatever :-), calculating some pretty useful stuff, and creating a report that makes all that accessible to a business expert.

The log file is (guess what :-) in XML format, meaning it's really huge. However, thanks to modern technology, we just generated a bunch of classes from the XSD and let .NET do the rest. Parsing is actually pretty fast, and took basically no time to write.
In the end, we just get a huge collection of SamplingPoint objects. Each SamplingPoint is basically a structure-like class, synthesized from the XSD:

class SamplingPoint
{
    public DateTime Timestamp { get; set; }
    public double V1 { get; set; }
    // ...
    public double Vn { get; set; }
}

Each value (V1...Vn) comes from a different channel and may have a different unit of measurement. They're synchronously sampled, so it made sense for whoever developed the data acquisition module to group them and dump them together in a single SamplingPoint tag.

We extract many interesting facts from those data, but for each Vi (i=1...N) we also show some "traditional" statistics, like average, standard deviation and so on.
Reasoning about average and standard deviation is not for everyone: I usually consider a histogram of the distribution much easier to understand (and to compare with other histograms):

Here we see the distribution of V1 over time: for instance, V1 had a value between 8 and 9 for about 6% of the time. Histograms are easy to read, and users quickly asked to see histograms for each V1..Vn over time. Actually, since one of the Vj is monotonically increasing with time, they also asked to see the histogram of the remaining Vi against Vj too. So far, so good.

Now, sometimes I hate writing code :-). It usually happens when my language doesn't allow me to write beautiful code. Writing a function to calculate the histogram of (e.g.) V1 against time is trivial: you end up with a short piece of code taking an array of SamplingPoints and using the V1 and Timestamp properties to calculate the histogram. No big deal.
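As a minimal sketch of that "trivial" function (not the project's actual code), the version below weights each sample by the time elapsed until the next one, so a bin holds the fraction of total time V1 spent in it. The names V1Histogram, OverTime, min, max and binCount are assumptions made for illustration, and a two-property SamplingPoint is repeated only so the snippet compiles on its own.

```csharp
using System;

class SamplingPoint
{
    public DateTime Timestamp { get; set; }
    public double V1 { get; set; }
}

static class V1Histogram
{
    // Fraction of total time spent by V1 in each of binCount equal-width
    // bins over [min, max). Each sample is weighted by the time elapsed
    // until the next sample; out-of-range values are clamped into the
    // first or last bin. Hard-wired to V1 and Timestamp: not reusable.
    public static double[] OverTime(SamplingPoint[] pts, double min, double max, int binCount)
    {
        double[] bins = new double[binCount];
        double total = 0.0;
        for (int i = 0; i + 1 < pts.Length; i++)
        {
            double dt = (pts[i + 1].Timestamp - pts[i].Timestamp).TotalSeconds;
            int bin = (int)Math.Floor((pts[i].V1 - min) / (max - min) * binCount);
            bin = Math.Max(0, Math.Min(binCount - 1, bin));
            bins[bin] += dt;
            total += dt;
        }
        if (total > 0.0)
            for (int b = 0; b < binCount; b++)
                bins[b] /= total;
        return bins;
    }
}
```

Every mention of V1 and Timestamp in the loop is a place where a copy for another channel would have to be edited by hand.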

However, that function is not reusable, exactly because it's using V1 and Timestamp. You can deal with this in at least 3 unpleasant :-) ways:

1) you don't care: you just copy/paste the whole thing over and over. If N = 10, you get 19 almost-identical functions (10 for time, 9 for Vj).

2) you restructure your data before processing. Grouping all the sampled data at a given time in a single SamplingPoint structure makes a lot of sense from a producer point of view, but it's not very handy from a consumer point of view. Having a structure of arrays (of double) instead of an array of structures would make everything so much simpler.

3) you write an "accessor" interface and N "accessor" classes, one for each Vi. You write your algorithms using accessors. Passing the right accessors (e.g. for time and V1) will get you the right histogram.
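Option (3) could be sketched roughly as follows. IAccessor, TimeAccessor, V1Accessor, V2Accessor and Histogram.Of are hypothetical names, and the weighting scheme (each sample counts for the increment of the axis variable until the next sample) is one assumed reading of "histogram of Vi against Vj", not necessarily the one used in the project.

```csharp
using System;

class SamplingPoint
{
    public DateTime Timestamp { get; set; }
    public double V1 { get; set; }
    public double V2 { get; set; }
}

// Hypothetical accessor interface: one small class per channel.
interface IAccessor
{
    double Get(SamplingPoint p);
}

class TimeAccessor : IAccessor
{
    // Time expressed in seconds, so it can be used as just another axis.
    public double Get(SamplingPoint p) { return p.Timestamp.Ticks / (double)TimeSpan.TicksPerSecond; }
}

class V1Accessor : IAccessor { public double Get(SamplingPoint p) { return p.V1; } }
class V2Accessor : IAccessor { public double Get(SamplingPoint p) { return p.V2; } }

static class Histogram
{
    // Distribution of "value" over equal-width bins in [min, max), where each
    // sample is weighted by the increment of "axis" until the next sample.
    // Out-of-range values are clamped into the first or last bin.
    public static double[] Of(SamplingPoint[] pts, IAccessor value, IAccessor axis,
                              double min, double max, int binCount)
    {
        double[] bins = new double[binCount];
        double total = 0.0;
        for (int i = 0; i + 1 < pts.Length; i++)
        {
            double weight = axis.Get(pts[i + 1]) - axis.Get(pts[i]);
            int bin = (int)Math.Floor((value.Get(pts[i]) - min) / (max - min) * binCount);
            bin = Math.Max(0, Math.Min(binCount - 1, bin));
            bins[bin] += weight;
            total += weight;
        }
        if (total > 0.0)
            for (int b = 0; b < binCount; b++)
                bins[b] /= total;
        return bins;
    }
}
```

With this shape, `Histogram.Of(pts, new V1Accessor(), new TimeAccessor(), 0.0, 10.0, 10)` and `Histogram.Of(pts, new V1Accessor(), new V2Accessor(), 0.0, 10.0, 10)` share one algorithm; the price is one tiny class per channel.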

All these options have some pros and cons. In the end, I guess most people would go with (2), because that brings us into the familiar realm of array-based algorithms.

However, stated like this, it seems more like a "data impedance" problem between two subsystems than a language problem. Why did I say it's the language's fault? Because the language is forcing me to access data members with compile-time names, and does not (immediately) allow me to access data members using run-time names.

Don't get me wrong: I like static typing, and I like compiled languages. I know from experience that I tend to make little stupid mistakes, like typing the wrong variable name and stuff like that. Static typing and compiled languages catch most of those stupid mistakes, and that makes my life easier.

Still, the fact that I like something doesn't mean I want to use that thing all the time. I want to have options. Especially when those options would be so simple to provide.

In a heavily reflective environment like .NET, every class can easily be considered an associative container, mapping property/data member names to property/data member values. So I should be able to write (if I wanted):

SamplingPoint sp = ... ;
double d1 = sp[ "V1" ] ;

which should be equivalent to

double d1 = sp.V1 ;

Of course, that would make my histogram code instantly reusable: I'll just pass the run-time names of the two axes. You can consider this equivalent to built-in accessors.

Now, I could implement something like that on my own, using reflection. It's not really difficult: you just have to gracefully handle collections, nested objects, and so on. Unfortunately, C# (.NET) does not allow a nice implementation of the concept, mostly because of a bunch of constraints they added to conversion operators: no generic conversion operators (unlike C++), no conversion to/from Object, and so on. In the end you may need a few more casts than you'd like, but it can be done.
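A minimal reflection-based sketch of the idea, restricted to flat numeric properties (no collections, no nested objects). It assumes the generated SamplingPoint is declared partial, so that a hand-written indexer can be added next to it; the first declaration below is only a stand-in for the generated code.

```csharp
using System;
using System.Reflection;

// Stand-in for the generated part of the class (assumed to be partial).
partial class SamplingPoint
{
    public DateTime Timestamp { get; set; }
    public double V1 { get; set; }
    public double V2 { get; set; }
}

// Hand-written part: read a public numeric property by its run-time name.
partial class SamplingPoint
{
    public double this[string name]
    {
        get
        {
            PropertyInfo p = GetType().GetProperty(name);
            if (p == null)
                throw new ArgumentException("No public property named " + name, "name");
            // Convert.ToDouble throws for non-numeric properties such as Timestamp.
            return Convert.ToDouble(p.GetValue(this, null));
        }
    }
}
```

Because the indexer is typed as double, `double d1 = sp["V1"];` needs no cast here; a general version returning object is where the extra casts would appear. The property lookup happens on every call, which is exactly the kind of constant overhead that would have to be measured.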

I'll also have to evaluate the performance implications for this kind of application, but I know it would make my life much easier in other applications (like binding smart widgets to a variety of classes, removing the need for braindead "controller" classes). It's just a pity that we don't have this as a built-in language feature: it would be much easier to get this right (and efficient) at the language level, not at the library level (at least, given C# as it is now).

Which brings me to the concept of symmetry. A few months ago I stumbled upon a paper by Jim Coplien and Zhao Liping (Understanding Symmetry in Object-Oriented Languages, published in Journal of Object Technology, an interesting, free publication that's filling the void left by the demise of JOOP). Given my interest in the concept of form in software, the paper caught my attention, but I postponed further thinking till I could read more on the subject (there are several papers on symmetry in Cope's bibliography, but I need a little more time than I have). A week or so ago, I also found (and read) another paper from Zhao in Communications of the ACM, March 2008: Patterns, Symmetry, and Symmetry Breaking.

Some of their concepts sound odd to me. The concept of symmetry is just fine, and I think it may help to unravel some issues in language design.
However, right now the idea that patterns are a way to break symmetry doesn't feel so good. I would say exactly the opposite, but I really have to read their more mathematically-inclined papers before I say more, because words can be misleading, while theorems usually are not :-).

Still, the inability to have built-in, natural access to fields and properties through run-time names struck me as a lack of symmetry in the language. In this sense, the Accessor would simply be a way to deal with that lack of symmetry. Therefore it seems to me that patterns are a way to bring back symmetry, not to break symmetry. In fact, I can think of many cases where patterns "expose" some semantic symmetry that was not accessible because of (merely) syntactic asymmetry.

More on this as I dig deeper :-).


Seems like the JOT website is not accessible. The URL should be fine though.

I often use the (free) VisualRoute service to understand this kind of failure: in this case, it seems like a server at ETH Zurich is malfunctioning (sorry, the dump would wrap in some horrible way here :-).

With some luck, it will be back soon.

I didn't look into this concept of symmetry, but for the problem(s) at hand, have you considered using a delegate? Something like this:

class SamplingPoint
{
    public double V1 { /* get, set */ }
    public double V2 { /* get, set */ }
}

class HistogramCalculator
{
    public delegate double AccessorDelegate(SamplingPoint pt);

    public static double[] CalculateHistogram(SamplingPoint[] pts, AccessorDelegate ad)
    {
        // Calculate the histogram here...

        // To access the value of the property for the i-th point, do this:
        double v = ad(pts[i]);

        // More code here...
    }
}

class App
{
    public static void Main(string[] args)
    {
        SamplingPoint[] pts = LoadPointsFromSomewhere();

        double[] histogramV1 = HistogramCalculator.CalculateHistogram(pts,
            delegate(SamplingPoint pt) { return pt.V1; });

        double[] histogramV2 = HistogramCalculator.CalculateHistogram(pts,
            delegate(SamplingPoint pt) { return pt.V2; });

        // More code here...
    }
}

(disclaimer: the above code is almost certainly broken: I don't have a copy of Visual Studio installed, so I couldn't check it, and after five months of unemployment, my C# is becoming very rusty)

The syntactical overhead is minimal, as the delegate can be declared inline, and the performance hit should not be much worse than that of a virtual method. And you don't have to give up static typing. The code given is for C# 2.0, which has a very ugly syntax for delegates, but I believe (though I never used it) that C# 3.0 sports a nicer one, more similar to what is usually found in functional languages.
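For comparison, a minimal sketch of the same delegate idea written with C# 3.0 lambdas and the Func<T, TResult> delegate from .NET 3.5. The class and method names (Histogram, Calculate) and the min, max and binCount parameters are assumptions made for illustration, and the sketch simply counts samples per bin.

```csharp
using System;

class SamplingPoint
{
    public double V1 { get; set; }
    public double V2 { get; set; }
}

static class Histogram
{
    // Counts the samples falling into each equal-width bin over [min, max);
    // out-of-range values are clamped into the first or last bin.
    public static int[] Calculate(SamplingPoint[] pts, Func<SamplingPoint, double> accessor,
                                  double min, double max, int binCount)
    {
        int[] bins = new int[binCount];
        foreach (SamplingPoint pt in pts)
        {
            int bin = (int)Math.Floor((accessor(pt) - min) / (max - min) * binCount);
            bins[Math.Max(0, Math.Min(binCount - 1, bin))]++;
        }
        return bins;
    }
}
```

The call site then shrinks to `Histogram.Calculate(pts, pt => pt.V1, 0.0, 10.0, 10)`, still statically typed.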

You also mentioned in passing the problem of binding a widget to, say, a specific piece of data. I use the same approach here. For every form, I create a "BindingManager" object, which has, for every type of widget, a method to create a binding between the widget itself and a couple of delegates, the first one to retrieve the piece of data, and the second one to set it. This is a code snippet, for a CheckBox control:

class MyForm : Form
BindingManager bm;

CheckBox cb;

public MyForm

bm = new BindingManager(this);

delegate { /* Return the value that the CheckBox is supposed to display */ },
delegate(bool newValue)
// This method is invoked when the user, clicking on the screen, changes the state
// of the CheckBox. It's supposed to synchronize the "model" with whatever is being
// now displayed on screen

The same approach works of course for other types of controls, and can be used, as a sometimes more convenient alternative to the direct use of events, also for controls that do not display data, but only trigger an action, such as a button or a menu entry. In this case you just bind the button to whatever method is supposed to be invoked when the button is pressed, and maybe to a second method that indicates whether the action is available at any given time, so that you can enable/disable the button accordingly. I realize that this description may not be too clear, but I can always post a real piece of code to clarify matters if need be.
have you considered using a delegate?
Nice catch! Although an Accessor delegate is not conceptually different from an Accessor interface, the functional slant of C# makes the delegate-based code shorter. In this case, it's also obvious that there is no need for a full-blown interface (see my Interfaces Vs. Delegates (or Events) for more). So, yeap, delegates would be fine.

class HistogramCalculator
Good ol' Peter Coad would tell you the class should be named just Histogram! :-)

the performance hit should not be much worse than that of a virtual method
All benchmarks I've seen rate delegates at about 8-10 times slower than virtual methods; see, for instance, Writing Faster Managed Code: Know What Things Cost. For an accessor that's a little too much, but access through run-time names could hardly cost less.

In the end, I decided to go with solution (2) and re-structure the data before processing. There is just too much numerical stuff going on (besides histograms) to accept a constant overhead. If I reshape it, I pay the cost only once per channel.
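A minimal sketch of what such a reshaping might look like, assuming just two channels. Channels, Seconds and Stats.Mean are hypothetical names, not the project's actual code.

```csharp
using System;

class SamplingPoint
{
    public DateTime Timestamp { get; set; }
    public double V1 { get; set; }
    public double V2 { get; set; }
}

// Hypothetical structure-of-arrays view of the log: one double[] per channel.
class Channels
{
    public readonly double[] Seconds; // time elapsed since the first sample
    public readonly double[] V1;
    public readonly double[] V2;

    // The only place that knows the property names; runs once per log.
    public Channels(SamplingPoint[] pts)
    {
        Seconds = new double[pts.Length];
        V1 = new double[pts.Length];
        V2 = new double[pts.Length];
        for (int i = 0; i < pts.Length; i++)
        {
            Seconds[i] = (pts[i].Timestamp - pts[0].Timestamp).TotalSeconds;
            V1[i] = pts[i].V1;
            V2[i] = pts[i].V2;
        }
    }
}

static class Stats
{
    // Plain array-based algorithm: works unchanged for any channel.
    public static double Mean(double[] values)
    {
        double sum = 0.0;
        foreach (double v in values) sum += v;
        return values.Length == 0 ? double.NaN : sum / values.Length;
    }
}
```

After the one-time copy, every numerical routine takes plain double[] arguments, so no per-sample accessor call is left in the inner loops.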

For every form, I create a "BindingManager" object[...]
I realize that this description may not be too clear

Oh, it was actually very clear. I've seen stuff like that used in other languages (like Delphi).
Do you really get much benefit from going through a BindingManager, instead of just wiring up the events yourself?
I understand some benefits on the other side (model to GUI through binding), mostly because C# is so limited it's hard to come up with a nice generic class to transparently trigger an event whenever an object of an existing type changes its value.
And (I know it sounds obvious :-) you still have to write those methods.

What I really want is to bind GUI stuff to my domain model at design time without writing any code. I've used this kind of technique in several projects, especially web [services] based applications, but I've yet to come up with a nice library I can show.

As I said, it's simple to get access to data through run-time names, using reflection; it's difficult to get a nice, natural syntax for that in C#, because the language is so limited as soon as you move a few steps away from the mainstream problems.

My best result so far is a small, but real-world Windows Form application: a function point calculator using a very different approach to FP. It required exactly 0 lines of custom code, just an XML to generate the domain model and some wiring at design time for reusable custom controls.

I know it sounds weird :-), sooner or later I'll post something about this. It's really a post-modern GUI architecture, years ahead of the MVC crap...

after five months of unemployment, my C# is becoming very rusty
Hey, you're too good to be unemployed! Consider sending me a CV, one of my clients might be looking for you :-) [can't promise, but I think it's worth a shot]
Good ol' Peter Coad would tell you the class should be named just Histogram! :-)

...and CalculateHistogram(...) should just be named Calculate(...). Now you mention it, maybe there's a bit of redundancy there :-)

What I really want is to bind GUI stuff to my domain model at design time without writing any code.

I had no idea what you were looking for (you just mentioned it quickly). I had a much simpler problem to solve: I just wanted to code each form using two classes, a model that contains all the abstract data and logic but knows nothing about forms, controls and events, and a view/form that takes care of the presentation part but does not contain any "nontrivial" code. I used delegates to do the bindings. And yes, I'm still stuck with that model/view crap, and I was even pretty happy with it (although now I'm a bit ashamed to admit it :-). I'm really curious to see your alternative approach, I have no clue what you've come up with. Hope you can post something about it sooner rather than later. I knew you weren't a big fan of MVC (I think you wrote something about it in Computer Programming ages ago), but it looks like your dislike for it has deepened since. I seem to remember that you mentioned there that you didn't like the fact that the model ended up being a rather dumb class, containing only data, while all the interesting logic was in the view. My personal experience is that that's true when you use that pattern to build widgets, but it's the opposite when you apply it to entire forms: in this case all the logic ends up in the model, while the view/form becomes extremely dumb, usually consisting only of a long constructor, where you initialize the controls and create the bindings, and hardly any other methods. And that's exactly what I want. It would be interesting to know more about why you dislike that pattern so much; I find it somewhat surprising.

Consider sending me a CV

Thanks! That would be great! If nothing else, because working with you as software architect would be really, really interesting. At the moment though I'm just looking for a "summer job", as I'm planning on doing quite a bit of travelling next autumn and winter. No point wasting such a good chance for something like that. And I still need to make up my mind as to where I'm going to go next. I've been working in Dublin in the last few years, but I was kind of thinking of moving to London at some stage. In a year or so, would the offer still be valid? Could I send my CV then? And do you have many clients in London, by the way?
because working with you as software architect would be really, really interesting.

Rereading my post, I realized that this sentence conveys a different meaning from the one I had in mind. What I meant was: because having you as the software architect in the team would be really, really interesting.
I'm really curious to see your alternative approach, I have no clue what you've come up with
So far it's a half-baked idea, but I hope I'll get to a point where I can show something decent :-) soon enough...

About the unemployment stuff, I was about to add "unless it's a voluntary leave or something". Guess I should have :-). You seem to have everything under control! And no, sorry, I'm not working in the London area at this time...