Tuesday, June 26, 2007

 

Got Multicore? Think Asymmetric!

Multicore CPUs are now widely available, yet many applications are not tapping into their true potential. Sure, web applications, and more generally container-based applications, have an inherent degree of coarse parallelism (basically at the request level), and they will scale fairly well on new CPUs. However, most client-side applications don't fall into the same pattern. Also, some server-side applications (like batch processing) are not intrinsically parallel either. Or maybe they are?

A few months ago, I was consulting on the design of the next generation of a (server-side) banking application. One of the modules was a batch processor, basically importing huge files into a database. For several reasons (file format, business policies), the file had to be read sequentially, processed sequentially, and imported into the database. The processing time was usually dominated by a single huge file, so the obvious technique to exploit a multicore (use several instances to import different files in parallel) would not have been effective.
Note that when we think of parallelism in this way, we're looking for symmetric parallelism, where each thread performs basically the same job (process a request, or import a file, or whatever). There is only so much you can do with symmetrical parallelism, especially on a client (more on this later). Sometimes (of course, not always), it's better to think asymmetrically, that is, to model the processing as a pipeline.

Even for the batch application, we can see at least three stages in the pipeline:
- reading from the file
- doing any relevant processing
- storing into the database
You can have up to three different threads performing these tasks in parallel: while thread 1 is reading record 3, thread 2 will process record 2, and thread 3 will store [the processed] record 1. Of course, you need some buffering in between (more on this in a short while).
Actually, in our case, it was pretty obvious that the processing wasn't taking enough CPU to justify a separate thread: it could be merged with the file-reading operation. What was actually funny (almost exhilarating :-) was to discover that despite the immensely powerful database server, storing into the database was much slower than reading from the file (truth be told, the file was stored on an immensely powerful file server as well). A smart guy in the bank quickly realized that it was our fault: we could have issued several parallel store operations, basically turning stage two of the pipeline into a symmetrical parallel engine. That worked like a charm, and the total time dropped by a factor of about 6 (more than I expected: we were also making better use of the multi-processor, multi-core DB server, not just of the batch server's multicore CPU).
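
The shape described above (one sequential read-and-process stage feeding several parallel store workers through a bounded buffer) can be sketched roughly as follows. This is not the banking code: the `ImportPipeline` class, the `Run` signature, the buffer size and the `store` delegate (standing in for the database call) are all assumptions made for illustration, and it relies on `BlockingCollection<T>`, which arrived in later .NET versions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical two-stage pipeline: sequential read/process, parallel store.
public static class ImportPipeline
{
    // Returns the number of records handed to the store delegate.
    public static int Run(IEnumerable<string> lines, int storeWorkers, Action<string> store)
    {
        // Bounded buffer between the stages: the reader blocks if it gets too far ahead.
        using (var buffer = new BlockingCollection<string>(1000))
        {
            int stored = 0;

            // Stage 1: read sequentially; the (cheap) processing is merged with reading.
            var reader = Task.Run(() =>
            {
                try
                {
                    foreach (var line in lines)
                        buffer.Add(line.Trim());
                }
                finally
                {
                    buffer.CompleteAdding();   // lets the store workers drain and exit
                }
            });

            // Stage 2: several symmetric workers take records from the same buffer.
            var writers = Enumerable.Range(0, storeWorkers)
                .Select(_ => Task.Run(() =>
                {
                    foreach (var record in buffer.GetConsumingEnumerable())
                    {
                        store(record);
                        Interlocked.Increment(ref stored);
                    }
                }))
                .ToArray();

            reader.Wait();
            Task.WaitAll(writers);
            return stored;
        }
    }
}
```

Note that with more than one store worker the records no longer reach the database in file order; the sketch assumes that is acceptable, as only reading and processing had to be sequential.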

Just a few weeks later (meaningful coincidence?), I stumbled across a nice paper: Understand packet-processing performance when employing multicore processors by Edwin Verplanke (Embedded Systems Design Europe, April 2007). Guess what, their design is quite similar to ours, an asymmetric pipeline with a symmetric stage.

Indeed, the pipeline model is also extremely useful when dealing with legacy code which was never designed to be thread-safe. I know that many projects aimed at squeezing some degree of parallelism out of that kind of code fail, because the programmers quickly find themselves adding locks and semaphores everywhere, thus slowing down the beast so much that there is either no gain or even a loss.
This is often due to an attempt to exploit symmetrical parallelism, which on legacy, client-side code is a recipe for resource contention. Instead, thinking of pipelined, asymmetrical parallelism often brings some good results.
For instance, I've recently overheard a discussion on how to make a graphical application faster on multicore. One of the guys contended that since the rendering stage is not thread-safe, there is basically nothing they can do (except doing some irrelevant background stuff just to keep a core busy). Of course, that's because he was thinking of symmetrical parallelism. There are actually several logical stages in the pipeline before rendering takes place: we "just" have to model the pipeline explicitly, and allocate stages to different threads.

As I've anticipated, pipelines need some kind of buffering between stages. Those buffers must be thread safe. The banking code was written in C#, and so we simply used a monitor-protected queue, and that was it. However, in high-performance C/C++ applications we may want to go a step further, and look into lock-free data structures.
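
For reference, a monitor-protected queue between two stages can be as small as the sketch below. It is not the code from the banking project: the `StageQueue` name, the bounded capacity and the `Close` / `TryTake` protocol are assumptions chosen to keep the example short.

```csharp
using System.Collections.Generic;
using System.Threading;

// A minimal monitor-protected buffer between two pipeline stages (illustrative only).
public sealed class StageQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly int capacity;
    private bool closed;

    public StageQueue(int capacity) { this.capacity = capacity; }

    // Called by the upstream stage; blocks while the buffer is full.
    public void Put(T item)
    {
        lock (items)
        {
            while (items.Count >= capacity)
                Monitor.Wait(items);
            items.Enqueue(item);
            Monitor.PulseAll(items);   // wake any consumer waiting for data
        }
    }

    // Called by the upstream stage when there is no more input.
    public void Close()
    {
        lock (items)
        {
            closed = true;
            Monitor.PulseAll(items);
        }
    }

    // Called by the downstream stage; blocks while the buffer is empty.
    // Returns false once the queue has been closed and drained.
    public bool TryTake(out T item)
    {
        lock (items)
        {
            while (items.Count == 0 && !closed)
                Monitor.Wait(items);
            if (items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = items.Dequeue();
            Monitor.PulseAll(items);   // wake any producer waiting for room
            return true;
        }
    }
}
```

The downstream stage would typically loop on `TryTake` until it returns false. Every operation takes the same lock, which is exactly the cost that lock-free structures try to avoid.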

A nice example comes from Bjarne Stroustrup himself: Lock-free Dynamically Resizable Arrays. The paper also has a great bibliography, and I must say that the concept of descriptor (by Harris) is so simple and effective that I would call it a stroke of genius. I just wish a better name than "descriptor" had been adopted :-).

For more predictable environments, like the packet processing above, we should also keep in mind a simple, interesting pattern that I always teach in my "design patterns" course (actually in a version tailored for embedded / real-time programming, which does not [yet] appear on my website [enquiries welcome :-)]). You can find it in Pattern Languages of Program Design Vol. 2, under the name Resource Exchanger, and it can be easily made lock-free. I don't know of an online version of that paper, but there is a reference in the online Pattern Almanac.
If you plan to adopt the Resource Exchanger, make sure to properly tweak the published design to suit your needs (most often, you can scale it down quite a bit). Indeed, over the years I've seen quite a few hard-core C programmers slowing themselves down in endless memcpy calls where a resource exchanger would have done the job oh so nicely.
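
To give a flavour of why exchanging buffers beats copying them, here is a heavily scaled-down sketch of the underlying idea: two parties swap buffer references through a single slot, so no data is copied and no lock is taken. This is not the published Resource Exchanger design; the `Frame` and `FrameExchanger` names, the single slot and the "Length 0 means empty" convention are all assumptions, and the sketch supposes exactly one producer and one consumer.

```csharp
using System.Threading;

// A reusable buffer; Length == 0 marks it as empty and free to refill (assumed convention).
public sealed class Frame
{
    public readonly byte[] Data;
    public int Length;

    public Frame(int size) { Data = new byte[size]; }
}

// Hypothetical single-slot exchanger: hand in one frame, get back the parked one.
public sealed class FrameExchanger
{
    private Frame parked;   // the frame currently waiting in the exchanger

    public FrameExchanger(Frame initial) { parked = initial; }

    // Atomically swaps the caller's frame with the parked one.
    // Only references change hands: no copy, no lock.
    public Frame Exchange(Frame mine)
    {
        return Interlocked.Exchange(ref parked, mine);
    }
}
```

With three frames in total (one held by the producer, one parked, one held by the consumer), the producer fills its frame and exchanges it for a spare, while the consumer exchanges its spent frame (with Length reset to 0) for whatever is parked, and processes it only if Length is greater than zero. If the producer runs ahead, the consumer simply sees the latest frame; whether dropping stale frames is acceptable depends on the application.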

A final note: I want to highlight the fact that symmetric parallelism can still be quite effective in many cases, including some kinds of batch processing or client-side applications. For instance, back in the Pentium II times, I implemented a parallel sort algorithm for a multiprocessor (not multicore) machine. Of course, there were significant challenges, as the threads had to work on the same data structure, without locks, and (that was kinda hard) without having one processor invalidating the cache line of the other (which happens quite naturally in discrete multiprocessing if you do nothing about it). The algorithm was then retrofitted into an existing application. So, yes, of course it's often possible to go symmetrical; we just have to know when to use what, and at which cost :-).



 

Two Years of Blogging!

I started blogging on June 26th, 2005. In the beginning, I thought I would write short, frequent posts about what I was thinking / reading / doing during the day. As I said in Blogging as Destructuring, lowering expectations of thoroughness, completeness, even relevance [...] allows a much faster flow of ideas.
Eventually, I began writing longer (and less frequent) posts, somehow closer to short articles. Seems like old habits are hard to drop :-). I should probably strive for a better balance, but anyway, I hope you enjoyed some of the recent stuff too.
Got some suggestions? Drop me a line : )

Saturday, June 23, 2007

 

C++, Java, C# and... D

I have updated an old (but in a sense "historical") article of mine: C++, Java, C#: qualche considerazione, adding a note on D and C++/CLI about their attempt to reconcile deterministic destruction and garbage collection, and about the related limitations. Rationale: it was pointed out to me that D is occasionally cited as an example of a perfect solution to the problem, while in practice it does not differ significantly from C#.



Wednesday, June 20, 2007

 

LinkedIn?

I started receiving invitations to join LinkedIn a couple of years ago, from colleagues, old friends, but also from people I've never met in real life.
I must confess I've never taken the time to create a profile, partially for lack of time, partially because I'm somewhat skeptical about the real benefits of doing so, partially because I've been lulling myself, thinking that I already enjoy a good visibility.
However, in the last few days I've experienced a flurry of invitations, and being somewhat sensitive to meaningful coincidences :-), I began thinking that creating a barebones profile won't take more than a few minutes (not sure I'll ever find the motivation to go much further, but that's hard to say).
Moreover, a quick look at the LinkedIn site proved that many people way more visible than me, including for instance Bjarne Stroustrup, are now part of that network. I'm sure quite a few of you guys have a profile there too.
So here is the question: Did you find the thing at all useful? Have you experienced any significant nuisance? Is it worth the trouble? (if you're too shy to say it here :->, tell me at pescio@eptacom.net). Thanks!



Tuesday, June 19, 2007

 

Client Side-Augmented Web Applications

In the last few posts I've been writing a lot about AOP, and very little about what I'm doing every other day. It's plain impossible to catch up, but here is something that has kept me busy for quite a few days lately: Client Side Augmented Web Applications (I should file for a trademark here :-). What I mean is a regular web application, that can be regularly used as a stand-alone application or, when you install some additional modules on the client, can also interact closely with other applications on the client side (e.g. the usual office suite, and so on).

Naturally, that means web pages must have a way to send data to client-side applications, and to obtain data from them. For several (good) reasons, we wanted this data exchange to be as transparent as possible to the web app developers. Also, we didn't want to write different web applications (regular and augmented). That would have had a negative impact on development and maintenance times, and it could also have proven to be an inconvenience for the users. This had some impact on page navigation design, which could be an interesting subject for a future post.

Now, I can't get into the specifics of the project, or disclose much about the design of the whole infrastructure I've designed and built (yes, I still enjoy writing code :-). However, I can show you the final result. If you want your ASP.NET page to obtain (e.g.) the filename and title of your Word document, and send back to Word a corporate-wide document number, all you have to do is add a few decorated properties in your .aspx.cs source file, like:

public partial class DocumentProperties : System.Web.UI.Page
  {
  [AskClient("filename")]
  public string Filename
    {
    get
      {
      return filename;
      }
    set
      {
      filename = value;
      }
    }

  [AskClient("title")]
  public string Title
    {
    get 
      { 
      return title; 
      }
    set 
      { 
      title = value; 
      }
    }

  [SendClient("description")]  
  public string Description
    {
    get
      {
      return description;
      }
    set
      {
      description = value;
      }
    }

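  // backing fields for the properties above
  private string filename;
  private string title;
  private string description;

  // here goes the usual stuff (other methods and data members)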
  }

That's it. The attributes define the property name as known on the client side; the invisible infrastructure will take care of everything else. In a sense, communication between the web application and the client-side application has been modeled as a virtual machine concern, and the attributes are used to tell the virtual machine where interception is needed. Of course, this is only the tip of the iceberg. There is a lot going on under the hood, also on the client side, as your browser and your favorite application are usually not good friends, and to be honest not even acquaintances.
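
Since the real infrastructure cannot be disclosed, the following is only a guess at the general mechanism: custom attributes discovered through reflection. The attribute classes are re-created here with the same names purely for illustration; the `ClientName` property, the `ClientBinder` helper and the dictionary-based exchange format are assumptions, and a plain class stands in for the ASP.NET page.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical re-creation of the two attributes; the real ones may look different.
[AttributeUsage(AttributeTargets.Property)]
public sealed class AskClientAttribute : Attribute
{
    public string ClientName { get; private set; }
    public AskClientAttribute(string clientName) { ClientName = clientName; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class SendClientAttribute : Attribute
{
    public string ClientName { get; private set; }
    public SendClientAttribute(string clientName) { ClientName = clientName; }
}

public static class ClientBinder
{
    // Copies values received from the client into the [AskClient] properties.
    public static void Fill(object page, IDictionary<string, string> fromClient)
    {
        foreach (PropertyInfo p in page.GetType().GetProperties())
        {
            var ask = (AskClientAttribute)Attribute.GetCustomAttribute(p, typeof(AskClientAttribute));
            string value;
            if (ask != null && fromClient.TryGetValue(ask.ClientName, out value))
                p.SetValue(page, value, null);
        }
    }

    // Collects the [SendClient] properties into a name/value map for the client.
    public static Dictionary<string, string> Collect(object page)
    {
        var toClient = new Dictionary<string, string>();
        foreach (PropertyInfo p in page.GetType().GetProperties())
        {
            var send = (SendClientAttribute)Attribute.GetCustomAttribute(p, typeof(SendClientAttribute));
            if (send != null)
                toClient[send.ClientName] = (string)p.GetValue(page, null);
        }
        return toClient;
    }
}

// Stand-in for a page class, using automatic properties for brevity.
public class DocumentInfo
{
    [AskClient("filename")] public string Filename { get; set; }
    [SendClient("description")] public string Description { get; set; }
}
```

In this reading, something on the server would call `Fill` before the page logic runs and `Collect` afterwards, which is the kind of job a non-visual component dropped on the page could take on.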

Is my abstraction leaky? Sure it is. To make all that stuff work, you also have to drop a non-visual component into your page at design time. That component, if you follow the virtual machine metaphor, is indeed the implementation of the (server-side) virtual machine layer that will deal with the communication concern.
If you don't drop the component into the page, the virtual machine layer just won't kick in, and your attributes will stay there silently, stone cold. This is a functional leak that I'm aware of, and that as a designer I have accepted. In fact, all the alternatives I've considered to avoid this leak had some undesirable consequences, and keeping that leak brought the best overall balance. Besides, there are ways to somehow hide the need to drop the control into the page (like using page inheritance or a master page), so it's really no big deal.

A final reflection. I do not believe that something like this could come out of refactoring code that was simply meant "to work". It's relatively trivial to make a web application and a client-side application talk. It's quite a different thing to make this conversation transparent. Having prototyped the first option (just to make sure a few ideas could actually work), I can honestly say that without the necessary design effort (and skill), it's extremely unlikely that one would come close to the final result I got.

Time to drop a few numbers: overall, I've spent roughly 20% of total development time experimenting with ideas (throwaway code), 35% designing, 30% coding (this includes some unit testing), 10% doing end-to-end testing, 5% debugging. Given the relative novelty of several techniques I adopted, I should actually consider the 20% prototyping an inherent portion of the design activity: you can't effectively design much beyond the extent of your knowledge. Sometimes, the best way to gather more knowledge is to talk; sometimes, to read; sometimes, to write code. Of course, I was looking for knowledge, not for code to keep and cherish, so I happily scrapped it to build much better code. In the end, that's often the difference between intentional and accidental architecture.



Sunday, June 10, 2007

 

Metaprogramming, OOP, AOP (Part 3)

We can now take a look at localization from the AOP perspective. While doing so, I'll use the Virtual Machine Metaphor I introduced in a previous post. Just like before, my goal is not to solve the problem per se, but to explore the designer's reasoning when a specific perspective (or metaphor) is adopted.

AOP, like OOP, is not a universal solution to programming problems. There is a set of problems that can be tackled more effectively using AOP, just like there is a set of problems that can be tackled more effectively using OOP.
When we design the OOP way, for instance, we often try to capture polymorphic behaviour (functional and non-functional behaviour), so as to maximize the extensibility of the resulting design.
We often hear that AOP is mostly about non-functional, cross-cutting behaviour, like tracing, exception handling, transactions. The problem with the functional/non-functional distinction, however, is that sometimes it is a little blurred (I mentioned QoS in a previous post).
Now, is localization non-functional? I would say so, but this could be open for debate. Is localization cross-cutting? As we saw in the previous post, it usually is, although with some effort we can encapsulate it neatly in a LocalizedString class (assuming we're designing the base framework; we can't retrofit this solution without changing the source code). Again, it seems a little blurry.

Let's try the Virtual Machine Metaphor and see if it gets any clearer. It's really quite simple: all you have to do is ask yourself, is it theoretically conceivable to build a virtual machine that, when running my program, will take care of localization? If your gut answer is yes, then you can go further with the metaphor, without getting down to the gory details of pointcuts and advice (or other implementation techniques) too soon.

A virtual machine, to be of any use, must intercept some events and add (or change) behaviour. For instance, your operating system is acting as a virtual machine when it provides you with page-based virtual memory.
Therefore, to make localization a virtual machine concern, the virtual machine needs a way to intercept either your (read) access to localizable strings, or the graphical rendering of localizable strings. In both cases, it has to return (or render) the localized string instead.

Here we see the need for the first AOP tenet: quantification; see the paper from Filman and Friedman that I mentioned in Some notes on AOP. We must teach the virtual machine where interception is needed; in other words, the localization behaviour of our virtual machine must be quantified over the type system. This can be done (basically) in three ways:

1) Through a direct mapping with a type: this is akin to having a LocalizableString class. The virtual machine would then intercept any attempt to read a LocalizableString object, recover enough context, access a dictionary, and return the localized string (we still share the same problem with "localization context" that we had with OOP; I'll get back to this later). It may seem more difficult to apply this strategy at the rendering level, as there is an unbounded number of classes that will render text to the GUI, through an unbounded set of arbitrarily named methods. As we did in the OOP case, however, we could try to look at a finer granularity: if there is a bounded set of classes/methods in the supporting library where text can be rendered, the virtual machine could intercept those calls instead. Note, however, that at this point the virtual machine would have no way to know whether the text must be localized or not. Trace-based AOP systems may help, but when you have a hard time picturing a viable interception strategy for your virtual machine, it's probably better to look at one of the following alternatives.

2) Through decorations inside the application code. Examples of decorations are .NET attributes, Java annotations, but also the old-fashioned marker interfaces or plain ugly naming standards :-). The virtual machine would then intercept any attempt to read a regular string object, provided it has been tagged with some decoration. Note the subtle difference from the above. You don't need to specify a type, but you do need to somehow tell your virtual machine where interception is needed. You keep your type system "clean" from the localization concern at the class level, but not at the data member level. Again, this strategy is fine for string access, but not so much for rendering.

3) Through explicit interception instructions, outside the application (functional) code. These instructions tell the virtual machine where a change in behaviour is required. They can be encoded in a separate XML file, or written in an AOP language like AspectJ. These instructions, in turn, can use some universal quantification (mostly through wildcards and regular expressions) or explicit naming of nonconforming instances. For instance, we may want to intercept read access to all string properties in all classes derived from a specific form type, plus read access in a finite set of classes, except a few properties we don't want to localize. The more powerful the quantification language you have, the better and more concisely you can define the set of points where interception is needed. Note that once again, intercepting rendering seems easy, but it's nowhere near trivial to understand when localization is needed (more on this below). This may suggest, again without much low-level thinking, that intercepting string access is more viable than intercepting text rendering.
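
To make the quantification in strategy (3) a bit more concrete, here is a minimal sketch of a rule set kept outside the application code and evaluated over "TypeName.PropertyName" strings. Everything in it is hypothetical: the `InterceptionRules` class, the '*' wildcard syntax and the name-based matching (a real system would more likely quantify over the inheritance hierarchy than over type names).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical interception rules: universal quantification via wildcards,
// plus explicit exclusion of nonconforming members.
public sealed class InterceptionRules
{
    private readonly List<Regex> include = new List<Regex>();
    private readonly List<Regex> exclude = new List<Regex>();

    // Patterns use '*' as a wildcard and are matched against "TypeName.PropertyName".
    public InterceptionRules Include(string pattern)
    {
        include.Add(ToRegex(pattern));
        return this;
    }

    public InterceptionRules Exclude(string pattern)
    {
        exclude.Add(ToRegex(pattern));
        return this;
    }

    // True when some include rule matches and no exclude rule does.
    public bool ShouldIntercept(string typeName, string propertyName)
    {
        string target = typeName + "." + propertyName;
        return include.Any(r => r.IsMatch(target)) && !exclude.Any(r => r.IsMatch(target));
    }

    // Escapes the pattern, then turns each escaped '*' into "match anything".
    private static Regex ToRegex(string pattern)
    {
        return new Regex("^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$");
    }
}
```

A rule set such as `new InterceptionRules().Include("*Form.*").Include("ReportHeader.Title").Exclude("*.InternalId")` covers every property of every type whose name ends in Form, adds one explicitly named property, and carves out an exception. The application classes themselves carry no trace of the localization concern.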

Each way has its own merits, and we can now look at them under the second AOP tenet: obliviousness.

Strategy (1), direct mapping to a type, allows for callee obliviousness (the callee being, in this case, the virtual machine). Indeed (still ignoring context), the virtual machine would have no need to know anything about the caller: it will just provide a uniform service every time a LocalizableString is accessed. There won't be, however, caller obliviousness, as the caller will have to use the appropriate type to enable localization.

Strategy (2) is very similar to (1), as far as obliviousness is concerned. The caller has to add decoration (no obliviousness). The callee (again, the virtual machine) will enjoy complete obliviousness: every tagged string must be localized with uniform behaviour.

Strategy (3) leads to caller obliviousness, and as such, is particularly interesting for legacy code (more on this in a future post about AOP, but if you compare the OOP and AOP approaches to localization, this should strike you as the biggest difference!). The callee can still be oblivious. The explicit instructions, however, won't be. They are highly dependent on the caller, and if what you've got is a legacy application, you're bound to have a non-uniform set of interception points. We may decide to go with low-level text rendering interception, trying to get caller, callee, and interception instructions obliviousness. This is not going to work. You'll need trace-based interception to discriminate when translation is needed, and the trace will be highly caller-dependent.

Just a few words on context: if you consider (for instance) the form name as the localization context, every strategy will need to recover the form name as well. I'll leave this as the dreaded exercise for the reader :-).

So here it is: by thinking about how a virtual machine could provide a service, we can quickly evaluate the fitness of AOP as a candidate solution for a problem; you can also reason about different interception flavors (depending on the implementation language, you may have more than one option) and evaluate pros and cons. The same metaphor can be applied in mixed situations (e.g. attributes or annotations + reflection), where we accept a few explicit calls in exchange for largely oblivious caller/callee in a traditional OOP language.
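
The mixed situation mentioned above (attributes plus reflection, with one explicit call) might look like the sketch below in a traditional OOP language. All names are made up for the example: the `Localizable` attribute, the `Localizer.Apply` helper, the `LoginForm` class, and the choice of "TypeName/original text" as the dictionary key, which uses the type name as a crude localization context.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical decoration: marks a string property as subject to localization.
[AttributeUsage(AttributeTargets.Property)]
public sealed class LocalizableAttribute : Attribute { }

public static class Localizer
{
    // The one explicit call we accept: replaces every [Localizable] string
    // property with its translation, when the dictionary has one.
    public static void Apply(object target, IDictionary<string, string> dictionary)
    {
        Type type = target.GetType();
        foreach (PropertyInfo p in type.GetProperties())
        {
            if (p.PropertyType != typeof(string) || !Attribute.IsDefined(p, typeof(LocalizableAttribute)))
                continue;

            string original = (string)p.GetValue(target, null);
            string translated;
            // The type name serves as context, so identical texts can translate differently.
            if (original != null && dictionary.TryGetValue(type.Name + "/" + original, out translated))
                p.SetValue(target, translated, null);
        }
    }
}

// Example caller: decorated (so not oblivious), but free of any localization logic.
public class LoginForm
{
    [Localizable] public string Caption { get; set; }
    public string UserName { get; set; }
}
```

Here the callee stays oblivious (every tagged string gets the same uniform treatment), the caller pays with a decoration, and somebody has to call `Apply` once per object. That is the small explicit price paid for not having a real interception mechanism.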

As we noted, the biggest difference with good ol' OOP lies in strategy (3), where no knowledge is stored at the type level (or data member level). There, we can appreciate a significant contribution of AOP to the evolution of existing systems, that's often forgotten in language design, but which is probably more interesting than factoring out logging and transactions :-). Stay tuned!


