Monday, January 19, 2009 

Visual C++ 2008 bug (C++/CLI)

I wrote some faulty code recently. Actually, the code was right, but the compiler didn't like it anyway. It took a while to discover it was a (known) compiler bug, so I hope I can save someone else's time by providing a link to the relevant Microsoft page: Static variable in native method causes exception c0020001 during process exit.

Briefly stated: if you use a local static variable (inside a function/method), and the type of that variable has a native/unmanaged destructor, the compiler gets confused and emits a managed destructor instead. Of course, that only happens if your compilation unit is compiled in mixed mode (native + managed).

Unfortunately, in my case the local static variable was inside a template, shared between native and mixed compilation units. In the end, we didn't adopt Microsoft's suggestion of moving the code to a separate compilation unit (which kinda makes all the C++/CLI magic disappear); instead, I just changed the type of the static to remove the destructor. That requires some tweaking, and I still hope we'll get a fix from Microsoft, although the page above reports a discouraging "Won't fix" decision...
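To make the shape of the fix concrete, here is a minimal sketch in plain C++ (the function name and the "demo" string are mine, not from the real code): instead of a static object with a destructor, keep a static pointer to a heap object and never delete it, so the compiler has nothing to register at process exit.

```cpp
#include <string>

// Sketch of the workaround: the static local below is a plain pointer,
// which has no destructor, so nothing gets registered for destruction
// at exit and the mixed-mode bug is sidestepped. The object is leaked
// on purpose; the OS reclaims the memory at process exit.
const std::string& appName() {
    // 'static std::string name("demo");' is the shape that triggered
    // the bug in a mixed-mode unit; a pointer type avoids it.
    static const std::string* name = new std::string("demo");
    return *name;
}
```

The usual caveat applies: this trades a clean shutdown for a deliberate leak, which is fine for process-lifetime singletons but needs a second thought if the destructor did real work (flushing files, releasing external resources).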


Sunday, December 23, 2007 

Down to Earth

My recent posts have been leaning on the conceptual side, and I don't want to be mistaken for an Architecture Astronaut, so here is some down-to-earth experience from yesterday.

I've developed a few applications for mobile phones over the years, mostly as J2ME (Java 2 Micro Edition) midlets, or targeting Windows Mobile phones.
I never ventured into developing a Symbian application in C++. However, for several months I've entertained some ideas that require going down to the metal on a Nokia phone. Nokia provides only a C++ toolkit to access the data I need, so it was time for me to get back to good old C++, which I haven't been using much lately.
I just needed some spare time to install all the necessary tools, which I somehow expected to be an unpleasant experience. It got worse :-).

It all started rather nicely. Nokia offers a Visual Studio 2005 plug-in (Carbide.vs 3.0), and they say it works on Windows Vista. Cool, as I had installed Vista on my new Core 2 Quad computer (ok, that might not have been a smart move :-).
So I download the thing, and I soon discover I have to download a bunch of other stuff as well (ActivePerl, a separate SDK, and whatever else). Well, at least I don't get any broken link during download, which is something :-).

I go ahead and install everything; well, I try to. Nokia seems to have tested Carbide.vs 3.0 on Vista, but not the other stuff. The installers fail, some with a nice error message (telling me to change the PATH manually), some silently, some (like the optional Microsoft PowerToys) with a cryptic error message. I discard the PowerToys, fix the PATH manually, and start up Visual Studio.

I get a dialog box requiring registration, fight a little with the slowest Nokia server on the planet, and I'm game. Thanks to the installed wizards, I get a barebone Symbian application in a few clicks. To my surprise :-), it builds and works fine in the emulator. So far, so good. I just need to build a release version and upload it into the phone.

Ok, there is no way to set a different target, just the emulator. Exploring the file system, I realise that the ARM toolchain has not been properly installed by the Symbian SDK installer. That is, the files have been created, but setup has not been started. I do it manually. I get a few problems with PATH again, and fix that manually too. Lucky for me, I kept all the default installation paths, although that means I've got files everywhere on the hard drive. I suspected they never tried to install anywhere else.

Well, now I can choose a release target in Visual Studio. When you try to build, however, you get a sequence of interesting errors. Fixing them was like entering hell's gates :-).

To make a long story very short, here are a few relevant links:

It's kinda fun to read that yeah, well, Carbide.vs 3.0 was tested on Vista, but the underlying SDK was not, and it does not work :-).

As usual, fixes you find in forums don't really apply verbatim, but in the end, after fixing paths, perl scripts, makefiles, and stuff like that, you can make it compile and even link. Like others, I had to copy as.exe around to make it link. I was naive enough to believe it was over.

Trying to upload the software to the phone, I got a certificate error. The phone was set up to accept unsigned applications (worked fine with unsigned midlets), but it refused the Symbian app anyway. Ok, time to sign the application. There is a time-limited developer certificate in the SDK, so I use that and get a signed app. Uploading fails again, with a "corrupted file" message from the phone.

Reading docs and dozens of internet posts did not help. The most frequent suggestion was to reset your phone, which I refused to do. Well, at least I realized there is an absurd signing policy (should I say police?) in Symbian, which requires developers to shell out some serious money if they intend to develop powerful apps. Quite dumb, but I won't talk about that, except to say it once again: DUMB :-).

In the end I tried to generate a brand new certificate, instead of using the one from the SDK. Guess what, it worked! (pure serendipity). Sure, I get all kinds of warnings when installing on the phone (because it's a self-generated certificate), but the application is working. Now it's time to try out my little idea. Well, as soon as I get some time to waste, that is :-).

Bottom line: we can blame it on Vista; after all, even Microsoft PowerToys had problems. But if you look at the structure of Carbide.vs (and I had to :-), you'll see it's just a bunch of perl scripts creating makefiles on the fly, invoking gnu tools with crossed fingers, in the hope that somehow the whole thing works. Gee. We had better development environments in the 80s :-).
I don't know if Carbide.c++ is any better than Carbide.vs in terms of structure. Of course, since the SDK is malfunctioning on Vista, Carbide.c++ will share the same problems as Carbide.vs. Maybe it's better structured. I honestly hope so :-), because as far as Carbide.vs goes, there is only one word to define it: unprofessional.


Tuesday, June 26, 2007 

Got Multicore? Think Asymmetric!

Multicore CPUs are now widely available, yet many applications are not tapping into their true potential. Sure, web applications, and more generally container-based applications, have an inherent degree of coarse parallelism (basically at the request level), and they will scale fairly well on new CPUs. However, most client-side applications don't fall in the same pattern. Also, some server-side applications (like batch processing) aren't intrinsically parallel either. Or maybe they are?

A few months ago, I was consulting on the design of the next generation of a (server-side) banking application. One of the modules was a batch processor, basically importing huge files into a database. For several reasons (file format, business policies), the file had to be read sequentially, processed sequentially, and imported into the database. The processing time was usually dominated by a single huge file, so the obvious technique to exploit a multicore (use several instances to import different files in parallel) would not have been effective.
Note that when we think of parallelism in this way, we're looking for symmetric parallelism, where each thread performs basically the same job (process a request, or import a file, or whatever). There is only so much you can do with symmetrical parallelism, especially on a client (more on this later). Sometimes (of course, not all the time), it's better to think asymmetrically, that is, model the processing as a pipeline.

Even for the batch application, we can see at least three stages in the pipeline:
- reading from the file
- doing any relevant processing
- storing into the database
You can have up to three different threads performing these tasks in parallel: while thread 1 is reading record 3, thread 2 will process record 2, and thread 3 will store [the processed] record 1. Of course, you need some buffering in between (more on this in a short while).
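The three stages above can be sketched in code. The real system was written in C#; this is my own minimal rendering in modern C++, where the record strings, the names, and the empty-string end-of-stream marker are all illustrative assumptions:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// A tiny monitor-protected queue (the C# version used a Monitor; this
// is the C++ equivalent with mutex + condition variable). An empty
// string doubles as the end-of-stream marker, just to keep it short.
class Channel {
    std::queue<std::string> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void put(std::string s) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(s)); }
        cv_.notify_one();
    }
    std::string take() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        std::string s = std::move(q_.front());
        q_.pop();
        return s;
    }
};

// Three asymmetric stages: read -> process -> store, one thread each.
std::vector<std::string> runPipeline(const std::vector<std::string>& records) {
    Channel readToProcess, processToStore;
    std::vector<std::string> stored;

    std::thread reader([&] {               // stage 1: "read" records
        for (const auto& r : records) readToProcess.put(r);
        readToProcess.put("");             // end of stream
    });
    std::thread processor([&] {            // stage 2: process
        for (;;) {
            std::string r = readToProcess.take();
            if (r.empty()) { processToStore.put(""); break; }
            processToStore.put(r + "-processed");
        }
    });
    std::thread storer([&] {               // stage 3: "store"
        for (;;) {
            std::string r = processToStore.take();
            if (r.empty()) break;
            stored.push_back(r);
        }
    });
    reader.join(); processor.join(); storer.join();
    return stored;
}
```

Note that any stage can also be parallelized internally (as we eventually did with the database stores), by having several worker threads drain the same channel.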
Actually, in our case, it was pretty obvious that the processing wasn't taking enough CPU to justify a separate thread: it could be merged with the read file operation. What was actually funny (almost exhilarating :-) was to discover that despite the immensely powerful database server, storing into the database was much slower than reading from the file (truth be told, the file was stored on an immensely powerful file server as well). A smart guy in the bank quickly realized that it was our fault: we could have issued several parallel store operations, basically turning the store stage of the pipeline into a symmetrical parallel engine. That worked like a charm, and the total time dropped by a factor of about 6 (more than I expected: we were also using the multi-processor, multi-core DB server better, not just the batch server multicore CPU).

Just a few weeks later (meaningful coincidence?), I stumbled across a nice paper: Understand packet-processing performance when employing multicore processors by Edwin Verplanke (Embedded Systems Design Europe, April 2007). Guess what, their design is quite similar to ours, an asymmetric pipeline with a symmetric stage.

Indeed, the pipeline model is extremely useful also when dealing with legacy code which was never designed to be thread-safe. I know that many projects aimed at squeezing some degree of parallelism out of that kind of code fail, because the programmers quickly find themselves adding locks and semaphores everywhere, thus slowing down the beast so much that there is either no gain or even a loss.
This is often due to an attempt to exploit symmetrical parallelism, which on legacy, client-side code is a recipe for resource contention. Instead, thinking of pipelined, asymmetrical parallelism often brings some good results.
For instance, I've recently overheard a discussion on how to make a graphical application faster on multicore. One of the guys contended that since the rendering stage is not thread-safe, there is basically nothing they can do (except doing some irrelevant background stuff just to keep a core busy). Of course, that's because he was thinking of symmetrical parallelism. There are actually several logical stages in the pipeline before rendering takes place: we "just" have to model the pipeline explicitly, and allocate stages to different threads.

As I've anticipated, pipelines need some kind of buffering between stages. Those buffers must be thread safe. The banking code was written in C#, and so we simply used a monitor-protected queue, and that was it. However, in high-performance C/C++ applications we may want to go a step further, and look into lock-free data structures.
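To give a taste of what "lock-free" means here, this is a minimal single-producer / single-consumer ring buffer, sketched under the usual caveats: one slot is deliberately wasted to tell full from empty, it only works with exactly one producer and one consumer thread, and a serious implementation would also pad head and tail onto separate cache lines.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Minimal SPSC lock-free ring buffer (illustrative sketch).
template <typename T, std::size_t N>
class SpscQueue {
    std::array<T, N> buf_;
    std::atomic<std::size_t> head_{0};   // next slot the consumer reads
    std::atomic<std::size_t> tail_{0};   // next slot the producer writes
public:
    bool push(const T& v) {              // producer thread only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;                // full (one slot kept free)
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                   // consumer thread only
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;                // empty
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```

The acquire/release pairing is what replaces the monitor: the producer's release store on tail_ publishes the written element to the consumer's acquire load, and symmetrically for head_.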

A nice example comes from Bjarne Stroustrup himself: Lock-free Dynamically Resizable Arrays. The paper also has a great bibliography, and I must say that the concept of descriptor (by Harris) is so simple and effective that I would call it a stroke of genius. I just wish a better name than "descriptor" had been adopted :-).

For more predictable environments, like packet processing above, we should also keep in mind a simple, interesting pattern that I always teach in my "design patterns" course (actually in a version tailored for embedded / real-time programming, which does not [yet] appear on my website [enquiries welcome :-)]). You can find it in Pattern Languages of Program Design Vol. 2, under the name Resource Exchanger, and it can be easily made lock-free. I don't know of an online version of that paper, but there is a reference in the online Pattern Almanac.
If you plan to adopt the Resource Exchanger, make sure to properly tweak the published design to suit your needs (most often, you can scale it down quite a bit). Indeed, over the years I've seen quite a few hard-core C programmers slowing themselves down in endless memcpy calls where a resource exchanger would have done the job oh so nicely.
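As a scaled-down illustration of the exchange idea (my own sketch, not the published design; the class and method names are invented): the consumer trades an empty buffer for the producer's full one, so ownership moves but the payload is never copied.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Buffer = std::vector<char>;

// The producer fills a buffer; the consumer hands over a spare (empty)
// buffer and walks away with the full one. std::swap just exchanges
// the vectors' internal pointers: constant time, no memcpy of payload.
class PacketSource {
    Buffer current_;                     // buffer being filled
public:
    void fill(const char* data, std::size_t n) {
        current_.assign(data, data + n);
    }
    // The exchange: caller's empty buffer in, full buffer out.
    void exchange(Buffer& spare) {
        std::swap(current_, spare);
        current_.clear();                // ready for the next fill
    }
};
```

Compare this with the memcpy habit: the cost per hand-off is two pointer swaps instead of a copy proportional to the packet size.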

A final note: I want to highlight the fact that symmetric parallelism can still be quite effective in many cases, including some kinds of batch processing or client-side applications. For instance, back in Pentium II times, I implemented a parallel sort algorithm for a multiprocessor (not multicore) machine. Of course, there were significant challenges, as the threads had to work on the same data structure, without locks, and (that was kinda hard) without having one processor invalidate the cache lines of the other (which happens quite naturally in discrete multiprocessing if you do nothing about it). The algorithm was then retrofitted into an existing application. So, yes, of course it's often possible to go symmetrical; we just have to know when to use what, at which cost :-).
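Just to fix the idea (this is a toy sketch, not the original algorithm): two threads sort disjoint halves of the same vector, then the main thread merges. Disjoint ranges keep the threads lock-free and, except near the midpoint, they mostly touch different cache lines as well.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Toy symmetric parallel sort: two workers on disjoint halves of the
// same vector, then an in-place merge. No locks are needed because the
// ranges never overlap.
void parallelSort(std::vector<int>& v) {
    auto mid = v.begin() + v.size() / 2;
    std::thread left ([&] { std::sort(v.begin(), mid); });
    std::thread right([&] { std::sort(mid, v.end()); });
    left.join();
    right.join();
    std::inplace_merge(v.begin(), mid, v.end());
}
```

The final merge is the sequential bottleneck, which is exactly the kind of cost/benefit trade-off to weigh before going symmetrical.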
