Wednesday, March 14, 2007
Web Services and the unfriendly appdomain
Sending frame-oriented data was the easy part. We just used UDP: light, connectionless, and since we don't mind occasionally losing some data (if we lose one frame, we'll display the next) that was just the best fit.
The palmtop, however, must also send commands to (and receive answers from) the legacy applications. They already have a (legacy) interprocess communication protocol, currently based on named pipes, and we definitely didn't want to integrate the palmtop into that (as we want to scrap that subsystem as soon as possible).
This sounds like a good opportunity for decoupling everything through a web service: the palmtop will talk to the web service (with all the advantages of having the proxy code generated), and the web service will talk to the legacy system[s]. For a number of reasons, we decided to develop the palmtop software in C#, a significant portion of the web service in C# as well, and the glue between the web service and the legacy applications in C++/CLI (which is an ideal candidate for gluing together the managed and unmanaged worlds).
We had a [legacy] DLL exposing a set of functions of the underlying protocol, which was basically a publish/subscribe event model. You have to register a set of callback functions that the legacy DLL will call on specific events (that's where C++/CLI comes in handy: the DLL assumed regular C pointers everywhere).
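To make the callback model concrete, here is a minimal sketch of a C-style publish/subscribe interface of the kind described above. The real DLL's functions are not shown in this post, so every name here (legacy_subscribe, legacy_publish, legacy_clear, CommandLog, on_legacy_event) is hypothetical, and the "DLL" is simulated in the same file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the legacy DLL's C interface: plain function
// pointers plus an opaque user_data pointer, no C++ types in the signatures.
extern "C" {
typedef void (*legacy_callback)(int event_id, const char* payload, void* user_data);
}

struct Subscription {
    legacy_callback fn;
    void* user_data;
};

// Simulated DLL state: subscribers grouped by event id.
static std::map<int, std::vector<Subscription>> g_subscribers;

// Register a callback for one kind of event.
extern "C" void legacy_subscribe(int event_id, legacy_callback fn, void* user_data) {
    g_subscribers[event_id].push_back(Subscription{fn, user_data});
}

// Drop every subscriber of one event (so no dangling user_data is left behind).
extern "C" void legacy_clear(int event_id) {
    g_subscribers.erase(event_id);
}

// Simulates the DLL raising an event; returns how many callbacks were invoked.
extern "C" int legacy_publish(int event_id, const char* payload) {
    auto it = g_subscribers.find(event_id);
    if (it == g_subscribers.end()) return 0;
    int delivered = 0;
    for (const Subscription& s : it->second) {
        s.fn(event_id, payload, s.user_data);
        ++delivered;
    }
    return delivered;
}

// Glue side: the state we want the callback to update.
struct CommandLog {
    std::vector<std::string> received;
};

// The callback handed to the "DLL"; it recovers our object from user_data.
extern "C" void on_legacy_event(int /*event_id*/, const char* payload, void* user_data) {
    static_cast<CommandLog*>(user_data)->received.push_back(payload);
}

// Usage: subscribe, let the "DLL" fire one event, then unsubscribe.
std::size_t demo_subscribe_roundtrip() {
    CommandLog log;
    legacy_subscribe(1, on_legacy_event, &log);
    int delivered = legacy_publish(1, "status:ok");
    assert(delivered == 1);
    legacy_clear(1);
    return log.received.size();
}
```

The point of the sketch is the shape of the interface: the DLL knows nothing about our objects, so everything it hands back arrives through a raw function pointer, on whatever thread (and, in our case, whatever appdomain) the DLL happens to be running in.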
Everything was smooth and fine, except that (as is well known) ASP.NET uses a different appdomain for each hosted web service (or application). That wouldn't be a problem in itself, except that we soon discovered that all the callbacks from the unmanaged world got routed into a separate appdomain, not the one running our web service. As appdomains are isolated from each other, communicating data back and forth between the two is problematic.
This is usually no big deal: when you are in control of appdomain creation, the code that creates the appdomains can store serializable or (as would be our case) MarshalByRefObject references into the different appdomains, through SetData. You can then retrieve the data through (guess what :-) GetData inside each appdomain. In our case, we could simply store a reference to AppDomain #2 (the one executing the web service) into AppDomain #1 (the one where the callback takes place). AppDomain derives from MarshalByRefObject, so this would have been just fine. Inside the callback, we could then have used the DoCallBack method of the AppDomain class to execute some code inside AppDomain #2 (as we needed to use data allocated as part of the web service logic).
The problem is, ASP.NET is the one creating the appdomains, and the .NET framework doesn't provide a simple way to obtain a reference to other appdomains (short of hosting the runtime yourself, or implementing an AppDomainManager, which I considered at some point, but discarded as too ugly).
At that point, we started trying a variety of different approaches, including bypassing the whole appdomain isolation through C++/CLI and __declspec( process ) and all the exotic stuff, just to get a fancy variety of undefined behaviours.
In the end, a good guy working with us suggested that we simply use a different approach:
a) create a new thread as part of the web service initialization, so that it lives in AppDomain #2;
b) have that thread wait on a Win32 kernel object (an auto-reset named event);
c) have the unmanaged callback in AppDomain #1 copy the data into a shared unmanaged buffer and signal the event;
d) that will wake up our thread in AppDomain #2, and since unmanaged memory is, well :-), unmanaged, we will have no isolation problems.
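Steps a) to d) can be sketched in portable C++. This is not our production code: it swaps the Win32 named auto-reset event for a small class built on a mutex and a condition variable, it does not model appdomains at all, and the names (AutoResetEvent, SharedArea, Bridge, on_callback) are made up for illustration. It also adds a second "consumed" event, which is my assumption, so that a one-slot buffer is never overwritten before the worker has read it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Portable stand-in for a Win32 auto-reset event: one set() releases one
// wait(), and the flag resets itself as the waiter wakes up.
class AutoResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;  // auto-reset
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical shared "unmanaged" area: a raw one-slot buffer and a stop flag.
struct SharedArea {
    std::mutex lock;
    char data[256] = {};
    bool stop = false;
};

class Bridge {
public:
    // (a) the worker thread is created during initialization.
    Bridge() : worker_([this] { run(); }) {}
    ~Bridge() { shutdown(); }

    // (c) called from the callback side: copy the payload, signal the event.
    void on_callback(const char* payload) {
        {
            std::lock_guard<std::mutex> guard(area_.lock);
            std::strncpy(area_.data, payload, sizeof(area_.data) - 1);
            area_.data[sizeof(area_.data) - 1] = '\0';
        }
        ready_.set();
        consumed_.wait();  // assumption: block until the slot has been read
    }

    // Stops the worker and returns everything it received, in order.
    std::vector<std::string> shutdown() {
        if (worker_.joinable()) {
            {
                std::lock_guard<std::mutex> guard(area_.lock);
                area_.stop = true;
            }
            ready_.set();
            worker_.join();
        }
        return received_;
    }

private:
    void run() {
        for (;;) {
            ready_.wait();  // (b) sleep until the event is signaled
            std::string message;
            {
                std::lock_guard<std::mutex> guard(area_.lock);
                if (area_.stop) return;
                message = area_.data;  // (d) read the shared buffer
            }
            received_.push_back(message);
            consumed_.set();
        }
    }

    SharedArea area_;
    AutoResetEvent ready_;
    AutoResetEvent consumed_;
    std::vector<std::string> received_;
    std::thread worker_;  // declared last: started only after the members above exist
};
```

On Windows, the two AutoResetEvent objects would correspond to events created with CreateEvent (manual-reset parameter set to FALSE), signaled with SetEvent and awaited with WaitForSingleObject; the named variant is what lets both sides find the same kernel object without sharing any managed reference.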
That worked like a charm.
Now, we still have some issues when the appdomain gets unloaded (IIS can unload/reload your appdomain for a number of poorly specified reasons). When we force a reload through a manual change to web.config, we experience some problems with the IPC libraries, although we have carefully derived a critical class from CriticalFinalizerObject. There is some interplay between deferred finalization, new appdomain creation, unmanaged threads and so on that is misbehaving. We'll solve this too, but it's gonna cost more than expected.
Long story, short conclusion: it's ok for technology to go to some length to protect programmers from themselves. However, it is not ok to introduce a technology that cannot be controlled easily by programmers. It's the usual Visual Basic syndrome, where easy things get easier and hard things get harder, and I definitely don't like that...



