Tuesday, August 16, 2005 

Early results on Wise Pointers performance

I've put together an incomplete implementation of the Wise Pointers concept. Although a few features and optimizations are missing, there is enough substance to do a first performance comparison. See my previous posting, Benchmarking Smart Pointers in C++, for more details on the benchmark.
In their first run, with N = 10,000, Wise Pointers didn't improve much over the best smart pointer around: 44,000 cycles vs. 47,000. Quite unimpressive. With N = 100,000 things got more interesting: 660,000 vs. 1,000,000, much closer to raw pointer performance (450,000).
The problem is that the copy constructor, destructor, and assignment operator for Wise Pointers are too big (around 160 bytes of machine code each), so they don't get inlined. If I force them to be inlined with __forceinline under Visual C++, I get much better performance: 36,000 cycles in the first case (closer to the 26,000 score of raw pointers) and 570,000 cycles in the second case, again very close to raw pointers.
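To make the inlining point concrete, here is a minimal sketch of what forcing the three hot functions inline could look like. It is not the Wise Pointers code: CountedPtr is a hypothetical, plain reference-counted pointer used only as a stand-in, and the WP_FORCE_INLINE macro is my own name for a wrapper that maps to __forceinline under Visual C++ (and to the GCC/Clang always_inline attribute elsewhere, which is an assumption about how one might port it).

```cpp
#include <cassert>
#include <cstddef>

// __forceinline is Visual C++ specific; this hypothetical macro falls back
// to the GCC/Clang attribute, or to plain inline on other compilers.
#if defined(_MSC_VER)
#  define WP_FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define WP_FORCE_INLINE inline __attribute__((always_inline))
#else
#  define WP_FORCE_INLINE inline
#endif

// Hypothetical stand-in: a simple reference-counted pointer,
// NOT the actual Wise Pointers implementation.
template <typename T>
class CountedPtr {
public:
    explicit WP_FORCE_INLINE CountedPtr(T* p = nullptr)
        : ptr_(p), count_(p ? new std::size_t(1) : nullptr) {}

    // Copy constructor: share the object and bump the count.
    WP_FORCE_INLINE CountedPtr(const CountedPtr& other)
        : ptr_(other.ptr_), count_(other.count_) {
        if (count_) ++*count_;
    }

    // Assignment: acquire the new object before releasing the old one,
    // so that self-assignment is safe.
    WP_FORCE_INLINE CountedPtr& operator=(const CountedPtr& other) {
        if (other.count_) ++*other.count_;
        release();
        ptr_ = other.ptr_;
        count_ = other.count_;
        return *this;
    }

    WP_FORCE_INLINE ~CountedPtr() { release(); }

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    std::size_t use_count() const { return count_ ? *count_ : 0; }

private:
    // Drop one reference; destroy the object when the last owner goes away.
    WP_FORCE_INLINE void release() {
        if (count_ && --*count_ == 0) {
            delete ptr_;
            delete count_;
        }
    }

    T* ptr_;
    std::size_t* count_;
};
```

Whether the compiler honors the request, and whether it pays off, still has to be checked by measuring: forcing inlining trades code size for fewer calls, and the numbers above only show it helping in this particular benchmark.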
To be fair, I tried __forceinline on my "regular" smart pointers (using two parallel pointers), and as expected their performance didn't change a bit, because they are already being inlined. I didn't try that on the Boost and Loki implementations (yet).
Bottom line: with __forceinline, the current implementation is interesting, because the overhead of using smart pointers relative to raw pointers has gone down from roughly 80% and 120% (for the best existing implementation at N = 10,000 and N = 100,000, respectively) to about 38% and 27%. Of course, the code gets bigger (hardly a problem these days, but anyway). I'm still unhappy with the idea of using __forceinline, so I'm gonna think about the code a little bit more, as soon as I get some time to spare...
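For anyone who wants to check the percentages, the overhead is just the extra cycles relative to the raw pointer score. A small sketch of the arithmetic, using only the cycle counts quoted in this post (the function name overhead_percent is made up for the example):

```cpp
// Overhead of a smart pointer run relative to the raw pointer run,
// as a percentage: (smart - raw) / raw * 100.
inline double overhead_percent(double smart_cycles, double raw_cycles) {
    return (smart_cycles - raw_cycles) / raw_cycles * 100.0;
}

// Cycle counts quoted in the post:
//   N = 10,000:  raw 26,000; best existing smart pointer 47,000;
//                Wise Pointers with __forceinline 36,000
//   N = 100,000: raw 450,000; best existing smart pointer 1,000,000;
//                Wise Pointers with __forceinline 570,000
//
// overhead_percent(47000, 26000)     is about 80.8
// overhead_percent(1000000, 450000)  is about 122.2
// overhead_percent(36000, 26000)     is about 38.5
// overhead_percent(570000, 450000)   is about 26.7
```

So the rounded figures in the text are slightly on the generous side for the old implementations, but the picture is the same.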