Carlo Pescio - Template Metaprogramming

Template Metaprogramming: Make parameterized integers portable with this novel technique

Published in C++ Report, Vol. 9 No. 7, July-August 1997.

Writing portable C++ code is not easy; writing portable, efficient C++ code may turn into a daunting task. Although C and C++ are among the most widely ported languages, there are several gray areas in both. Like the two flips of a coin, these gray areas allow compilers to generate more efficient, tailored code on different architectures, but make writing portable code harder.

Consider the apparently trivial task of implementing a hash function, which takes three 8-bit quantities and converts them to a single, 24-bit index. On a 32-bit architecture, you may write:

int hash( int x, int y, int z )
{
return( (x&255) + 
((y&255)<<8) +
((z&255)<<16) ) ;
}

However, this code wouldn't work on a 16-bit architecture. The most common (but potentially inefficient) solution is to use the smallest integral type with a size which is guaranteed to exceed the desired range. In the example above, you might use long instead of int, because the standard requires a long to be at least 32 bits wide. This solution is potentially inefficient (in terms of space and speed), because in some architectures the int type might be exactly 24-bits wide, so using a long would be slower, and a large hash table would waste a considerable amount of memory.

True portability of integral types can be achieved in two ways: by fixing the range of types in the standard (à la Java) or by extending the language so that range and size of the types can be defined by the user (the Ada way). Java is potentially inefficient, for the same reasons as above, while Ada requires a more sophisticated compiler; C and C++ allow a simpler and efficient compiler, but leave the burden of portability to the programmer.

In the case above, the careful programmer would probably use a preprocessor-based solution:

#include <limits.h>
#if INT_MAX < 8388607L
typedef long int24;
#else
typedef int int24;
#endif

And so on for each size used in the program. Obviously, that makes writing the code somewhat more tedious and error-prone than it ought to be.

Enter Template Metaprogramming

Template Metaprogramming¹ is a novel technique that can be used to perform some calculations at compile time, instead of waiting until runtime. Metaprogramming has been used to speed up algorithms, increase the efficiency of linear algebra², and to calculate constant values at compile time³. Combined with some creativity, metaprogramming opens new, interesting opportunities for taking on old challenges armed with a new tool.

In this case, a metaprogram may calculate, at compile-time, the size of signed char, short, int, and long, and decide which type to use for the desired resolution. That's something new for metaprogramming, which so far has been used to yield a value or a function, not a type.

Listing 1 shows a first attempt: I've intentionally neglected some details (like error handling) which will be considered later, after the basic principle has been illustrated. Beware: as it is presented, Listing 1 will probably not compile.

The overall solution is simple: the primary definition of the template metaprogram looks for the smallest integral with enough bits; explicit specializations for each integral type provide the termination anchors for the recursive lookup.

More exactly, for each integral signed type there is an explicit specialization of the template struct ParamInt, which defines int_type as the corresponding integral type via typedef. The specialization is based on the number of bits in the type, calculated as CHAR_BIT * sizeof( type ). Therefore, on a 32-bits architecture, ParamInt< 32 > :: int_type would be an int, while on a 16-bits architecture it would be a long. The macro INT( N ) provides a little syntactic sugar, so you can write INT( 16 ) instead of ParamInt< 16 > :: int_type; of course, the use of the macro is entirely optional.

What happens if you require a size which does not correspond to any of the integral types? Since no explicit specialization matches the request, the primary template definition is used. Note the unusual, open form of the primary definition: where the most common loops in a metaprogram count down, this one counts up. If you require a 30-bits type on a 32-bits architecture, then the primary definition must instantiate a 31-bits type, which in turn requires a 32-bits type, which is given by explicit specialization.

In general, either the desired size is provided by explicit specialization, or the primary definition will count up until an explicit specialization is reached. If the required size is bigger than the size of the biggest integral type, the behaviour of Listing 1 is implementation-defined; I'll deal with that shortly after.

There is also another, more subtle problem in Listing 1: there is no guarantee that the sizes of char, short, int and long will be all different. Most likely, int will have the same size as another integral type, and in several architectures short, int and long are all of the same size. In these cases Listing 1 will not compile, as two or more explicit specializations will collide.

Clearly, to be truly functioning and portable the naïve code in Listing 1 needs some improvements.

Size Collision, Error Handling, and Compilers Woes

Handling errors in a metaprogram may require a creative approach, as there are few error-reporting facilities available at compile-time, when the metaprogram is run. Moreover, when you start to play with metaprogramming, you soon realize that your code had better be extremely simple, or some compilers will refuse to handle it.

A first example is already present in Listing 1: if you remove the constants PI_SHORT, PI_CHAR, PI_LONG, and try to use the corresponding expression directly as the argument of the template specialization, Borland C++ 4.02 will not compile the code. First lesson: avoid complex expression as template arguments.

Consider now the implementation-defined behaviour when the required size is too big. In all the compilers I've access to, you actually get an error message (as wanted) because there is an internal limit to the recursion depth allowed during template instantiation. The limit is well below the maximum integer, so as the counter grows, it does eventually reach the limit. However, we cannot rely on the compiler having a small internal limit: as templates gets more widely used, compilers will improve their support for templates as well.

In this case, however, the solution is quite easy: provide another explicit specialization for ParamInt< INT_MAX > to act as a guard:

struct ParamInt< INT_MAX >
{
} ;

That way if you require a type bigger than the biggest available, and your compiler does not set a limit on the recursion depth of template instantiation, INT_MAX will be finally reached; since int_type is not defined in the corresponding template specialization, you'll get an error message, as needed.

As a matter of facts, my first version of the specialization above was:

struct ParamInt< INT_MAX >
{
typedef void int_type ;
} ;

which gave slightly more readable error messages on most compilers; still, GNU GCC 2.6.0 couldn't compile that code at all, so I backed up to the solution above.

Resolving collisions between different types was a bit tricky as well. Ideally, if the size of (e.g.) short is the same as the size (e.g.) int, I shouldn't provide a specialization for short; unfortunately, this kind of conditional clause cannot be implemented as a metaprogram.

At last, I decided to map colliding types to negative index, as in the example below:

const int PI_CHAR = 
sizeof( signed char ) != sizeof( short ) ? 
sizeof( signed char ) * CHAR_BIT :
INT_MIN ;
struct ParamInt< PI_CHAR >
{
typedef signed char int_type ;
} ;

If the size of the signed char is equal to the size of short, the template is specialized for INT_MIN, not for the number of bits in a signed char. In the same way, if short is of the same size as int, the template is specialized for INT_MIN + 1, and so on. Note that since the standard guarantees that signed char, short, int, and long have non-decreasing size, each template specialization needs only to check for size collision with the immediately following type, not with the others.

The only drawback of the technique is that in case of collision, code like:

INT( INT_MIN ) i ;

becomes valid. However, I don't expect that to be a problem in any real program - requiring a large negative number of digits is not a common mistake.

The complete implementation is given in Listing 2; the code has been compiled and used successfully on different compilers (Borland 4.02, Microsoft 4.0, GNU GCC 2.6.0) on various architectures (Intel x86, Sparc, Alpha). The Borland compiler was the only one with severe limitations on the recursion depth, so it cannot handle (e.g.) INT( 17 ) because it requires 15 recursive template instantions.

Custom Extensions

Your compiler may support extended types: for instance, Visual C++ supports a 64-bits int (__int64), and with the likely inclusion of long long in C9X, most C++ compilers will probably add some support for it.

The technique presented here can be extended to handle new types without problems; to add support for 64-bits integers in Visual C++, you only add a few lines of code:

struct ParamInt< 64 >
{
typedef __int64 int_type ;
} ;

The same applies to long long, or to your own 128 or 256-bits integer-like class. Note that in Listing 2, the specialization for long does not need collision handling, but you may have to modify that part if you add a type with a size which may collide with long.

Conclusions

Once again, templates prove to be one of the most powerful features of C++.

A novel use of template metaprogramming, which yields a type as the result of some compile-time calculations, provides a simple and effective solution to a long-standing, annoying portability/efficiency problem.

Metaprogramming needs creative approaches to error handling, but can tackle difficult problems into entirely new ways. With the right tool, difficult problems aren't so difficult after all.

Bibliography

1. Todd Veldhuizen, Using C++ Template Metaprograms, C++ Report Vol 7 No 4.

2. Todd Veldhuizen, Kumaraswamy Ponnambalam, Linear Algebra with C++ Template Metaprograms, Dr. Dobb's Journal No 250, August 1996.

3. Carlo Pescio, Binary Constants using Template Metaprogramming, C/C++ Users Journal, February 1997.

Biography

Carlo Pescio holds a doctoral degree in Computer Science and is a consultant and mentor for various European companies and corporations, including the Directorate of the European Commission. He specializes in object oriented technologies and is a member of IEEE Computer Society, the ACM, and the New York Academy of Sciences. He lives in Savona, Italy and can be contacted at pescio@eptacom.net.

listing 1.

#include <iostream.h>
#include <limits.h>

template< int N > struct ParamInt
{
typedef ParamInt< N + 1 > :: int_type int_type ;
} ;

struct ParamInt< CHAR_BIT >
{
typedef signed char int_type ;
} ;

const int PI_SHORT = sizeof( short ) * CHAR_BIT ;
struct ParamInt< PI_SHORT >
{
typedef short int_type ;
} ;

const int PI_INT = sizeof( int ) * CHAR_BIT ;
struct ParamInt< PI_INT >
{
typedef int int_type ;
} ;

const int PI_LONG = sizeof( long ) * CHAR_BIT ;
struct ParamInt< PI_LONG >
{
typedef long int_type ;
} ;


// A bit of syntactic sugar
#define INT( N ) ParamInt< N > :: int_type 


int main()
{
INT( 8 ) i ;
INT( 16 ) j ;
INT( 29 ) k ;
INT( 32 ) l ;
cout << sizeof( i ) << " " << 
sizeof( j ) << " " << 
sizeof( k ) << " " << 
sizeof( l ) ;
return( 0 ) ;
}

Listing 2.

#include <limits.h>

template< int N > struct ParamInt
{
typedef ParamInt< N + 1 > :: int_type int_type ;
} ;

// Specialize template for signed char, unless it has the
// same size of int, in which case, specialize for a 
// negative integer to avoid collisions
const int PI_CHAR = 
sizeof( signed char ) != sizeof( short ) ? 
sizeof( signed char ) * CHAR_BIT :
INT_MIN ;
struct ParamInt< PI_CHAR >
{
typedef signed char int_type ;
} ;

// As above, but avoid short/int collision
const int PI_SHORT = 
sizeof( short ) != sizeof( int ) ? 
sizeof( short ) * CHAR_BIT :
INT_MIN + 1 ;
struct ParamInt< PI_SHORT > 
{
typedef short int_type ;
} ;

// As above, but avoid int/long collision
const int PI_INT = 
sizeof( int ) != sizeof( long ) ? 
sizeof( int ) * CHAR_BIT :
INT_MIN + 2 ;
struct ParamInt< PI_INT > 
{
typedef int int_type ;
} ;

// No need for extra check here. Would be necessary 
// if e.g. a long long type was added.
const int PI_LONG = 
sizeof( long ) * CHAR_BIT ;
struct ParamInt< PI_LONG >
{
typedef long int_type ;
} ;

// If the required size is not supported, the metaprogram
// will increase the counter until an internal error is
// signaled, or INT_MAX is reached. The INT_MAX 
// specialization does not define a int_type, so a 
// compiling error is always generated
struct ParamInt< INT_MAX >
{
} ;

// A bit of syntactic sugar
#define INT( N ) ParamInt< N > :: int_type