Binary Constants using Template Metaprogramming

Dr. Carlo Pescio
Binary Constants using Template Metaprogramming

Published in C/C++ Users Journal, February 1997.

Introduction

Many developers using high-level languages for application development never have to deal with explicit binary numbers. On the other hand, programmers working on operating systems, embedded software, or device drivers must routinely deal with individual bits and binary constants. It is thus surprising that languages aimed at low-level programming, such as C and C++, do not allow the programmer to use binary constants. They have support for decimal, octal, and hexadecimal constants, but for some reasons radix two has been left out. There are numerous alternatives commonly used by programmers, but none of them is really satisfactory.

The easiest solution is rather obvious — convert the binary constant to radix ten or sixteen. After all, why do you have this feature on your hand-held calculator if not to use it? But this approach is very tiresome when you have to maintain or understand the code. All the data sheets document the meaning of each and every bit, and you have to develop the ability to think in binary. Unfortunately, such conversions are also a common source of bugs.

Wiser programmers use more verbose expressions like 32 + 8 + 2 instead of 42, to render in a more readable way which bits are touched. Elegance seekers delay the evaluation to run-time, using the ANSI function strtol, which lets you specify a radix for the conversion, as in strtol("101010", 0, 2). Wrap this in a small macro and you have very readable code. Unfortunately, the run-time evaluation of constants is an unsustainable luxury in low-level programming. Finally, you may use a custom preprocessor to convert some literals, such as 0b101010, into decimal numbers. But extra-linguistic solutions are rarely well accepted because of their restricted portability and negative effects on debugging.

Recently, while adding just another comment into the source code of a device driver, I realized that I could use a template metaprogram to convert literals from one radix to another at compile-time. This technique combines the advantages of compile-time constants with a more natural syntax to express numbers which are binary by nature.

Template Metaprograms

A template metaprogram [1] is a C++ program that is interpreted at compile-time. Metaprogramming is a very powerful technique. It can be used to generate more efficient code [2] , or to perform at compile-time some evaluation that would normally have been deferred until run-time. In case you have never seen a template metaprogram, Listing 1 shows a simple one that calculates the Fibonacci's numbers. It is compared with a recursive version in ML [3], a functional language based on pattern matching.

The idea behind the metaprogram is easy to grasp. When you request the instantiation of, say, Fib<4>, the compiler has to first instantiate Fib<3> and Fib<2>. This in turn requires the instantiation of Fib<1>, which is given (together with Fib<2>) by explicit specialization. It is crucial to understand the importance of the explicit specialization, which plays the role of the base of recursion, so that the instantiation process can terminate. Also, note the use of an enumerated type to keep the result of calculations. This is necessary because most compilers do not yet support the definition of constants inside the class declaration. The result of the calculation is accessible as Fib<4>::value.

The similarity between the template metaprogram and the recursive definition in ML is not accidental. All the first-order functions having rank in Nⁿ and image in N can be expressed as template metaprograms. A conditional is easily implemented using the operator ?:, a loop by recursion as in the example above, and case selection by using multiple explicit specializations.

So template instantiation requires the equivalent of a pattern-matching based interpreter to be written as part of the compiler, and this without even considering partial specialization. No wonder that compiler writers seem to have problems getting the template implementation right. On the positive side, that means you can move a whole lot of calculations (including binary-to-decimal conversion) from run-time to compile-time, without any performance penalty. Using any decent compiler, expressions such as Fib<4>::value result in exactly the same code as if manifest constants were used.

Binary Constants

Once you understand template metaprograms, binary-to-decimal conversion is almost trivial. Listing 2 shows a very simple metaprogram that does the job. Again, note the close resemblance with a recursive definition in ML. The metaprogram takes a decimal number such as 11 (eleven) but assumes that it represents a radix-two number (three). It does a conversion to decimal using a straightforward modulo/division technique. The syntax is still a bit odd (Bin<101010>::value), but syntax matters are easy to fix with a gentle touch of macro preprocessing. Still, this version of the code is too naive to be used in real programs. In fact, there are two serious flaws in Listing 2:

There is no validity checking on digits. You can pass an arbitrary number like 1234 and pretend it is a binary number.
Compilers constrain the length of literals, so large (e.g., 30-bit) binary numbers cannot be expressed directly, but all you get is an obscure error message.

In both cases we are left with a somewhat new problem. The metaprogram must check the validity of the parameter and generate an error message at compile-time, preferably one that is human readable, if the parameter is invalid. Unfortunately, there aren't many error-reporting mechanisms available at compile-time, so we have to resort to a more creative approach.

Error Checking

In Listing 2, digits are extracted using the modulo operator. The challenge is now to write a template metaprogram that returns the digit itself if a binary digit is passed, but generates a compile-time error for any other digit. A possible solution is given in Listing 3. Digits 0 and 1 are treated as special cases, using explicit specialization, while the general case calls a dummy function. Function calls are not constant expressions, so a compile-time error will be generated if the general template is instantiated. The message depends on the compiler, but usually it should refer to the name of the dummy function, or to the line where that function is called. Using a meaningful name (like NotABinaryDigit) for the dummy function should be enough to make the error easy to understand.

It is important to note that the static semantics of a template definition are checked only when the template itself is instantiated, not when it is defined. Therefore, the general template will not generate compile-time errors if never instantiated — if only binary digits are used.

Dealing with the maximum length of decimal numbers is harder. On most platforms, the largest integer is 32 bits wide. Since we use decimal numbers to represent binary numbers, the largest binary number we can have on those platforms is limited to ten digits, as log(2³²) = 9.6 and 10⁹ has ten digits. Unfortunately, ten is not a very attractive number when dealing with bits. Given this constraint on the number of bits, I've preferred to further constrain the metaprogram to eight bits, and use macros to define larger numbers. If your target architecture is capable of 64 bits integers, you may want to relax the constraint to allow 16-bit words.

Limiting the maximum number is easy once you know how to generate a compile-time error in a metaprogram. Listing 4 shows how I implemented the range checking, as well as the final version of the Bin template, now renamed BinByte since only bytes are allowed. Note that since digits are checked to be 0 or 1, we can simply verify that the number is less than the maximum eight-bits binary number.

Finally, I've defined two macros to allow for more readable code. BIN_BYTE(101010) is far less cluttering than BinByte<101010>::value. BIN_WORD_LE is used to define 16-bit words by passing two eight-bits binaries in little-endian order. You can easily add and customize macros to fit your taste.

The final version fixes the two flaws of the initial, simpler solution, while preserving the same flexibility and run-time efficiency. Listing 5 shows an example of C++ source and the corresponding assembly-language code generated by two compilers (Borland on Intel platform, GNU on Sparc). As you can see, there is no difference between using BIN_BYTE(101010) and using 42. In fact, you can use BIN_BYTE wherever a compile-time constant expression is required, for instance when you specify the size of an array.

A word of caution: the code has been compiled and run successfully on different compilers (Borland 4.02, Microsoft 4.0, GNU GCC 2.6.0) on various architectures (Intel x86, Sparc, Alpha). However, although Borland 4.02 had no problems with template metaprograms in general, Borland 5.0 seems to have trouble even with the simplest ones. Also, if you want to experiment with metaprogramming, be aware that some compilers (as Borland 4.02) constrain the size of an enum to the size of int, while the draft C++ Standard [4] requires that the smallest integral types that fits should be used. This is not a problem for binary constants, since we have restricted the range to the byte, but may be a source of headache if you try to implement, for instance, the factorial function or the Ackermann's function as a metaprogram. For them, you'd probably need a range larger than the one 16-bit integers provide.

An Open Problem

There is still a hidden problem in the code I've presented. If your parameter to BIN_BYTE starts with a zero, like 010 or 012, you may have a compile-time error. Or worse, you may have a number that does not look like binary but is accepted. The compiler interprets any number starting with zero and using only octal digits as a radix-eight number, so 010 equals eight (and you get a compile-time error) and 012 equals ten (and is accepted).

It is important to understand that if you never use non-binary digits, the worse thing that can happen is to have a number which appears correct but is refused, since it starts with zero. If we exclude 00 and 01 (which happen to generate the correct results), no octal number using only 0 and 1 is equal to a decimal number using only 0 and 1, so the compiler will signal a "non binary digit error."

Unfortunately there is still an hole for numbers like 012, which aren't detected as wrong. The only reasonable solution I've found is to add a 1 at the left of the number in the BIN_BYTE macro, and to remove that digit in the metaprogram (which is then bound to be used through the macro). This is not completely satisfactory, as it assumes that the macro parameter starts with a digit and not, for instance, with a parenthesis. If any reader has a better idea on how to remove this problem, I'd be glad to hear about it.

Conclusion

A few years ago, templates were used mostly (if not only) to implement generic classes such as containers. Today, a better understanding of their real flexibility has led to a flourishing of template-based techniques, like metaprogramming, which will have a concrete impact on everyday programming. Metaprogramming can help solve old problems in new, creative ways. The newest ANSI/ISO extensions (partial specialization, member function templates, default template parameters) will further extend the expressive power of metaprograms. Developers working on performance-critical code may then find metaprogramming a very valuable tool to reconcile efficiency and readability. o

Bibliography

[1] Todd Veldhuizen, "Using C++ Template Metaprograms," C++ Report, Vol. 7, No. 4.

[2] Todd Veldhuizen, Kumaraswamy Ponnambalam, "Linear Algebra with C++ Template Metaprograms," Dr. Dobb's Journal No. 250, August 1996.

[3] Robert Harper, Robin Milner, Mads Tofte, "The Definition of Standard ML," Computer Science Department, University of Edinburgh, Report ECS-LFCS-89-81, May 1989.

[4] Committees X3J16/WG21, "Working Paper for Draft Proposed Standard — Programming Language C++," September 1996.

Carlo Pescio holds a doctoral degree in Computer Science and is a freelance consultant in Savona, Italy. He specializes in object-oriented technologies and is a member of the IEEE Computer Society, the ACM, and the New York Academy of Sciences. You can contact him at pescio@acm.org.

Listing 1

/* 
Fibonacci's Function as
C++ Template Metaprogram 
*/
template< unsigned long N > struct Fib
  {
  enum { value = Fib<N-1>::value +
                 Fib<N-2>::value } ;
  } ;

struct Fib< 1 >
  {
  enum { value = 1 } ;
  } ;

struct Fib< 2 >
  {
  enum { value = 1 } ;
  } ;

/* 
Corresponding ML code
fun Fib( 1 ) = 1 |
    Fib( 2 ) = 1 |
    Fib( n ) = Fib(n-1)+Fib(n-2) ;
*/
//End of File

Listing 2

/*
Binary-to-Decimal as
Template Metaprogram
*/

template< unsigned long N > struct Bin
  {
  enum { value = (N % 10) + 
           2 * Bin< N / 10 > :: value } ;
  } ;

struct Bin< 0 >
  {
  enum { value = 0 } ;
  } ;

/*
Corresponding ML code
fun Bin( 0 ) = 0 |
    Bin( n ) = n mod 10 + 2 * Bin( n div 10 ) ;
*/
//End of File

Listing 3


/*
How to generate compile-time
errors if invalid literals are passed
*/

unsigned long NotABinaryDigit() 
  {
  return( 0 ) ;
  }

template< unsigned long N > struct 
  BinDigitOrError
  {
  enum { value = NotABinaryDigit() } ;
  } ;

struct BinDigitOrError< 1 >
  {
  enum { value = 1 } ;
  } ;

struct BinDigitOrError< 0 >
  {
  enum { value = 0 } ;
  } ;
//End of File

Listing 4


int ByteOverflow()
  {
  return( 0 ) ;
  }

template< unsigned long N > struct 
  InRangeBinByte
  {
  enum { value = (N <= 11111111L) } ;
  } ;

template< unsigned long N > struct BinByte
  {
  enum { value = InRangeBinByte< N > :: value ?
          BinDigitOrError< (N % 10) > :: value +
          2 * BinByte< N / 10 > :: value :
          ByteOverflow() } ;
  } ;

struct BinByte< 0 >
  {
  enum { value = 0 } ;
  } ;

#define BIN_BYTE( N ) \
  ((unsigned int)BinByte< (N) > :: value)
#define BIN_WORD_LE( N, M ) \
  ( 256 * BIN_BYTE( N ) + BIN_BYTE( M ) )
//End of File

Listing 5


/*
C++ Source
*/
int x = BIN_BYTE( 101010 ) ;
...

/*
Intel x86 code
*/
mov  word ptr [bp-2], 42
...

/* 
Sun Sparc code
*/
mov 42, %o0
...
//End of File