The C Preprocessor

When I was learning the C programming language, I knew the preprocessor could be really powerful, but every time I ran into it I told myself that I was just a novice and need not bother: the basics such as #define, #if, #ifndef, #pragma once, #endif and #include would be enough. Recently, however, while reading the book Data Structures and Algorithm Analysis in C, I came across the following lines. They bugged me, and I could not help learning more about the C preprocessor.

// Following is a quote of code on page 197 of the book
// Data Structures and Algorithm Analysis in C [1]

// This line bothers me
# define Insert( X, H ) ( H = Insert1( (X), H) )
/* DeleteMin macro is left as an exercise */

PriorityQueue Insert1( ElementType X, PriorityQueue H );

// Following is the definition on page 199
PriorityQueue
Insert1( ElementType X, PriorityQueue H )
{
    PriorityQueue SingleNode;

    SingleNode = malloc( sizeof( struct TreeNode) );
    if(SingleNode == NULL)
        FatalError("Out of space!!!" );
    else
    {
        SingleNode->Element = X; SingleNode->Npl = 0;
        SingleNode->Left = SingleNode->Right = NULL;
        H = Merge( SingleNode, H );
    }
    return H;
}

The Basics

Every C programmer has a favorite tutorial book; mine is The Art and Science of C [5]. I turned to it first and found the basics of the C preprocessor there. I list those basics below for reference.

The #define specification

All Starts with 3.14159265358979323846…

#define PI 3.14159265

A Tricky One

#define MaxStringSize 100

// first version, 2 * BufferSize = 202:
// 2 * ( 100 + 1 )
#define BufferSize     (MaxStringSize + 1)

// second version, 2 * BufferSize = 201
// 2 * 100 + 1
#define BufferSize      MaxStringSize + 1 

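As you can see, #define only does dull textual replacement here: the replacement text is pasted in as written, so without the parentheses the usual operator precedence changes the result.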
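To see the difference for yourself, here is a minimal sketch. The names BufferSizeParen and BufferSizeBare, and the two helper functions, are my own (hypothetical) names, chosen only so that both versions can coexist in one file.

```c
#define MaxStringSize 100

/* Hypothetical names: the two versions of BufferSize side by side. */
#define BufferSizeParen   (MaxStringSize + 1)
#define BufferSizeBare    MaxStringSize + 1

/* Expands to 2 * (100 + 1). */
int doubled_paren(void) { return 2 * BufferSizeParen; }

/* Expands to 2 * 100 + 1. */
int doubled_bare(void)  { return 2 * BufferSizeBare; }
```

Calling doubled_paren() gives 202, while doubled_bare() gives 201.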

The #include specification

2 Formats

// looks for files in a special area reserved for system files
#include <filename>

// looks for files in a part of the file system under the control
// of the user. If the file is not found, it works the same as the
// previous angle bracket version. 
#include "filename"

Absolute Pathname

You can also #include a file by its absolute pathname [4], but the form of such a pathname varies among operating systems. On a UNIX-based system, a pathname such as

/home/fred/C/my_proj/declaration2.h

would be written as

\users\fred\C\my_proj\declaration2.h

on Windows.

When an absolute pathname is used, the normal directory search is skipped.

Additional Features not covered

Some of the advanced features are not covered in reference [5], but they will be introduced in detail in later parts.

Pseudo-functions (Macros)

These are also called macros, and they are used heavily in the ANSI library. They are the main topic of the remaining parts of this article.

Conditional Compilation

Following is the case that I use most. It works just like commenting a block out, but it is much easier to switch on and off.

#ifndef _BLOCK_TEST_H
#define _BLOCK_TEST_H

#define BLOCK_DEBUG 1

#if BLOCK_DEBUG
    // code blocks
#endif

#endif

Conditional compilation is also useful for writing more advanced programs that can be moved more easily from one computer system to another.

Formal Debut

In this part, I will introduce the trickiest and most amazing part of the C preprocessor: macros.

Macros Like Functions

Intro

As you have seen in the first code block in this article:

# define Insert( X, H ) ( H = Insert1( (X), H) )

Macros can be formally defined as:

#define name(comma-separated-parameter-list) stuff

where the opening parenthesis of the parameter list must be adjacent to the name; otherwise, the parameter list will be interpreted as part of stuff.

To illustrate it with a simpler example, the following macro intends to calculate the square of a number:

#define SQUARE(x)      x * x

A macro is purely textual: during preprocessing, each use of the macro is replaced by its expansion. So when SQUARE(5) is invoked, it is replaced by 5 * 5, which evaluates to 25.

But what happens when you invoke SQUARE(2 + 3)? As stated before, the preprocessor only does textual replacement, so the result is 2 + 3 * 2 + 3, which is 11, not the intended 25.

To resolve this issue, as primary school taught us, we need extra parentheses!

Parentheses Help A Lot

Now, if we write the macro as:

#define SQUARE(x)      ( (x) * (x) )

When calling SQUARE( 2 + 3 ), the textual replacement generates (2 + 3) * (2 + 3) for us, which evaluates to 25 as expected!
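Here is a small sketch that puts the two definitions next to each other. SQUARE_BARE and SQUARE_PAREN are hypothetical names I picked so that both can be defined at once, and the helper functions exist only to make the results easy to check.

```c
/* Hypothetical names for the two versions of SQUARE. */
#define SQUARE_BARE(x)    x * x
#define SQUARE_PAREN(x)   ( (x) * (x) )

/* Expands to 2 + 3 * 2 + 3. */
int square_bare_of_2_plus_3(void)  { return SQUARE_BARE(2 + 3); }

/* Expands to ( (2 + 3) * (2 + 3) ). */
int square_paren_of_2_plus_3(void) { return SQUARE_PAREN(2 + 3); }
```

The first function returns 11 and the second returns 25.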

However, there is one more issue, which you might have spotted: the argument (2 + 3) is evaluated twice. That is no big deal here, but it can be a big issue in other cases.

Macros are not functions

Have a look at the following chunk of code:

int Square(int x) {
    return x * x;
}

i = 4;
Square(i++);

This case is quite different from the (2 + 3) one. With i = 4 as the initial value, Square(i++) and SQUARE(i++) behave quite differently. Let’s see what happens.

For Square(i++):

// equivalent to
Square(4);  // result: 16, and i is 5 afterwards

For SQUARE(i++):

// expands to
(i++) * (i++);

// i is modified twice with no sequence point in between, so the
// behavior is undefined: a compiler might compute 4 * 5 = 20,
// 4 * 4 = 16, or something else entirely

There is actually no fix for this kind of problem inside the macro; the only thing to do is to avoid using macros in such a way. Following is another example:

#define EVENPARITY( ch )                         \
        ( ( count_one_bits( ch ) & 1 ) ?         \
        ( ch ) | PARITYBIT : ( ch ) )

This kind of parity macro is widely used in communications and disk storage.

When you call:

ch = EVENPARITY( getchar() );

Here getchar() is called twice inside the macro: once as the argument of count_one_bits() and once more in whichever branch of the conditional is selected (the two branches after the question mark are alternatives, so only one of them is evaluated each time). Each call to getchar() consumes a character from the input stream, so a character silently goes missing. Reading the character into a variable first solves the problem:

int c = getchar();  // getchar() returns int, not char
ch = EVENPARITY(c);
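The double evaluation can be made visible with a sketch like the one below. Everything besides the EVENPARITY macro itself is my own assumption: the value of PARITYBIT, the body of count_one_bits(), and next_char(), a stand-in for getchar() that reads from a fixed string and counts how often it is called.

```c
#define PARITYBIT 0x80   /* assumed: parity is kept in the top bit of a byte */

/* Illustrative helper: counts the 1 bits in the low eight bits of value. */
static int count_one_bits(int value)
{
    int count = 0;
    for (value &= 0xFF; value != 0; value >>= 1)
        count += value & 1;
    return count;
}

#define EVENPARITY( ch )                         \
        ( ( count_one_bits( ch ) & 1 ) ?         \
        ( ch ) | PARITYBIT : ( ch ) )

/* Hypothetical stand-in for getchar(): returns characters from a fixed
   string and records the number of calls. */
static const char *input = "ab";
static int calls = 0;

static int next_char(void)
{
    calls++;
    return *input ? *input++ : -1;
}

/* The call is written inside the macro argument, so it is expanded twice. */
int calls_when_passed_directly(void)
{
    input = "ab";
    calls = 0;
    int ch = EVENPARITY( next_char() );
    (void)ch;
    return calls;
}

/* The character is read once and the macro only sees a plain variable. */
int calls_when_read_first(void)
{
    input = "ab";
    calls = 0;
    int c = next_char();
    int ch = EVENPARITY( c );
    (void)ch;
    return calls;
}
```

In this sketch calls_when_passed_directly() reports 2 calls, while calls_when_read_first() reports 1.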

The only way to avoid this kind of problem is to never pass expressions with side effects, such as i++ or getchar(), to a macro. And the best ways to tell a macro from a function are:

  1. Follow a good naming convention: macro names all in uppercase.
  2. When using code written by others, refer to its source to check whether a name is a macro.

Macros Should not be our Burden

When the argument passed to SQUARE is a complicated expression, it is evaluated twice, which is not what we expect. Since performance is our reason for using macros instead of functions, we should keep macro arguments simple.

More about Conditional Compilation

#if, #elif, #else, #endif

#if constant-expression
    statements
#elif constant-expression
    other statements
#else
    other statements
#endif

Note: constant-expression is evaluated by the preprocessor, so it must be an integer constant expression; it cannot refer to any runtime variable.
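As a minimal sketch of how the branches are chosen, consider the following. VERSION, SELECTED_BRANCH, NEVER_DEFINED and UNDEFINED_COUNTS_AS are hypothetical names of mine. The second half shows a detail worth knowing: an identifier that is not a macro is treated as 0 inside #if.

```c
#define VERSION 2          /* hypothetical setting, normally chosen by the build */

#if VERSION == 1
#define SELECTED_BRANCH 1
#elif VERSION == 2
#define SELECTED_BRANCH 2
#else
#define SELECTED_BRANCH 0
#endif

/* NEVER_DEFINED is not a macro, so the preprocessor treats it as 0. */
#if NEVER_DEFINED
#define UNDEFINED_COUNTS_AS 1
#else
#define UNDEFINED_COUNTS_AS 0
#endif

int selected_branch(void)     { return SELECTED_BRANCH; }
int undefined_counts_as(void) { return UNDEFINED_COUNTS_AS; }
```

With VERSION set to 2, selected_branch() returns 2, and undefined_counts_as() returns 0.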

Equivalent Usages

#if     defined(symbol)
#ifdef  symbol

#if     !defined(symbol)
#ifndef symbol

Nested Directives

Just like the if..else.. statement in C, the directives can also be nested. The following pattern can be applied to software that has to run on multiple platforms.

#if     defined( OS_UNIX )
        #ifdef OPTION1
              unix_version_of_option1();
        #endif
        #ifdef OPTION2
              unix_version_of_option2();
        #endif
#elif   defined( OS_MSDOS )
        #ifdef OPTION2
               msdos_version_of_option2();
        #endif
#endif     

Command Line Definitions

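This is a feature of the compiler rather than of the language, so it is not supported everywhere, but it is still really useful. I will not explain much here; the option takes one of two forms:

-Dname
-Dname=stuff

// in source code
int array[ARRAY_SIZE];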

# on compilation in command line tools
cc -DARRAY_SIZE=100 prog.c
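A common companion to this, sketched below, is to give the symbol a fallback value so the file still compiles when the option is left out. The default of 16 and the helper array_length() are my own assumptions, not something from the references.

```c
/* Fallback used when the compiler is not invoked with -DARRAY_SIZE=... */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE 16      /* assumed default, for illustration only */
#endif

int array[ARRAY_SIZE];

/* Number of elements in array, computed from its declared type. */
int array_length(void)
{
    return (int)(sizeof array / sizeof array[0]);
}
```

Compiled with cc -DARRAY_SIZE=100, array_length() would return 100; without the option it returns the fallback value.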

#line

#line number "string"

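This directive changes the value of the __LINE__ symbol (which will be mentioned later) and, when the optional string is given, the value of __FILE__ as well. It is mostly used by tools that translate code written in other languages into C, so that diagnostics can point back to the original source.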
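A tiny sketch of the effect follows. The file name parser.y and the line number 1000 are made up for illustration, as are the two helper functions.

```c
#include <string.h>

/* Pretend that the next line is line 1000 of a (hypothetical) file parser.y. */
#line 1000 "parser.y"
int line_seen(void) { return __LINE__; }

/* __FILE__ now reports the name given in the #line directive. */
int file_is_renamed(void)
{
    return strcmp(__FILE__, "parser.y") == 0;
}
```

Here line_seen() returns 1000, because the line right after the directive is numbered with the value given in it.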

#error

#error test of error message
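#error is usually placed inside a conditional so that the build stops with a message only when some assumption does not hold. The following is a sketch under my own assumption that the code relies on 8-bit bytes; CHAR_BIT comes from the standard header limits.h, and bits_per_byte() is just a helper for illustration.

```c
#include <limits.h>

/* Stop the build early on platforms this code was never meant for. */
#if CHAR_BIT != 8
#error this code assumes 8-bit bytes
#endif

int bits_per_byte(void) { return CHAR_BIT; }
```

On a platform where the condition is false, the #error line is skipped and compilation continues normally.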

# (the null directive)

A # on a line by itself is ignored by the preprocessor, but it can be used to set directives apart from the surrounding code.

#pragma

The meaning of #pragma differs from compiler to compiler. In Visual Studio, for example,

#pragma once

has the same effect as

#ifndef _HEADER_H
#define _HEADER_H 1
// 1 above is arbitrary, the purpose is to have this
// _HEADER_H symbol defined in the context

// header contents go here

#endif

When a compiler does not recognize an option such as once, it ignores the #pragma.

Other Fun

Customized C

This is the example "C Is Not Algol" from reference [7].

#define STRING char *
#define IF if(
#define THEN ){
#define ELSE } else {
#define FI ;}
#define WHILE while(
#define DO ){
#define OD ;}
#define INT int
#define BEGIN {
#define END }

// these #defines enable the following code
INT compare(s1, s2)
    STRING s1;
    STRING s2;
BEGIN
    WHILE *s1++ == *s2
    DO IF *s2++ == 0
        THEN return(0);
        FI
    OD
    return (*--s1 - *s2);
END

The example here is only for fun; doing this in practice is not recommended, since it makes teamwork on a project really hard. Following is a similar usage that makes more sense, but you should still be aware that aspects of the other language cannot be mimicked exactly.

#define repeat       do
#define until( x )   while( ! (x) )

// now you can write
repeat {
       statements
} until ( i >= 10 );

// which is equivalent to
do {
      statements
} while( ! (i >= 10 ) );
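As a quick runnable sketch of these two macros, the hypothetical function repeat_from() below counts upward from a starting value. Like any do..while loop, the body always runs at least once, even when the condition already holds.

```c
#define repeat       do
#define until( x )   while( ! (x) )

/* Increments i until it reaches at least 10 and returns the final value. */
int repeat_from(int start)
{
    int i = start;
    repeat {
        i++;
    } until ( i >= 10 );
    return i;
}
```

repeat_from(0) returns 10, while repeat_from(20) returns 21 because the body is executed once before the condition is tested.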

# to String Literals

#define PRINT(FORMAT,VALUE)                   \
        printf( "The value of " #VALUE        \
        " is " FORMAT "\n", VALUE )

PRINT( "%d", x + 3);

// output:
The value of x + 3 is 25
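To check this without looking at the terminal, here is a sketch of the same idea that writes into a buffer instead of printing. FORMAT_VALUE and stringify_demo() are hypothetical names of mine, and x is assumed to be 22 so that x + 3 is 25 as in the output above.

```c
#include <stdio.h>
#include <string.h>

/* Same trick as PRINT, but the text goes into a caller-supplied buffer.
   #VALUE turns the argument's source text into a string literal. */
#define FORMAT_VALUE(BUF, SIZE, FORMAT, VALUE)              \
        snprintf( (BUF), (SIZE), "The value of " #VALUE     \
        " is " FORMAT, VALUE )

/* Returns 1 when the formatted text matches the expected sentence. */
int stringify_demo(void)
{
    char buf[64];
    int x = 22;                     /* assumed value for the example */
    FORMAT_VALUE(buf, sizeof buf, "%d", x + 3);
    return strcmp(buf, "The value of x + 3 is 25") == 0;
}
```

The adjacent string literals, including the one produced by #VALUE, are joined by the compiler into a single format string.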

## Concatenates Tokens

# define ADD_TO_SUM( sum_number, value ) \
         sum ## sum_number += value

ADD_TO_SUM( 5, 25 );

// above is equivalent to
sum5 += 25;

How amazing! I thought this kind of feature was only available in languages with metaprogramming support, like Ruby.
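A short sketch of the macro at work is shown below. The local variable sum5 and the function add_twice() are my own additions; the macro only works if a variable with the pasted name actually exists in scope.

```c
#define ADD_TO_SUM( sum_number, value ) \
         sum ## sum_number += value

/* Adds to the variable whose name is built by pasting "sum" and "5". */
int add_twice(void)
{
    int sum5 = 0;
    ADD_TO_SUM( 5, 25 );    /* becomes: sum5 += 25; */
    ADD_TO_SUM( 5, 10 );    /* becomes: sum5 += 10; */
    return sum5;
}
```

add_twice() returns 35.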

Predefined Symbols

There are several predefined symbols, such as __FILE__, __LINE__, __DATE__, __TIME__ and __STDC__, which the preprocessor defines for you and which describe the compilation environment.

  • __FILE__: Name of the source file being compiled
  • __LINE__: Line number of the current line in the file
  • __DATE__: Date that the file was compiled
  • __TIME__: Time that the file was compiled
  • __STDC__: 1 if the compiler conforms to ANSI C, else undefined.

Following is an example:

#define DEBUG_PRINT printf( "File %s line %d:" \
                        " x=%d, y=%d, z=%d", \
                        __FILE__, __LINE__, \
                        x, y, z )

// instantiation of the macro
x *= 2;
y += x;
z = x * y;
DEBUG_PRINT;

The definition is split across multiple lines by ending each line with a backslash, and the adjacent string literals are concatenated into one string.
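The date and time symbols expand to string literals with a fixed layout, "Mmm dd yyyy" and "hh:mm:ss". The sketch below, with helper names of my own, simply measures them and reads back the current line number.

```c
/* __DATE__ looks like "Jan  1 2024": 11 characters plus the terminator. */
int date_length(void)  { return (int)(sizeof(__DATE__) - 1); }

/* __TIME__ looks like "12:34:56": 8 characters plus the terminator. */
int time_length(void)  { return (int)(sizeof(__TIME__) - 1); }

/* __LINE__ is an integer constant that changes from line to line. */
int current_line(void) { return __LINE__; }
```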

Outro

Macros are not statements

Have you ever wondered how some of the built-in facilities of C are implemented? You bet! Macros are used! [2]

Take assert as an example:

assert(x > y)

It should terminate program execution with an appropriate error message when x > y evaluates to 0.

We might implement it as follows:

#define assert(e) if (!(e)) assert_error(__FILE__, __LINE__)

But what if we use it like this:

if (x > 0 && y > 0)
       assert (x > y);
else
       assert (y > x);

It will expand into something like:

if (x > 0 && y > 0)
        if (!(x > y)) 
                assert_error("foo.c", 37);
        else
                if (!(y > x)) 
                        assert_error("foo.c", 39);

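The else now pairs with the inner if generated by the macro rather than with the outer if, so assert (y > x) is checked exactly when x > y holds, and nothing is checked when the outer condition is false. The solution is to make the macro an expression instead of a statement: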

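#define assert(e)  \
        ((void)((e)||assert_error(__FILE__, __LINE__)))

This relies on the short-circuit behavior of the || operator: when e is true, the whole expression is true and nothing else happens; otherwise the second operand is evaluated and assert_error is called. Note that assert_error must return a value for it to be usable as an operand of ||.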
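The following sketch tries the expression form inside the if..else from before. To keep it testable I made some assumptions: the macro is named MY_ASSERT so it does not clash with the standard assert, and my_assert_error() only counts failures instead of printing a message and terminating the program.

```c
static int failures = 0;

/* Hypothetical reporting function. It returns int so that it can be an
   operand of ||; a real one would print a message and abort. */
static int my_assert_error(const char *file, int line)
{
    (void)file;
    (void)line;
    failures++;
    return 0;
}

#define MY_ASSERT(e)  \
        ((void)((e) || my_assert_error(__FILE__, __LINE__)))

/* Returns how many assertions failed for the given x and y. */
int failures_for(int x, int y)
{
    failures = 0;
    if (x > 0 && y > 0)
        MY_ASSERT(x > y);
    else
        MY_ASSERT(y > x);
    return failures;
}
```

Because the macro is now a single expression, the else pairs with the outer if as intended: failures_for(2, 1) and failures_for(-1, 3) report 0 failures, while failures_for(1, 2) reports 1.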

Macros are not Type Definitions

You may want to use #define to make type aliases, but this is actually a bad decision. Consider the following example:

#define FOOTYPE struct foo*
FOOTYPE a;
FOOTYPE b, c;

The first declaration works fine, but the second one becomes:

struct foo* b, c;

which declares b as a pointer to struct foo but c as a struct foo itself. That is not our intention. So to avoid this situation, we should use

typedef struct foo *FOOTYPE;

instead of the faulty macro.
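The difference can be checked with sizeof, as in the sketch below. The layout of struct foo and the names FOOTYPE_MACRO and FOOTYPE_T are my own; the struct is deliberately made much larger than a pointer so that the two sizes cannot coincide.

```c
struct foo { char data[64]; };          /* assumed layout, larger than a pointer */

#define FOOTYPE_MACRO struct foo*       /* the faulty macro alias */
typedef struct foo *FOOTYPE_T;          /* the typedef alias */

/* With the macro, b is a pointer but c is a whole struct foo. */
int macro_sizes_differ(void)
{
    FOOTYPE_MACRO b, c;
    return sizeof b != sizeof c;
}

/* With the typedef, both b and c are pointers. */
int typedef_sizes_match(void)
{
    FOOTYPE_T b, c;
    return sizeof b == sizeof c;
}
```

Both functions return 1, which confirms that only the typedef applies the pointer to every name in the declaration.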

References

  1. Mark Allen Weiss, Data Structures and Algorithm Analysis in C, 2nd Edition, 1997
  2. Andrew Koenig, C Traps and Pitfalls, 1989
  3. Dave Thomas with Chad Fowler and Andy Hunt, Programming Ruby 1.9 & 2.0: The Pragmatic Programmer’s Guide, 2013
  4. Kenneth Reek, Pointers on C, 1997
  5. Eric S. Roberts, The Art and Science of C: A Library-Based Introduction to Computer Science, 1995
  6. Self-Referential Macros: https://gcc.gnu.org/onlinedocs/cpp/Self-Referential-Macros.html
  7. Peter van der Linden, Expert C Programming: Deep C Secrets, 1994