03 - function-like macros

Here you can see a function-like macro used within regular code:

#define MAX(a, b) a < b ? b : a

int x = 1;
int y = 2;
int max = MAX(x, y);

Doesn't look very appealing, does it? Why use preprocessor text replacement when such macros can easily be written as regular C++ code (usually with the help of templates)? The standard library already offers std::max. This is why macros with no solid justification are frowned upon in C++ (in C it's a very different story - there are far fewer language features).
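
As a rough sketch of the "regular C++ code" alternative, the macro can be replaced by a small function template. The name max_of is made up for this illustration; real code should simply use std::max.

```cpp
#include <cassert>
#include <string>

// Hypothetical replacement for the MAX macro (the name max_of is made up
// for this sketch). Each argument is evaluated exactly once and both
// arguments must have the same type.
template <typename T>
constexpr const T& max_of(const T& a, const T& b)
{
	return a < b ? b : a;
}

// Small self-check showing that one template covers unrelated types.
void check_max_of()
{
	assert(max_of(1, 2) == 2);
	assert(max_of(2.5, 1.5) == 2.5);
	assert(max_of(std::string("apple"), std::string("pear")) == "pear");
}
```

Unlike the macro, the template is type-checked and respects scopes and namespaces.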

Using macros inside preprocessor directives is fine - this is the bread and butter of writing multiplatform and multi-build-configuration code in C and C++. But when used directly within code, they can significantly hurt readability, on top of the other issues explained in this lesson.

Multiple evaluation

Let's revisit the previous example but with a small change:

int max = MAX(++x, f(y));
// after preprocessing
int max = ++x < f(y) ? f(y) : ++x;

Now you should see the problem - because the preprocessor operates on text, it often impacts code in very unpleasant ways:

  • the variable x may be incremented twice (depending on which value is larger)

  • the function f may be called twice (at best a redundant operation, at worst the program will have different behavior)

For this reason, the convention is that macros should be called in such a way that the arguments have no side effects. The input expressions should be as simple as possible to avoid accidentally invoking unwanted operations. Sometimes this means you will have to create a local variable that holds an intermediate result, so that no computation is placed directly in the macro call.
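
A minimal sketch of the local variable workaround. The function f and its call counter are hypothetical and exist only to make the double evaluation observable; MAX is defined exactly as at the start of the lesson.

```cpp
#include <cassert>

#define MAX(a, b) a < b ? b : a

int call_count = 0;

// Hypothetical function with a side effect: it counts how often it runs.
int f(int value)
{
	++call_count;
	return value * 2;
}

// Passing the call directly: f runs in the comparison and then again
// in the branch that gets selected.
int calls_when_passed_directly()
{
	call_count = 0;
	int result = MAX(1, f(5));
	(void)result;
	return call_count;
}

// Storing the result first: the macro only ever sees a plain variable.
int calls_when_stored_first()
{
	call_count = 0;
	const int fy = f(5);
	int result = MAX(1, fy);
	(void)result;
	return call_count;
}

void check_call_counts()
{
	assert(calls_when_passed_directly() == 2);
	assert(calls_when_stored_first() == 1);
}
```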

Changed evaluation

Another problem - macros may unintentionally work differently due to operator precedence:

#define TO_FAHRENHEIT(celsius) 1.8 * celsius + 32

double temp = TO_FAHRENHEIT(20 + 80);

// what was meant
double temp = 1.8 * 100 + 32;
// what preprocessor generated
double temp = 1.8 * 20 + 80 + 32;

In such cases the workaround is simple - just correct the macro definition to always enclose each parameter, and the whole replacement text, in parentheses:

// fully guarded against other operators
#define TO_FAHRENHEIT(celsius) (1.8 * (celsius) + 32)
// now, after preprocessing
double temp = (1.8 * (20 + 80) + 32);

In fact, the macro from the first example should look like this:

// That's a lot of parentheses!
#define MAX(a, b) ((a) < (b) ? (b) : (a))

Pretty dumb, isn't it? The preprocessor isn't very clever. Another reason to avoid macros.
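
To see that the parentheses are not just decoration, here is a small sketch that uses both variants inside a larger expression. The names MAX_UNGUARDED and MAX_GUARDED are made up so that both definitions can live in one file.

```cpp
#include <cassert>

#define MAX_UNGUARDED(a, b) a < b ? b : a
#define MAX_GUARDED(a, b) ((a) < (b) ? (b) : (a))

int doubled_max_unguarded()
{
	// expands to: 2 * 1 < 3 ? 3 : 1
	// which is parsed as: (2 * 1 < 3) ? 3 : 1
	return 2 * MAX_UNGUARDED(1, 3);
}

int doubled_max_guarded()
{
	// expands to: 2 * ((1) < (3) ? (3) : (1))
	return 2 * MAX_GUARDED(1, 3);
}

void check_doubled_max()
{
	assert(doubled_max_unguarded() == 3); // not the intended 6
	assert(doubled_max_guarded() == 6);
}
```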

The mechanism in detail

Each time a macro is invoked, the preprocessor replaces it with its expansion, which may itself contain other macros. The result is then rescanned and those macros are expanded too. This allows macro nesting and some powerful preprocessor tricks.
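
A short sketch of nested expansion. All three macro names are hypothetical; the comments trace the steps the preprocessor goes through.

```cpp
#include <cassert>

#define SQUARE(x) ((x) * (x))
// The replacement text itself contains macro calls; they are expanded
// when the result of the outer expansion is scanned again.
#define SUM_OF_SQUARES(a, b) (SQUARE(a) + SQUARE(b))
#define HYPOT_SQUARED_3_4 SUM_OF_SQUARES(3, 4)

int nested_expansion_result()
{
	// HYPOT_SQUARED_3_4
	// -> SUM_OF_SQUARES(3, 4)
	// -> (SQUARE(3) + SQUARE(4))
	// -> (((3) * (3)) + ((4) * (4)))
	return HYPOT_SQUARED_3_4;
}

void check_nested_expansion()
{
	assert(nested_expansion_result() == 25);
}
```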

The preprocessor operates on text but more precisely on tokens. A token is the smallest entity made from text that forms something that has a meaning: an identifier, a number literal, a string literal, an operator and such. Whitespace characters (space, tab, newline and such) are used to separate tokens that would otherwise form a single, different token (e.g. constint identifier vs const int keywords).

Three tokens have special status for the preprocessor: , (separates arguments) and the pair ( and ) (encloses the macro's argument list). A macro argument can never contain an unmatched parenthesis or a comma that is not surrounded by matched parentheses. However, because the preprocessor operates on tokens, it's possible to provide an empty argument (one that consists of no tokens).

#define IDENTITY(x) x
IDENTITY(,) // error: 2 arguments (both empty) but macro takes 1
IDENTITY(() // error: the unmatched ( leaves the argument list unterminated
IDENTITY()) // this can work, it's just macro call followed by )
// error: 2 arguments but macro expects 1
// argument 1: std::pair<int
// argument 2: int>
IDENTITY(std::pair<int, int>)
IDENTITY((std::pair<int, int>)) // ok: 1 argument

Other characters do not have this special status and thus the preprocessor treats <>, [] and {} like any other tokens. A macro argument does not even need to have them matched, as the preprocessor doesn't check matching for anything apart from ().
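
When the argument has to be usable as a type, the extra parentheses shown above do not help. Two other workarounds are sketched below: hiding the comma behind a type alias, or letting a variadic parameter absorb it. Both macros are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical macro that declares a value-initialized variable.
#define DECLARE_DEFAULT(type, name) type name{}
// Variant where the type comes last and is matched by ...,
// so any commas inside it are simply carried along.
#define DECLARE_DEFAULT_V(name, ...) __VA_ARGS__ name{}

int sum_of_pair_members()
{
	using IntPair = std::pair<int, int>;
	DECLARE_DEFAULT(IntPair, first_pair);                // alias hides the comma
	DECLARE_DEFAULT_V(second_pair, std::pair<int, int>); // comma absorbed by ...

	first_pair.first = 1;
	second_pair.second = 2;
	return first_pair.first + first_pair.second
		+ second_pair.first + second_pair.second;
}

void check_pair_members()
{
	assert(sum_of_pair_members() == 3);
}
```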

Stringification

cppreference:

In function-like macros, a # operator before an identifier in the replacement-list runs the identifier through parameter replacement and encloses the result in quotes, effectively creating a string literal. In addition, the preprocessor adds backslashes to escape the quotes surrounding embedded string literals, if any, and doubles the backslashes within the string as necessary. All leading and trailing whitespace is removed, and any sequence of whitespace in the middle of the text (but not inside embedded string literals) is collapsed to a single space.

Because the operation effectively turns an expression to a string, it's often called "stringification", "stringization" and such.

This feature allows generating code that both evaluates an expression and treats it as a string:

// prints the name and the value of the object
#define PRINT(val) std::cout << #val " = " << val << "\n"
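
A small variation of the same idea that returns the text instead of printing it, which makes the result easy to inspect. DESCRIBE is a hypothetical macro written for this sketch and only works for types accepted by std::to_string.

```cpp
#include <cassert>
#include <string>

// #val becomes a string literal, which is then joined with " = "
// by ordinary adjacent string literal concatenation.
#define DESCRIBE(val) (std::string(#val " = ") + std::to_string(val))

std::string describe_sum()
{
	int x = 1;
	int y = 2;
	return DESCRIBE(x + y);
}

void check_describe()
{
	assert(describe_sum() == "x + y = 3");
}
```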

Because macro expansion proceeds recursively and there is a limited set of operations that can be done on each expansion, in some cases certain operations need to be delayed in order to work properly. For this reason, stringification is often not used directly but through another macro:

#include <iostream>

#define STRINGIZE_IMPL(x) #x
#define STRINGIZE(x) STRINGIZE_IMPL(x)

#define PRINT1(val) std::cout << #val " = " << val << "\n"
#define PRINT2(val) std::cout << STRINGIZE(val) " = " << val << "\n"

#define VALUE x + y

int main()
{
	int x = 1;
	int y = 2;
	PRINT1(VALUE); // immediately stringizes macro argument name
	PRINT2(VALUE); // stringizes after inner macro expansion
}

Output:

VALUE = 3
x + y = 3

Token pasting

The preprocessor operates on tokens and thus separate tokens remain separate even if as sequences of characters they would look like a single token:

int x = 2;
int xx = x * x;

#define MACRO() x
int y = MACRO()x; // treated as "int y = x x;" - syntax error

If concatenation of separate tokens into a larger token is desired, there is a special preprocessor operator ## that merges two adjacent tokens. The resulting token, made by concatenating the characters of both input tokens, must be valid - you cannot create invalid tokens or any comment tokens (comments are removed from code before the preprocessor is run).

Similarly to stringification, the operation is commonly used through another macro to prevent situations where a macro name (instead of its expansion) would be used.

#define CONCAT_IMPL(x, y) x ## y // spaces here are not a problem
#define CONCAT(x, y) CONCAT_IMPL(x, y)

int y = CONCAT(x, x); // ok: int y = xx;

Concatenation is typically used when a macro defines a family of entities: a set of names with a common prefix or suffix. This avoids creating name conflicts because each macro call will use a different string that becomes the prefix/suffix.
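
A sketch of such a family of entities. DEFINE_COUNTER and all generated names are hypothetical; the point is only that each call produces its own set of non-clashing identifiers.

```cpp
#include <cassert>

// For a given name, generate a counter variable and two functions
// whose names contain that name as a prefix or suffix.
#define DEFINE_COUNTER(name) \
	int name##_count = 0; \
	void increment_##name() { ++name##_count; } \
	int name##_total() { return name##_count; }

DEFINE_COUNTER(apples)  // apples_count, increment_apples, apples_total
DEFINE_COUNTER(oranges) // oranges_count, increment_oranges, oranges_total

int total_fruit()
{
	increment_apples();
	increment_apples();
	increment_oranges();
	return apples_total() + oranges_total();
}

void check_counters()
{
	assert(total_fruit() == 3);
}
```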

Additional conventions

Some macros generate a lot of code - much more than a single subexpression or even an entire statement. This causes a few problems:

  • If the macro creates some local objects, their names might clash between usages of the macro and/or other code.

  • If the macro produces multiple statements, it can significantly impact code readability - it's not a function or other C++ construct and a human reading the code may have trouble understanding how the surrounding code connects with it. Additionally, if such a macro is used under a braceless if or other control flow statement, only the first statement from the macro expansion is covered by it.

A common solution to both problems is to enclose the generated code within a do-while loop that runs exactly once. This guards the scope, makes it a single statement and additionally allows ; to be used after the macro call to make it look like a single statement.

#define COMPLEX_MACRO(param) \
	do { \
		f1(param); \
		f2(param); \
		f3(param); \
	} while (false)

// sample usage - looks like a single statement
if (some_condition)
	COMPLEX_MACRO(some_value); // the ; completes the loop syntax

It's also very common to see while (0) because it's additionally compatible with older C standards, from before C had a boolean type. More examples and additional explanation: https://kernelnewbies.org/FAQ/DoWhile0.
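
The braceless if problem can be made visible with a small sketch. Both macros are hypothetical; they do the same work and differ only in the do-while guard.

```cpp
#include <cassert>

// Two statements without any guard.
#define ADD_TWO_UNGUARDED(counter) ++(counter); ++(counter)
// The same two statements wrapped in a loop that runs exactly once.
#define ADD_TWO_GUARDED(counter) do { ++(counter); ++(counter); } while (false)

int unguarded_under_if(bool condition)
{
	int counter = 0;
	if (condition)
		ADD_TWO_UNGUARDED(counter); // only the first ++ belongs to the if
	return counter;
}

int guarded_under_if(bool condition)
{
	int counter = 0;
	if (condition)
		ADD_TWO_GUARDED(counter); // both increments belong to the if
	return counter;
}

void check_guarding()
{
	assert(unguarded_under_if(false) == 1); // second ++ ran anyway
	assert(guarded_under_if(false) == 0);
	assert(unguarded_under_if(true) == 2);
	assert(guarded_under_if(true) == 2);
}
```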

In some cases macros have to be used at global scope (when they generate classes, enumerations or any other non-imperative code). In such situations they can end with a dummy declaration like void no_op() (no operation) so that the ; immediately following the macro expansion completes a valid, unused function declaration. The goal is to define the macro in such a way that ; can be put after it, making the macro call resemble a function call.
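
A minimal sketch of this trick, assuming a hypothetical DEFINE_TAG macro that generates a small struct. Repeating the no_op declaration is harmless because a function may be declared any number of times at namespace scope.

```cpp
#include <cassert>
#include <string>

#define DEFINE_TAG(name) \
	struct name \
	{ \
		static const char* label() { return #name; } \
	}; \
	void no_op()

// Each call expands to: struct X { ... }; void no_op()
// and the ; written by the caller finishes the declaration.
DEFINE_TAG(Red);
DEFINE_TAG(Green);

std::string joined_labels()
{
	return std::string(Red::label()) + "," + Green::label();
}

void check_tags()
{
	assert(joined_labels() == "Red,Green");
}
```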

Problematic macros

One particular header is known for the trouble it was causing in the past - <windows.h>. It was defining a macro just like the maximum one in this lesson, except it was named max, not MAX. It caused many accidental compilation errors because max is a popular name for functions and objects. std::max(a, b) had to be written as (std::max)(a, b) to avoid the macro call (the code still works as C++, but the extra parentheses around the function name prevent the preprocessor from considering it a macro call). This is a great example of why macros, and only macros, should be written in UPPERCASE - otherwise there is a risk of corrupting the code through accidental text replacements.

The problem is easy to avoid today. If NOMINMAX is defined before <windows.h> is included, the header does not define the min and max macros. A related identifier, WIN32_LEAN_AND_MEAN, makes the header skip a lot of (mostly older, rarely used) stuff. Pretty much every new project that is compiled for Windows will define these identifiers to prevent nasty macros from destroying their code.

If you encounter a similar issue (from a different header) when writing a project, you have a few options:

  • rename your code

  • just #undef it if it's not needed in your code (kind of bad if someone includes your headers and expects the macro to be present though)

  • (if possible) organize your code so that the nasty includes are only present in specific source files

  • (if there is no better way) use compiler extensions like #pragma push_macro("name") and #pragma pop_macro("name") to temporarily save and restore a macro definition
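
A sketch of two of these workarounds. The max macro below is a stand-in for whatever an offending header would define; push_macro and pop_macro are compiler extensions (available in MSVC, GCC and Clang), not standard C++.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for a header that defines a lowercase function-like macro.
// Defined after the standard includes so that it cannot damage them.
#define max(a, b) ((a) < (b) ? (b) : (a))

// No extensions needed: parentheses around the name prevent the
// preprocessor from treating this as a macro call.
int largest_parenthesized(int a, int b)
{
	return (std::max)(a, b);
}

// Compiler extension: save the macro, remove it, restore it afterwards.
#pragma push_macro("max")
#undef max
int largest_plain(int a, int b)
{
	return std::max(a, b); // no macro named max exists here
}
#pragma pop_macro("max")

// The macro is available again after the pop.
int largest_via_macro(int a, int b)
{
	return max(a, b);
}

void check_largest()
{
	assert(largest_parenthesized(3, 7) == 7);
	assert(largest_plain(3, 7) == 7);
	assert(largest_via_macro(3, 7) == 7);
}

// Clean-up for this sketch only, so the stand-in does not leak further.
#undef max
```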

More tricks

There are a few more preprocessor features. All are pretty simple in theory, but practice shows their real power is obtained by abusing the preprocessor mechanism as much as possible. Because they are only used within certain niche applications, and with each newer C++ standard there are fewer and fewer justified usages of macros, I'm not going to cover them in detail.

Variadic macros

If a function-like macro is defined with ..., it's a variadic macro. ... accepts one or more arguments (zero or more since C++20), which can be output with __VA_ARGS__ as a comma-separated list of tokens. Since C++20 there is also __VA_OPT__(content) which expands content only if ... is non-empty.
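
A minimal sketch of a variadic macro that works before C++20 as long as at least one argument is passed to the variadic part. The helper sum_of and both macros are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>

int sum_of(std::initializer_list<int> values)
{
	int total = 0;
	for (int value : values)
		total += value;
	return total;
}

// __VA_ARGS__ is replaced by everything matched by ..., commas included.
#define SUM_ALL(...) sum_of({__VA_ARGS__})
// Named parameters can precede the variadic part.
#define SCALED_SUM(factor, ...) ((factor) * sum_of({__VA_ARGS__}))

int sum_example()
{
	return SUM_ALL(1, 2, 3); // sum_of({1, 2, 3})
}

int scaled_example()
{
	return SCALED_SUM(2, 1, 2, 3); // ((2) * sum_of({1, 2, 3}))
}

void check_variadic()
{
	assert(sum_example() == 6);
	assert(scaled_example() == 12);
}
```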

X macros

Basically macros that abuse other macros for maximum power in code generation. Explanation on https://en.wikipedia.org/wiki/X_Macro.

One particular usage that is still relevant in C and C++ is the generation of an enum together with functions that convert between the enum type and strings. An example of such a library: https://github.com/aantron/better-enums.
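
A hand-written sketch of the enum-to-string use case. This is not how the linked library is implemented; it only shows the basic X macro idea, and every name in it is made up.

```cpp
#include <cassert>
#include <string>

// The single list of entries; X is a macro supplied by each user of the list.
#define COLOR_LIST(X) \
	X(Red) \
	X(Green) \
	X(Blue)

enum class Color
{
#define AS_ENUMERATOR(name) name,
	COLOR_LIST(AS_ENUMERATOR)
#undef AS_ENUMERATOR
};

std::string color_name(Color color)
{
	switch (color)
	{
#define AS_CASE(name) case Color::name: return #name;
		COLOR_LIST(AS_CASE)
#undef AS_CASE
	}
	return "unknown";
}

// The same list can also be used to count the entries: 0 +1 +1 +1
#define COUNT_ONE(name) +1
constexpr int color_count = 0 COLOR_LIST(COUNT_ONE);
#undef COUNT_ONE

void check_colors()
{
	assert(color_name(Color::Green) == "Green");
	assert(color_count == 3);
}
```

Adding a new enumerator then only requires one new line in COLOR_LIST; the enum, the conversion function and the count stay in sync.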

Maximum abuse

Boost.Preprocessor is a library for metaprogramming in C and C++ through the preprocessor.