05 - header guards

Let's recall the example multi-file program from the previous lesson. This time we will introduce multiple problems into the project in order to observe build errors and explain certain aspects of the C/C++ build process.

Build process

Before we begin, you need to understand the most important steps in the process of building a C/C++ project.

A simplified list (a very detailed one is on the cppreference page):

  • comments are removed

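  • preprocessing: the code (treated as text) is transformed, and all preprocessor directives are executed and removed from it

  • compilation: each translation unit (a source file together with everything it includes, after preprocessing) is transformed into an intermediate object file (usually named *.o or *.obj)

  • linking: separate object files are merged by the linker to form an executable or a dynamic (shared) library file (a static library is typically just an archive of object files, created by a separate archiver tool)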

If you define something (other than an inline entity) multiple times in the same translation unit, you will get a compilation error. But if the duplicate definitions are spread across different translation units, the compiler cannot see the problem, because it processes one translation unit at a time. In most implementations such an ODR violation will be caught at the linking step instead.

What is a symbol?

A symbol is a single entity from the linker's point of view: usually a piece of compiled code or data attached to the mangled name of a specific C/C++ entity. So if the linker outputs an error "undefined reference to ...", it means that the definition of that specific entity was not found in any of the object files or libraries it was given.

What is name mangling?

Name mangling is the transformation of ordinary entity names into ones that can be understood and differentiated by the linker. Linkers generally work at a lower level than the programming language. They do not understand high-level features such as overloading, namespaces or templates, which can result in identical or complex names, so they need a mechanism to tell such entities apart. Simply put, name mangling encodes language-specific entity names into a form simple enough that the linker does not need to understand the syntax and semantics of the given programming language. For more information and examples, see the Wikipedia article about name mangling.

Undefined reference

A single file is enough to trigger this linker error, but to illustrate a more realistic problem, we will comment out a function definition to simulate a typical build misconfiguration (a missing source file, not everything compiled).

// main.cpp
#include "hello.hpp"
#include <iostream>

int main()
{
	write_hello();
	std::cout << " world\n";
}

// hello.hpp
void write_hello();

// hello.cpp
#include "hello.hpp"
#include <iostream>
/*
void write_hello()
{
	std::cout << "hello";
}
*/

/tmp/ccQJjzeS.o: In function `main':
main.cpp:(.text+0x5): undefined reference to `write_hello()'
collect2: error: ld returned 1 exit status

The error appears because some entity (a function in this case) was ODR-used (used in a way which requires a definition) but the definition was not provided. The same error can appear if you try to use an external library and do not link the library's compiled code in the build process.

Multiple definition

This error usually appears when:

  • After some code refactoring, two copies of the same entity were left in different files.

  • Two different functions have accidentally been given the same name (and the same parameter types, so their mangled names are identical too).

  • The project contains multiple subprojects and at least two of them link to the same external library with incompatible settings.

To trigger the error, we will simulate a refactoring mistake and attempt to compile two files that both define the same function:

// main.cpp
#include "hello.hpp"
#include <iostream>

void write_hello()
{
	std::cout << "hello";
}

int main()
{
	write_hello();
	std::cout << " world\n";
}

// hello.hpp
void write_hello();

// hello.cpp
#include "hello.hpp"
#include <iostream>

void write_hello()
{
	std::cout << "hello";
}

/tmp/ccb9JA6l.o: In function `write_hello()':
hello.cpp:(.text+0x0): multiple definition of `write_hello()'
/tmp/ccfsecFJ.o:main.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

Even when both definitions are identical, silently discarding one of them and continuing would not be a good idea. Although everything needed to form an executable is present, such a situation indicates a configuration or code problem, so for safety build tools treat it as an error.

In such a situation the programmer must figure out what is causing the multiple definitions, because the cause determines the solution:

  • refactoring - remove redundant code

  • accidental same name - rename something

  • duplicated dependencies - change the project's build recipe (this is outside of the C++ code)

The most helpful information for such a problem is usually contained in the linker error itself: the places (files) where each definition comes from.

Multiple inclusion

As you should know by now, headers can be included transitively. But what happens when, because of this, a specific header gets included multiple times in the same translation unit? Remember that headers are not just for declarations (which the ODR allows to be repeated) but also for anything that is not immediately compiled, which includes some definitions.

One particularly good example is type definitions. Defining a type does not by itself produce any compiled code; it is rather a specification of how code which manipulates objects of this type should behave. But defining a type multiple times within one translation unit is an ODR violation.

To illustrate, here is an example that defines a type and accidentally includes its definition multiple times:

// power_state.hpp
enum class power_state { off, sleep, on };

// to_string.hpp
#include "power_state.hpp"
#include <string>

std::string to_string(power_state ps);

// to_string.cpp
#include "to_string.hpp"
// (#include <string> can be safely omitted here because the function declaration
// in to_string.hpp needs it anyway, so including that header is enough)

std::string to_string(power_state ps)
{
	switch (ps)
	{
		case power_state::off:
			return "off";
		case power_state::sleep:
			return "sleep";
		case power_state::on:
			return "on";
		// no default case on purpose: if a new enumerator is added later,
		// compilers can warn that this switch does not handle it (e.g. -Wswitch)
	}
	// only reachable if ps holds a value that is not one of the enumerators
	return "unknown";
}

// main.cpp
#include "power_state.hpp"
#include "to_string.hpp"

#include <iostream>

int main()
{
	auto state = power_state::on;
	std::cout << "device power state: " << to_string(state) << "\n";
}

In file included from to_string.hpp:1,
                 from main.cpp:2:
power_state.hpp:1:12: error: multiple definition of ‘enum class power_state’
 enum class power_state { off, sleep, on };
            ^~~~~~~~~~~
In file included from main.cpp:1:
power_state.hpp:1:12: note: previous definition here
 enum class power_state { off, sleep, on };
            ^~~~~~~~~~~

main.cpp included power_state.hpp and to_string.hpp, which in turn included power_state.hpp too. As a result, the contents of power_state.hpp appeared twice in the main.cpp translation unit.

You could probably come up with a convention for splitting code so that such situations never arise, but in practice tracking the dependencies of each file by hand would be very annoying.

Header guards

We can create a mechanism that automatically prevents accidental duplicate inclusion by using preprocessor identifiers. The solution is simple: wrap the entire contents of each header file in #ifndef, #define and #endif directives:

// power_state.hpp
#ifndef EXAMPLE_PROJECT_POWER_STATE
#define EXAMPLE_PROJECT_POWER_STATE

enum class power_state { off, sleep, on };

#endif

// to_string.hpp
#ifndef EXAMPLE_PROJECT_TO_STRING
#define EXAMPLE_PROJECT_TO_STRING
#include "power_state.hpp"
#include <string>

std::string to_string(power_state ps);

#endif

// to_string.cpp
#include "to_string.hpp"
// (#include <string> can be safely omitted here because the function declaration
// in to_string.hpp needs it anyway, so including that header is enough)

std::string to_string(power_state ps)
{
	switch (ps)
	{
		case power_state::off:
			return "off";
		case power_state::sleep:
			return "sleep";
		case power_state::on:
			return "on";
		// no default case on purpose: if a new enumerator is added later,
		// compilers can warn that this switch does not handle it (e.g. -Wswitch)
	}
	// only reachable if ps holds a value that is not one of the enumerators
	return "unknown";
}

// main.cpp
#include "power_state.hpp"
#include "to_string.hpp"

#include <iostream>

int main()
{
	auto state = power_state::on;
	std::cout << "device power state: " << to_string(state) << "\n";
}

How does it work? Each time the file is included, the preprocessor checks whether a specific identifier has already been defined. On the first inclusion it has not, so the contents are kept and the #define directive defines the identifier. On any later inclusion it has already been defined, so the entire contents of the file up to the matching #endif are skipped. Because each header is given a unique identifier and each translation unit is preprocessed separately, the contents of any header end up in a given translation unit at most once.

The identifier must be unique for each header. To guarantee uniqueness, it is usually built from the company and/or project name, the file path relative to the project root and sometimes the date/time when the file was created.

Why didn't source files get these directives?

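Because only header files are supposed to be included; only header files are shared between translation units. Each source file is the starting point of its own translation unit, so its contents are processed only once anyway.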

Alternative guards

Since header guards are such a commonly used mechanism, many compilers implement #pragma once to make them easier to write. Pragmas are a special subset of preprocessor directives intended for implementation-defined extensions (see the cppreference page about pragmas). An example header then looks like this:

// power_state.hpp
#pragma once

enum class power_state { off, sleep, on };

Usage is much simpler: less code, no unique identifier required and no #endif at the end of the file. The only disadvantage is that #pragma once is not part of the standard. On the other hand, I have personally never had any problems with #pragma once, while I have observed many people (including myself) run into errors caused by broken traditional header guards (usually a missing #endif or a non-unique identifier).

There were some attempts to standardize #pragma once, as it is probably the most common preprocessor extension, but ultimately they failed. There were many reasons, but mostly it was because each implementation uses a different way of verifying that a file is unique (for example, the same file can be reachable through different paths or links), and no universal solution could be agreed upon. Standardizing it as an "implementation-defined solution" makes very little sense because pragmas are already covered by that term. Even though the feature remains an extension, many projects use this type of guard for its simplicity and very widespread support.

Recommendation

All headers should be guarded, even if they are included only once in the entire project.

As for which type of guard to use, the choice is yours, depending on whether you prefer strict standard conformance or simpler code and convenience. Pick one and be consistent.