01 - introduction

You may have noticed that sometimes the code contains variables named like x1, x2, x3. Multiple objects of the same type are used for the same purpose. You may write functions to reuse logic on multiple objects, but even calling functions with a group of same variables gets repetitive and we know that duplicate code is a really bad thing.

1
2
3
4
5
6
7
8
9
int x1 = 1;
int x2 = 2;
int x3 = 3;

int y1 = f(x1);
int y2 = f(x2);
int y3 = f(x3);

// any other operation will likely have another 3 similar lines...

Object-variable relationship

Recall the difference between an object and a variable:

  • An object is data in memory.

  • A variable is an abstraction, a name that refers to (potentially multiple) objects.

So far there were examples of n:1 variable-object relation (multiple references to the same object), now it's time for 1:n (one variable for multiple objects).

Arrays

1
2
3
4
5
6
int x[3] = {1, 2, 3};

int y[3]; // (uninitialized)
y[0] = f(x[0]);
y[1] = f(x[1]);
y[2] = f(x[2]);

The example looks differently and few things need to be explained.

Syntax

Intuitively, you might expect the syntax to be a bit different:

1
2
int[10] arr; // expected: arr represents 10 integers
int arr[10]; // actual: a bit weird/inconsistent order

Recall one of the first lessons - declarator syntax is not "type followed by name" and unfortunately due to backwards compatibility, it can not be changed.

So is there any lesson about declarator syntax?

No, at least not now. But the problem is old enough that you may find various resources on the internet that may help you, e.g. https://cdelc.org. Here, I will post examples when needed as I don't think spending time trying to understand this part of C grammar is worth it. In practice, there are C++ alternatives and actual "syntax abonimations" are pretty rare so people usually just remember them instead of understanding. Explaining the C (and C++) language grammar could be a separate course on its own.

The good news for arrays is that we will not use this syntax for long, as C++11 added an alternative, which is much better and not only in syntax. But first, let's explain some aspects of arrays.

Indexing

Arrays are zero-indexed. This means that an array of size n has objects under indexes 0 ... n-1. In the example above both arrays are of size 3 and their valid indexes are 0, 1, 2.

If this is the first time you encounter 0-based indexing you might be a bit confused, but the more you dwell into programming the more appropriately it will sound. Once you gain enough knowledge, it will be obvious that zero-based indexing is the only proper way of dealing with arrays.

The primary reason for zero-based indexing is related to how arrays work underneath - variables that refer to contiguously allocated objects (next to each other) of the same type. A good comparison (which may be surprising, depending where do you live) is floor numbering - ground floor is floor 0, first floor is 1, second is 2 and so on. If you are not used to this type of indexing, I have to inform you that this is the European convention and elevators there do have a 0 floor button.

An array starts at a specific memory address and it's first element is placed right in the beginning:

1
2
3
4
5
6
7
8
9
10
11
12
#include <memory> // for std::addressof
#include <iostream>

int main()
{
	int arr[3] = {1, 2, 3};

	std::cout << std::addressof(arr) << "\n";
	std::cout << std::addressof(arr[0]) << "\n";
	std::cout << std::addressof(arr[1]) << "\n";
	std::cout << std::addressof(arr[2]) << "\n";
}
0x7ffdc1562d04
0x7ffdc1562d04
0x7ffdc1562d08
0x7ffdc1562d0c

Standard streams print memory addresses using base 16 (extra 6 digits are denoted with a-f) but you can clearly see that the address of the first array element is the same as the address of the array. Obviously if you run the same program the output may be different but first 2 lines will always be identical. This lack of difference is the reason for 0 based numbering - during pointer arithmetics (which are far from this lesson) the index is used to calculate memory address of specific object and if it wasn't 0-based indexing, the math would simply not work.

The reason for the above is simple: if you use an index that is outside valid range, a machine instruction will be generated that tries to access memory outside what has been allocated for the array. Such bug will usually either immediately crash the program or corrupt memory of something else that will later lead to a crash and/or other faulty behavior.

Array size

Array size must be a constant expression, that is, an expression that can be evaluated at compile time.

So array size must be constexpr?

Generally, this is a good mental shortcut because constexpr values can be used as array size, but many other language constructs can create constant expressions too. These constructs are mostly special rules for various use cases (especially const) that were present before C++11 introduced constexpr. The list is long but without these rules, one would need to use preprocessor or even worse tricks to manipulate constants - and you should know that C++ (unlike C) really hates macros. const in C is absent of these rules and const-qualified objects there can not be used for things such as an array size, even if the value is computable at compile time. This caused 3 conventions to emerge:

  • C: #define ARRAY_SIZE 10 - macros are the only practical solution

  • C++ < 11: const int array_size = 10; - rely on special rules intended for constant expressions

  • C++ >= 11: constexpr int array_size = 10; - use dedicated feature for constant expressions

I don't want you to remember all these special rules - there are too many of them and their practical value exists pretty much only for compiler implementers, but just to illustrate - in the example below both n2 and n3 are const-qualified objects, but only n2 is classified as a constant expression.

1
2
3
4
5
6
7
8
9
10
11
constexpr int n1 = 10;
const int n2 = 10;
const int n3 = non_constexpr_function();

int arr1[n1]; // ok
int arr2[n2]; // also ok (n2 can be evaluated at compile time)
int arr3[n3]; // error: size not a constant expression

int n4 = 10;
const int n5 = n4; // const, but not compile time
int arr5[n5]; // error: n5 is not a constant expression

My recommendation is to use constexpr and then you don't need to remember all these special rules - they were made to elevate const before constexpr was introduced into the language. In C, text-replacing macros have to be used as there is no way to make a constant expression other than writing the literal (obviously C++ code which uses macro for constants is bad code).

VLA

During some experiments, you might accidentally use a common extension known as variable length arrays. It permits code such as this:

1
2
3
int n;
std::cin >> n;
int arr[n];

This is not standard C++. The feature dates back to C89, but even in C99 it was changed from "official" to "optional" and later removed in C11. C++ never officially had VLA, some compilers simply continued to support it as a non-standard extension. Modern compilers (with standard options) should reject such code or at least output a warning.

Why was it removed from C and never was a part of C++? It seems useful.

There are multiple reasons:

  • VLA does runtime computation of stack allocation, which is very unsafe. There is no easy way to detect any failures with such operation and programs which invoke undefined behavior due to stack corruption are very prone to security exploits.

  • The same functionality (arrays with runtime changeable size) is available through dynamic memory allocation (you will learn about std::vector later in this chapter) and to some extent with preallocated fixed-size arrays.

  • The performance gain of VLA (compared to alternatives) is negligible.

Initialization

Arrays can be initialized in multiple ways. You don't need to remember how every initialization is named (hardly anyone does), it's much more important to know their effect.

1
2
3
4
5
6
7
8
constexpr int n = 5;
int arr1[n]; // uninitialized
int arr2[n] = {}; // each element is zero-initialized
int arr3[n] = {1, 2}; // first 2 elements initialized explicitly, rest are zero-initialized
int arr4[n] = {1, 2, 3, 4, 5, 6}; // ill-formed: too many initializers

// array size can be inferred if it contains a non-empty initializer:
int arr[] = {1, 2, 3}; // ok: arr has type int[3]

Array size

Because of how arrays work and how they occupy space in memory, there is a simple trick to compute their size:

1
2
3
int arr[] = {1, 2, 3};
// size = number of bytes that array occupies / number of bytes per element
constexpr int sz = sizeof(arr) / sizeof(arr[0]);

Since C++17 the same can be done using a standard function:

1
2
// std::size is available in many headers, including <array>
constexpr int sz = std::size(arr);

And before C++17, the function could be written in C++11-compatible code. This function doesn't use sizeof operator but the fact that templates can deduce array size from its type (template type deduction is a very powerful feature where the compiler can infer a lot of compile-time information):

1
2
3
4
5
template <typename T, std::size_t N>
constexpr std::size_t size(const T (&)[N]) noexcept
{
	return N;
}

Working with arrays

Since arrays are variables that refer to multiple objects you will pretty much always use loops to work with them, especially the for loop. Now you should be able to see how well a for loop is tied to arrays:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#include <iostream>

int main()
{
	constexpr int sz = 10;
	int powers[sz] = {};

	// the most idiomatic and classic loop that handles an array
	// in almost all situations, i should not be modified inside loop body
	for (int i = 0; i < sz; ++i)
	{
		// reminder: << here is bit shift operator
		// shifting 1 will create different powers of 2
		powers[i] = 1 << i;
	}

	for (int i = 0; i < sz; ++i)
		std::cout << powers[i] << ", ";
}

Notice how exactly the loop is written:

  • it starts with 0

  • the condition is <

  • it uses preincrement (postfix would work too but we don't need an extra copy of old i)

A common mistake is writing loop condition with <=. For an array of size n, valid indexes are 0 ... n-1, thus the loop should continue only when i is lower than the size. Rarely you might find !=, which is also valid but such code is usually written when working with iterators which will be covered later.

Whichever of < and != operators is used, after last iteration the loop control variable (i) will be equal to array size and stop the loop. Iteration with i equal to array size will never happen.

Loop control vs array size type

The C++ standard library uses size type (std::size_t) for array sizes. For historical reasons, this type is an alias of some unsigned integer (typically unsigned long or unsigned long long) which as you should remember is not a good choice - unsigned types should only be used when dealing with bit-level operations or when overflow is desired.

This causes a quite common warning:

1
2
3
4
5
6
7
8
9
10
11
12
13
#include <iostream>

int main()
{
	int arr[] = {1, 2, 3};

	// auto deduced to std::size_t
	// std::size(arr) would deduce auto as std::size_t too
	const auto sz = sizeof(arr) / sizeof(arr[0]);

	for (int i = 0; i < sz; ++i)
		std::cout << arr[i];
}
main.cpp: In function ‘int main()’:
main.cpp:11:20: warning: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Wsign-compare]
  for (int i = 0; i < sz; ++i)
                  ~~^~~~

Most beginners will find this warning hard to understand. What's the problem here? Can't the compiler generate code that converts one integer to the type of another and then compare them?

It can. The problem is, the resulting behavior can be very surprising. The prime example is the expression -1 < 1u. Obviously -1 is smaller than 1 but surprisingly the expression evaluates to false. Why?

The cause are convertion rules. Promotion is preferred so if you compare e.g. int with long there will be no problem as the first one will be promoted and the comparison will happen between 2 long integers. But in the case of -1 < 1u, there is no promotion because both integers are of the same size: int (AKA signed) and unsigned. One must be converted to another and in both cases there is a risk that final value will not fit:

  • convert to signed: huge values will not fit

  • convert to unsigned: negative values will not fit

For historical reasons, convertion to unsigned takes place. Because of how integer convertions work, value -1 will be interpreted as the largest possible value representable in unsigned type (modulo 2 arithmetic), causing an operation like 4294967296 < 1. In other words: if you compare signed with unsigned and the signed value is negative, the comparison will evaluate to :cch:`false`. This is a common source of bugs in loops.

The solution is simple: make sure both comparison operands are of the same type. Usually it's as simple as changing the type of i, which is on the same line as the warning. Since C++20 there is also another small help: std::ssize member functions with the same name. These work just like std::size but their return type is a signed version of std::size_t, called std::ptrdiff_t (pointer difference type). Later you will also learn about other typical forms of loops (range-based, iterator-based) which do not have this problem.

What if I can not change the types?

In such case use a static_cast to convert values before comparison. TODO which convert to which? both to signed or both to unsigned?

Looping backwards

Some algorithms need to work on the array in reverse order. A typical loop would look then like this:

1
for (int i = sz - 1; i >= 0; --i)

This is fine, but breaks when i is of unsigned type as for unsigned types condition i >= 0 is always true as they can not represent negative numbers (--i on zero will overflow to the largest possible value).

It's possible to loop backward on an unsigned control variable, but one needs to do a little trick to change the order of operations:

1
2
3
4
// start with sz (which is out of bounds)
// but decrement i after each comparison, not after each iteration
// this loop will also work on signed types
for (std::size_t i = sz; i-- > 0;)

Related: SO: What is the "-->" operator in C++?.

Passing arrays

Do you remember that function argument types strip top-level const (a part of a set of implicit convertions, known as decay)?

1
2
3
4
5
void func(const int value);
void func(int value); // equivalent

void func(const char* const str);
void func(const char* str); // equivalent

This is also true for array types. The array type itself (including size information) is removed and the only thing that is left is a pointer:

1
2
3
4
5
6
7
8
9
void f(int arr[3]); // array argument of size 3
void f(int arr[]);  // array argument of unspecified size

// both above are equivalent to this
void f(int* arr); // pointer argument

// this is something different (VLA argument) and applies only to C
// since C++ (officially) has no VLAs, this is a syntax error
void f(int arr[*]);

The function declaration can use array declaration syntax for informational purposes but it has no semantic difference.

Since the array type is lost, the convention of passing arrays to functions is to pass the pointer and a size (often std::size_t). A benefit of this approach is that a function can work with multiple arrays of different sizes, only the type of objects within the array must match. In C++20 there is also a dedicated type for it - std::span.

Pointers are a complicated topic that will be explained later. For now, it's enough to understand that:

  • arrays decay into pointers (memory addresses)

  • operator [] is actually defined for pointers, not arrays

This means that once within a function, you can work with arrays exactly the same way:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include <iostream>

// in C++17 you can remove this and use std::size
template <typename T, int N>
constexpr int size(const T (&)[N]) noexcept
{
	return N;
}

void print(const int* arr, int size)
{
	for (int i = 0; i < size; ++i)
		std::cout << arr[i] << ", ";

	std::cout << "\n";
}

int main()
{
	int arr1[] = {1, 2, 3};
	int arr2[] = {3, 1, 3, 3, 7};

	print(arr1, size(arr1));
	print(arr2, size(arr2));
}

You might wonder why. After all, it's possible to compute array size with the sizeof operator, right? That's true, but only for array types. Inside the function you don't have an array, only a pointer! In other words, because decay strips some type information, it's not possible to compute the size of the array after it.

1
2
3
4
5
6
7
8
int arr[] = {1, 2, 3, 4, 5, 6};
// ok: sizeof(array) / sizeof(element)
const auto sz1 = sizeof(arr) / sizeof(arr[0]);

// this is what happens when an array is passed to a function
const int* ptr = arr; // some type information is lost
// wrong: sizeof(pointer) / sizeof(element)
const auto sz2 = sizeof(ptr) / sizeof(ptr[0]);

Array limitations

The syntax of arrays in C++ has been inherited from C and various rules regarding array-related operations were too. Sadly, for backward compatibility reasons they have to remain as they were specified in C.

1
2
3
4
5
6
7
8
9
10
11
12
13
int arr1[5] = {1, 2, 3, 4, 5};
int arr2[5] = {1, 2, 3, 4, 5};
int arr3[5] = arr2; // error: arrays can not be assigned to or initialized with other arrays

// workaround
struct sarr { int arr[5]; };
sarr s1 = {1, 2, 3, 4, 5};
sarr s2 = s1; // ok

// this will compile, but behaves differently than expected
// it will compare memory adresses of arrays, not their contents
// may print a warning that comparison is always false
bool b = arr1 == arr2;

Arrays can not be copied, but structures can. Yes, kind of stupid. Soon you will learn about std::array (the proper C++ array) which does not have such limitations.

Exercise

  • Write a function that copies contents of one array to another.

  • Write a function that reverses order of elements in an array.

  • Write a function that compares whether 2 arrays are identical.

  • Call reverse twice and verify that array is identical to the state before reversal.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#include <iostream>

template <typename T, int N>
constexpr int size(const T (&)[N]) noexcept
{
	return N;
}

void copy(const int* input, int* output, int size)
{

}

void reverse(int* arr, int size)
{

}

bool compare(const int* lhs_arr, int lhs_size, const int* rhs_arr, int rhs_size)
{

}

int main()
{
	int arr1[] = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89};
	constexpr int sz = size(arr1);
	int arr2[sz] = {};

	copy(arr1, arr2, sz);
	reverse(arr2, sz);
	reverse(arr2, sz);
	std::cout << std::boolalpha << compare(arr1, sz, arr2, sz);
}

The compare takes 2 sizes intentionally. In practice, if you are copying you must be sure that the output array is at least as large as the input array. But for comparison, you could obtain 2 different arrays from 2 different places. This extra check is not needed for this exercise but all functions showcase canonical way they would be defined.

reversal algorithm hint

Inside the function, you don't need to make array copy or anything similar. Just swap pairs of elements that have identical distance from array ends.

solution
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
void copy(const int* input, int* output, int size)
{
	for (int i = 0; i < size; ++i)
		output[i] = input[i];
}

void reverse(int* arr, int size)
{
	const int n = size / 2;
	for (int i = 0; i < n; ++i)
	{
		int temp = arr[i];
		arr[i] = arr[size - i - 1];
		arr[size - i - 1] = temp;

		// the loop body could also be this
		// std::swap(arr[i], arr[size - i - 1]);
	}
}

bool compare(const int* lhs_arr, int lhs_size, const int* rhs_arr, int rhs_size)
{
	// if sizes are different, there is no point in going further
	if (lhs_size != rhs_size)
		return false;

	// now we can use just 1 variable for counting as both sizes must be the same
	for (int i = 0; i < lhs_size; ++i)
		if (lhs_arr[i] != rhs_arr[i])
			return false;

	return true;
}