09 - virtual inheritance

A class can have multiple parent classes. Sometimes this can lead to inheriting from the same parent type multiple times.

The diamond problem

The diamond problem occurs when the same parent class appears multiple times in the hierarchy.

class animal { /* functionality common for all animals... */ };

class mammal: public animal { /* functionality common for all mammals... */ };
class flying: public animal { /* functionality common for all flying creatures... */ };

// bat indirectly inherits animal twice
class bat: public mammal, public flying { /* ... */ };

The problem is named diamond because the hierarchy graph looks like a diamond:

    animal
    /    \
   /      \
mammal   flying
   \      /
    \    /
     bat

Logically (considering graph theory behind inheritance) there is no problem for such relation but for programming, the essence lies in problems that arise when deciding how such code should work.

Problem 1: multiple overrides at the same level

If A has a virtual function that is overriden in B and C, and D does not override it - which implementation should be used? So far "down-most overrider" won, but now there are 2 overriders at the same level in the hierarchy.

Problem 2: duplication of base type state

If B contains state (data members) inherited from A and C also contains state inherited from A, should the D class contain 2 copies of A state?

Having 2 copies of A state seems a natural consequence of the fact that inheritance accumulates (sums up) the state from all classes involved. But in practice, we don't want Ds to have duplicates of their base-level information (bat with 2 animal states makes no sense).

If there should be only one copy of A state within D, how and where it should be stored? So far inheritance simply extended object representation in memory by appending any additional data. But if data of A needs to appear only once, object representation of D can not be made of B and C sticked together.

Problem 1 is solved in many languages by a simple requirement that if multiple overriders are present at the same level, the current class must override the function.

Problem 2 is "solved" in many languages by simply forbidding multiple implementation inheritance. In such case, besides vptr, all data members come from 1 line of inheritance which means no problems with object representation in memory - any derived class just appends its data.

Now, in C++; C++ obviously being C++, would not go for a simple solution like forbidding a feature just because it can be misused or because its implementation is complex. Many C++ features already can be misused/abused in a variety of ways (it's nothing new) and the language already deals with complex memory-level stuff like padding, size and alignment. So it's not strictly about complexity - it's more about pragmatism. Would the feature even have any practical value?

The answer is: yes, it would. And interestingly, there are actually 2 solutions.

Multiple inheritance

By default, inheritance uses a simple "accumulation mechanism" which means that each derived class is simply a sum of all of its members and members of its parent classes.

Surprisingly, there are no overriding requirements on D:

If D overrides a function from A, the situation is clear as D is clearly the down-most overrider.
If D does not override the function and leaves 2 overrides from B and C on equal hierarchy level:
- The function can not be called on the object through a reference/pointer to D (ambiguous call).
- The function can be called on the object through a reference/pointer to a different type in the hierarchy.

Weird, isn't it? The function can not be called in the context of D but can be when some parent type is used?

Aren't these rules just inconsistent? Why bases but not derived?
Isn't the restriction easily bypassed by casting a pointer/reference upwards?
What actually happens when the function is called through some base type? There are still 2 overriders at the same level.

Seems like nonsense, right? Here is the thing: there will be 2 subobjects of type A within D. This means that the graph which looks this way:

  A
 / \
B   C
 \ /
  D

actually works like this:

A   A
|   |
B   C
 \ /
  D

Multiple, different references/pointers to A are possible. Yes, you can obtain 2 different A objects from an object of type D.

There are 2 different "inheritance lines". This explains why it's possible to call the function through some base type reference - it will either be:

a reference to A that is a subobject of B
a reference to A that is a subobject of C

Because there are 2 objects of type A within D, it's formally named as ambiguous base class. Casts to ambiguous types (have different semantics / work differently / ???).

#include <iostream>

struct A    { virtual void f()  { std::cout << "A::f\n"; } };
struct B: A { void f() override { std::cout << "B::f\n"; } };
struct C: A { void f() override { std::cout << "C::f\n"; } };
struct D: B, C {};

int main()
{
	D d;

	// error: D::f is ambiguous
	// d.f();

	// error: A is an ambiguous base of D
	// (there are 2 A objects within D, this cast is unclear)
	// A& ra = d;

	// solution: add an intermediate cast
	// this informs the compiler which graph edge to follow
	// cast from D to B/C is not ambiguous
	// cast from B/C to A is not ambiguous
	A& rab = static_cast<B&>(d);
	A& rac = static_cast<C&>(d);

	// 2 distinct A objects, each with different overrider
	rab.f();
	rac.f();
}

B::f
C::f

Analogical rules follow every other member (not just functions): if there is an ambiguity which one should be used (there are 2 subobjects of type A within D), references (or pointers) must be cast upwards through a parent type which is not ambiguous.

Virtual inheritance

Adding virtual keyword when inheriting causes the compiler to deduplicate any (ambiguous) bases that have virtual specified. The final object representation contains exactly 1 instance of virtual base.

Classes which have conflicting same-level final overriders from virtual bases are required to override them again. Casts to virtual bases classes are not ambiguous since there is only 1 object of the virtualized type.

#include <iostream>

struct A            { virtual void f()  { std::cout << "A::f\n"; } };
struct B: virtual A { void f() override { std::cout << "B::f\n"; } };
struct C: virtual A { void f() override { std::cout << "C::f\n"; } };
struct D: B, C
{
	// this is required, otherwise the program is ill-formed
	void f() override { std::cout << "D::f\n"; }
};

int main()
{
	D d;

	d.f(); // ok

	// ok, there is only 1 A within D
	// the cast is not ambiguous
	A& ra = d;
	ra.f();
}

Virtual inheritance fixes the problem of duplicate information (which usually is undesired) but it comes with its costs.

Because the base is virtual, it can not be a subobject of both B and C. There can be only 1 A within D. Since 2 classes must share the same subobject, they need to know where it is inside the final object (D or something derived from it). For this reason, each inheritance hierarchy that inherits virtually adds another vptr to the final class.

To illustrate, here are the layouts of the types based on the information obtained with clang++ -cc1 -emit-llvm -fdump-record-layouts (the output contained much more information but this is not the place of the tutorial to discuss padding and alignment, only order of the data):

A:
- vtable pointer for A
- non-static data members of A

B:
- vtable pointer for B
- non-static data members of B
- A:
  - vtable pointer for A
  - non-static data members of A

C:
- vtable pointer for C
- non-static data members of C
- A:
  - vtable pointer for A
  - non-static data members of A

D:
- B:
  - vtable pointer for B
  - non-static data members of B
- C:
  - vtable pointer for C
  - non-static data members of C
- non-static data members of D
- A:
  - vtable pointer for A
  - non-static data members of A

No vtable pointer for D?

I think vtable pointer for B will actually function for both B and D. After all, D overrides B's functions and so far every class had its vtable pointer at the start. Multiple vptrs are present only to support casting so that pointers/references to base types can also expect vptr at the beginning of what memory they refer.

The shared state of A is placed at the end of any type that inherits virtually from it. The offsets can be different for different dynamic types:

In B, the state of A immediately follows the state of B.
In D, there is the state of C and D before the state of A.

Layouts are no longer subsets or supersets, each class can have its own specific order of vtable pointers and state of other classes (though some pattern can be observed). This also means that casts require additional overhead of checking the offsets (they are stored in vtables). For example, converting a pointer/reference from B to A requires different adjustment when the actual (dynamic type) of the object is B and when it is D. The distance between data of A and data of B is not the same for all types.

dynamic_cast will fail (null pointer/exception) when the destination type is ambiguous (checked at runtime)
static_cast downcast: cast is ill-formed if input type is ambiguous or is a virtual base of or base of virtual base of destination type (there is no way to perform the cast only with compile time information)

Apart from complex object layout and casts with runtime overhead there is one more thing: constructors. How should the constructor of A be called when B and C specify different initialization for A?

struct A
{
	A(char a = 'a'): a(a) {}
	virtual void f() { std::cout << "A::f\n"; }

	char a;
};

struct B: virtual A
{
	B(): A('b') {}
	void f() override { std::cout << "B::f\n"; }
};

struct C: virtual A
{
	C(): A('c') {}
	void f() override { std::cout << "C::f\n"; }
};

struct D: B, C
{
	// How is A initialized in D::D???
};

There are 2 different initializations specified for A but there is only 1 A subobject within objects of type D.

The solution is that initialization (constructor calls) of virtual bases behave similarly to virtual functions and have to be overriden:

struct D: B, C
{
	D(): A('d'), B(), C() {}
};

Note that normally this is not possible: a class, in its member init list, can only call constructors of its direct parent classes. In the case of virtual inheritance, calling constructors of indirect virtual parents is not only allowed but actually required.

Extra example - extended diamond

In this hierarchy B inherits virtually from A.

  A
 / \
B   B
|   |
C   D
 \ /
  E

The most interesting thing is that E contains 2 B subobjects that share 1 A subobject. Casts from A to B are ambiguous and their ambiguity can not be detected at compile time (dynamic_cast will fail, static_cast will be ill-formed as it does not allow converting from virtual bases). Pointers/references to A would have to be converted to refer to C or D first before being converted to B.

This is the layout:

A:
- vtable pointer for A
- non-static data members of A

B:
- vtable pointer for B
- non-static data members of B
- A:
  - vtable pointer for A
  - non-static data members of A

C:
- B:
  - vtable pointer for B
  - non-static data members of B
- non-static data members of C
- A:
  - vtable pointer for A
  - non-static data members of A

D:
- B:
  - vtable pointer for B
  - non-static data members of B
- non-static data members of D
- A:
  - vtable pointer for A
  - non-static data members of A

E:
- C:
  - B:
    - vtable pointer for B
    - non-static data members of B
  - non-static data members of C
- D:
  - B:
    - vtable pointer for B
    - non-static data members of B
  - non-static data members of D
- non-static data members of E
- A:
  - vtable pointer for A
  - non-static data members of A

Extra example - mixed inheritance

A is virtually inherited by B and C
A is inherited by D
C is virtually inherited by E and F

If the same base is inherited virtually and non-virtually, the final object contains 1 base of this type (from virtual inheritance) + 1 non-virtual base for each non-virtual inheritance. That is, only virtual bases are deduplicated.

  A     A
 / \    |
B   C   D
 \ / \ /
  E   F
   \ /
    G

A:
- vtable pointer for A
- non-static data members of A

B:
- vtable pointer for B
- non-static data members of B
- A:
  - vtable pointer for A
  - non-static data members of A

C:
- vtable pointer for C
- non-static data members of C
- A:
  - vtable pointer for A
  - non-static data members of A

D:
- A:
  - vtable pointer for A
  - non-static data members of A
- non-static data members of D

E:
- B:
  - vtable pointer for B
  - non-static data members of B
- non-static data members of E
- A:
  - vtable pointer for A
  - non-static data members of A
- C:
  - vtable pointer for C
  - non-static data members of C

F:
- D:
  - A:
    - vtable pointer for A
    - non-static data members of A
  - non-static data members of D
- non-static data members of F
- A:
  - vtable pointer for A
  - non-static data members of A
- C:
  - vtable pointer for C
  - non-static data members of C

G:
- E:
  - B:
    - vtable pointer for B
    - non-static data members of B
  - non-static data members of E
- F:
  - D:
    - A:
      - vtable pointer for A
      - non-static data members of A
    - non-static data members of D
  - non-static data members of F
- non-static data members of G
- A:
  - vtable pointer for A
  - non-static data members of A
- C:
  - vtable pointer for C
  - non-static data members of C

Additional resources: