Deep Dive into Inside the C++ Object Model: C++ Virtual Function Implementation Analysis (Part 3)



“In-depth Interpretation of Inside the C++ Object Model” series has been fully updated on CSDN and my official WeChat account. Students who need it can go to my CSDN homepage to read it. Homepage address: https://blog.csdn.net/iShare_Carlos?spm=1010.2135.3001.5421 Or you can follow my official WeChat account: iShare爱分享

Read the first two articles here: In-depth Interpretation of Inside the C++ Object Model: Analysis of C++ Virtual Function Implementation (Part 1) In-depth Interpretation of Inside the C++ Object Model: Analysis of C++ Virtual Function Implementation (Part 2)

Analysis of Virtual Function and Polymorphism Implementation under Virtual Inheritance

If virtual inheritance is combined with multiple inheritance, or there are more than two layers of virtual inheritance, the compiler's support for virtual functions becomes a maze of intricate relationships. Such designs are rarely used in real applications, and the approach is not recommended. We will therefore use a common example with only one layer of virtual inheritance to explain how the compiler supports virtual functions, as shown below:

#include <cstdio>

class Base {
public:
    virtual ~Base() = default;
    virtual void virtual_func1() { printf("%s\n", __PRETTY_FUNCTION__); }  // __PRETTY_FUNCTION__ is a GCC/Clang extension
    virtual void virtual_func2() { printf("%s\n", __PRETTY_FUNCTION__); }
    int b = 0;
};
class Derived: virtual public Base {
public:
    virtual ~Derived() = default;
    void virtual_func2() override { printf("%s\n", __PRETTY_FUNCTION__); }  // overrides Base::virtual_func2
    virtual void virtual_func3() { printf("%s\n", __PRETTY_FUNCTION__); }   // new in Derived
    int d = 0;
};

int main() {
    Derived* pd = new Derived;
    pd->virtual_func1();  // inherited from Base, not overridden
    pd->virtual_func2();
    pd->virtual_func3();
    Base* pb = pd;        // converts to the virtual base subobject
    pb->virtual_func1();
    pb->virtual_func2();  // reaches Derived::virtual_func2
    delete pd;
    return 0;
}

Although the inheritance relationship in the above code is just single inheritance, it is virtual inheritance, so it does not behave like ordinary single inheritance, where the base class subobject sits at the start address of the complete object and shares the virtual function table with it.

Due to the nature of virtual inheritance, the virtual base class subobject is shared. In most compiler implementations it is placed at the very end of the object layout, that is, after all non-virtually inherited subobjects and the members of the derived class itself. It also does not share a virtual function table with any other subobject; it has its own independent virtual function table.

Therefore, the compiler generates two virtual function tables for the above code: one for the Derived class and one for the Base virtual base class. The compiler simply merges the two tables into one, and the virtual function table pointers of the two subobjects (Derived and Base) are set to point to different offsets within it. Let's look at the virtual function table in the corresponding assembly code of the above example:

vtable for Derived:
    .quad   16
    .quad   0
    .quad   typeinfo for Derived
    .quad   Derived::~Derived() [complete object destructor]
    .quad   Derived::~Derived() [deleting destructor]
    .quad   Derived::virtual_func2()
    .quad   Derived::virtual_func3()
    .quad   -16
    .quad   0
    .quad   -16
    .quad   -16
    .quad   typeinfo for Derived
    .quad   virtual thunk to Derived::~Derived() [complete object destructor]
    .quad   virtual thunk to Derived::~Derived() [deleting destructor]
    .quad   Base::virtual_func1()
    .quad   virtual thunk to Derived::virtual_func2()

The virtual function table pointer of the Derived object is set to point to line 5 of the table above, and the virtual function table pointer of the Base virtual base class subobject is set to point to line 14. All of this work is done by code the compiler generates in the default constructor (not the destructor). For a detailed analysis, you can refer to another article, "Behavior Behind the Compiler: Default Constructors".

Because of virtual inheritance, in addition to the virtual function entries that support polymorphism and the RTTI information, the table also contains information to support virtual inheritance, mainly positive and negative offsets used to adjust the this pointer when needed. For example, the 16 in line 2 is the offset from the start address of the Derived object to the start address of the Base virtual base class subobject, and the -16 values in lines 9, 11 and 12 are used to adjust from the Base subobject back to the start address of the Derived object. Line 10 holds 0: it is the slot for virtual_func1, which Derived does not override, so no adjustment is needed for it.

The upper part is the main table, and the lower part is the secondary table. The main table contains the virtual functions defined by the Derived class: the virtual destructor and the two virtual functions virtual_func2 and virtual_func3. The secondary table contains the virtual functions inherited from the Base virtual base class: the virtual destructor and the two virtual functions virtual_func1 and virtual_func2. Since the virtual destructor and virtual_func2 are overridden in the Derived class, their entries in the secondary table do not hold the address of the actual function. Each holds the address of a thunk, a small piece of compiler-generated code that adjusts the this pointer and then jumps to the real function.
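The layout described above can be checked from C++ itself. The following is only a sketch under strong assumptions: it assumes a compiler that follows the Itanium C++ ABI (GCC or Clang on a 64-bit target), and it reads compiler-internal data that the C++ standard does not define. The helper names load_vptr, load_offset_to_top, load_typeinfo and base_offset_in are made up for illustration, and the classes are trimmed copies of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <typeinfo>

class Base {
public:
    virtual ~Base() = default;
    virtual void virtual_func1() {}
    virtual void virtual_func2() {}
    int b = 0;
};

class Derived : virtual public Base {
public:
    ~Derived() override = default;
    void virtual_func2() override {}
    virtual void virtual_func3() {}
    int d = 0;
};

static_assert(sizeof(std::ptrdiff_t) == sizeof(void*), "sketch assumes pointer-sized table slots");

// Hypothetical helper: the first word of a polymorphic (sub)object is its vptr (ABI assumption).
const char* load_vptr(const void* subobject) {
    assert(subobject != nullptr);
    const char* vptr;
    std::memcpy(&vptr, subobject, sizeof vptr);
    return vptr;
}

// Hypothetical helper: the slot two entries before the address point is the offset-to-top.
std::ptrdiff_t load_offset_to_top(const char* vptr) {
    std::ptrdiff_t value;
    std::memcpy(&value, vptr - 2 * sizeof(void*), sizeof value);
    return value;
}

// Hypothetical helper: the slot one entry before the address point is the RTTI pointer.
const std::type_info* load_typeinfo(const char* vptr) {
    const std::type_info* info;
    std::memcpy(&info, vptr - sizeof(void*), sizeof info);
    return info;
}

// Byte distance from the start of the complete object to its Base subobject.
std::ptrdiff_t base_offset_in(const Derived& obj) {
    const Base* pb = &obj;  // the implicit conversion applies the this adjustment
    return reinterpret_cast<const char*>(pb) - reinterpret_cast<const char*>(&obj);
}
```

Under these assumptions, the two subobjects hold different vptr values, both RTTI slots refer to typeinfo for Derived, and the offset-to-top stored for the Base subobject is the negative of the Base subobject offset, matching the 16 and -16 seen in the table.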

The difficulties of supporting virtual functions under virtual inheritance mainly lie in two aspects: one is calling a virtual function from the Base virtual base class through a pointer of Derived type; the other is calling a virtual function from the Derived class through a pointer of Base virtual base class type. Their call handling is very similar to how multiple inheritance handles the second and subsequent base classes. We will explain these two points separately below.

  • Calling a virtual function in the Base virtual base class through a pointer of Derived type

In the three calls on lines 20 to 22 of the C++ code above, calls to virtual_func2 and virtual_func3 use the conventional calling method, because these two virtual functions exist in the virtual function table of the Derived class. For the call to virtual_func1, since virtual_func1 is inherited from the Base virtual base class and not overridden in the Derived class, it only exists in the virtual function table of the Base virtual base class. Before calling it, the this pointer needs to be adjusted first to make this point to the start address of the Base subobject, then we address the virtual function table via the virtual function table pointer of the Base subobject, and call the corresponding virtual function. Here is the corresponding assembly code:

mov     rax, qword ptr [rbp - 16]
mov     rcx, qword ptr [rax]
mov     rcx, qword ptr [rcx - 24]
mov     rdi, rax
add     rdi, rcx
mov     rax, qword ptr [rax + rcx]
call    qword ptr [rax + 16]

The stack slot at [rbp - 16] stores the start address of the Derived object, and dereferencing it gives the virtual function table pointer (if you are not familiar with this, please refer to Memory Layout After C++ Object Encapsulation). This pointer points into the virtual function table of the Derived class, at line 5 in the table above. [rcx - 24] means going back 24 bytes (three 8-byte entries) from there and dereferencing, which lands on the first entry of the table (line 2), and the value obtained is 16. This value is the offset used to adjust the this pointer to support virtual inheritance, as introduced earlier.

Line 4 of the assembly code copies the start address of the Derived object into rdi, and line 5 adds the offset to it, so rdi now holds the address of the Base subobject. rdi is passed as the this pointer when the virtual function is called in line 7. [rax + rcx] in line 6 dereferences the address obtained by adding the offset of 16 to the start address of the Derived object, which gives the virtual function table pointer of the Base subobject (pointing to line 14 in the table above). Finally, the call in line 7 adds an offset of 16 to that pointer, which selects the entry for virtual_func1, line 16 in the table above.
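The this adjustment performed by that assembly sequence can be imitated in C++. This is a sketch only, assuming the Itanium C++ ABI on a 64-bit target and the exact table layout shown above (virtual base offset stored 24 bytes before the position the Derived vptr points to). The function name base_via_vbase_offset is hypothetical, and real code should simply rely on the implicit Derived* to Base* conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class Base {
public:
    virtual ~Base() = default;
    virtual void virtual_func1() {}
    virtual void virtual_func2() {}
    int b = 0;
};

class Derived : virtual public Base {
public:
    ~Derived() override = default;
    void virtual_func2() override {}
    virtual void virtual_func3() {}
    int d = 0;
};

static_assert(sizeof(void*) == 8, "sketch assumes 8-byte table slots, as in the listing");

// Hypothetical helper that mirrors the assembly step by step.
const Base* base_via_vbase_offset(const Derived* pd) {
    assert(pd != nullptr);

    // mov rcx, qword ptr [rax]       ; load the vptr of the Derived object
    const char* vptr;
    std::memcpy(&vptr, pd, sizeof vptr);

    // mov rcx, qword ptr [rcx - 24]  ; load the virtual base offset
    std::ptrdiff_t vbase_offset;
    std::memcpy(&vbase_offset, vptr - 24, sizeof vbase_offset);

    // add rdi, rcx                   ; this now points at the Base subobject
    const char* adjusted = reinterpret_cast<const char*>(pd) + vbase_offset;
    return reinterpret_cast<const Base*>(adjusted);
}
```

If the assumptions hold, the pointer computed this way is the same one the compiler produces for the ordinary conversion from Derived* to Base*.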

  • Calling a virtual function of the Derived class through a pointer of Base virtual base class type

Calling the virtual destructor of the Derived class and the virtual_func2 virtual function through a pointer of Base virtual base class type uses the same implementation method, namely thunk technology. So we will explain them together. Let's first look at their assembly code:

virtual thunk to Derived::~Derived() [deleting destructor]:# @virtual thunk to Derived::~Derived() [deleting destructor]
    push    rbp
    mov     rbp, rsp
    mov     qword ptr [rbp - 8], rdi
    mov     rdi, qword ptr [rbp - 8]
    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax - 24]
    add     rdi, rax
    pop     rbp
    jmp     Derived::~Derived() [deleting destructor] # TAILCALL
# The code for the other virtual destructor is almost identical, so it is omitted here

virtual thunk to Derived::virtual_func2():# @virtual thunk to Derived::virtual_func2()
    push    rbp
    mov     rbp, rsp
    mov     qword ptr [rbp - 8], rdi
    mov     rdi, qword ptr [rbp - 8]
    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax - 40]
    add     rdi, rax
    pop     rbp
    jmp     Derived::virtual_func2() # TAILCALL

The scenario for calling the virtual destructor of the Derived class through a Base pointer is this: a Base pointer points to a Derived object, and delete is then applied to that pointer to release the object. The destructor entry in the virtual function table of the Base subobject is what gets called, and that entry is a thunk. The virtual_func2 virtual function is defined in the Derived class and overrides virtual_func2 from the Base virtual base class, so it appears in both virtual function tables, but there is only one actual function instance. The entry in the Base virtual base class's virtual function table holds the address of a thunk rather than of the function itself.
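The observable effect of these entries can be shown without looking at any assembly. The sketch below uses trimmed copies of the example classes that append to a global string instead of printing; the names g_log and trace_calls_through_base are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical global trace, used instead of printf so the result can be checked.
std::string g_log;

class Base {
public:
    virtual ~Base() { g_log += "~Base;"; }
    virtual void virtual_func1() { g_log += "Base::virtual_func1;"; }
    virtual void virtual_func2() { g_log += "Base::virtual_func2;"; }
    int b = 0;
};

class Derived : virtual public Base {
public:
    ~Derived() override { g_log += "~Derived;"; }
    void virtual_func2() override { g_log += "Derived::virtual_func2;"; }
    virtual void virtual_func3() { g_log += "Derived::virtual_func3;"; }
    int d = 0;
};

// Calls everything through a pointer to the virtual base and returns the trace.
std::string trace_calls_through_base() {
    g_log.clear();
    Base* pb = new Derived;
    pb->virtual_func1();  // not overridden: runs Base's version directly
    pb->virtual_func2();  // overridden: dispatch reaches Derived's version
    delete pb;            // virtual destructor: ~Derived runs first, then ~Base
    return g_log;
}
```

The call to virtual_func2 and the delete both end up in Derived's code even though the pointer type is Base*, which is exactly the job of the thunk entries in the secondary table.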

Both of the above functions are thunks generated by the compiler, and their code is basically the same, except for the offset they look up and the function the last line jumps to. First, the this pointer (passed in the rdi register, which at this point holds the address of the Base subobject) is saved to the stack slot at [rbp - 8] and reloaded into rdi. Dereferencing it loads the virtual function table pointer of the Base subobject into rax, which points to line 14 in the table above. The thunk then goes back 24 bytes (or 40 bytes for virtual_func2) from that position and dereferences; the value obtained in both cases is -16. This value is added to rdi, which holds the address of the Base subobject. After moving 16 bytes toward lower addresses, rdi holds the start address of the Derived object, and the thunk jumps to the corresponding function.
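The adjustment a thunk performs can also be imitated in C++. Again this is only a sketch: it assumes the Itanium C++ ABI on a 64-bit target, the exact slot positions from the listing (24 bytes back for the destructor, 40 bytes back for virtual_func2), and a Base pointer that really refers to the Base subobject of a complete Derived object. The function name derived_via_vcall_offset is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class Base {
public:
    virtual ~Base() = default;
    virtual void virtual_func1() {}
    virtual void virtual_func2() {}
    int b = 0;
};

class Derived : virtual public Base {
public:
    ~Derived() override = default;
    void virtual_func2() override {}
    virtual void virtual_func3() {}
    int d = 0;
};

static_assert(sizeof(void*) == 8, "sketch assumes 8-byte table slots, as in the listing");

// Hypothetical helper that mirrors the thunk: bytes_back is 24 or 40 in the listing.
const Derived* derived_via_vcall_offset(const Base* pb, std::ptrdiff_t bytes_back) {
    assert(pb != nullptr);

    // mov rax, qword ptr [rdi]        ; load the vptr of the Base subobject
    const char* vptr;
    std::memcpy(&vptr, pb, sizeof vptr);

    // mov rax, qword ptr [rax - N]    ; load the stored adjustment (-16 in the listing)
    std::ptrdiff_t adjustment;
    std::memcpy(&adjustment, vptr - bytes_back, sizeof adjustment);

    // add rdi, rax                    ; this now points at the start of the Derived object
    const char* adjusted = reinterpret_cast<const char*>(pb) + adjustment;
    return reinterpret_cast<const Derived*>(adjusted);
}
```

Reading the adjustment from the table instead of hard-coding it keeps the thunk correct when Derived is itself a base of a larger class, where the distance between the Base subobject and the Derived subobject may differ.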



This is a discussion topic separated from the original topic at https://juejin.cn/post/7369114828076089380