Compilation

Our source files pass through the following tools, in order:

Preprocessor
Compiler
Assembler
Linker

Let’s explain the process based on a simple solution:

main.cpp:

#include "add.hpp"

int main() {
    add(2,3);
    return 0;
}

add.cpp:

int add(int a, int b) {
    return a + b;
}

add.hpp:

int add(int a, int b);

We can compile the app with:

g++ main.cpp add.cpp

# or to change the binary name
g++ main.cpp add.cpp -o program

The first command produces an executable file named a.out; the second one names it program.

Header Files

First, it helps to understand the purpose of header files. They contain declarations of various entities (such as functions or global variables). The actual implementations (definitions) go into the .cpp/.c files, although header files may also contain definitions (inline functions and templates, for example); that is legal. An implementation might change over time, which requires recompiling its source file. Header files are less likely to change, since they typically contain only the signatures of functions. Other files do not rely on the .cpp/.c files directly. Instead, they rely on the header files.

Header files are then like interfaces that are expected to not change. We are supposed to rely on them instead of on the actual implementations.

In our programs, we cannot refer to symbols that have not been declared. A symbol can be declared in the current file, or in another file that is included into the current file. Additionally, a given entity can be defined only once in a program (with some exceptions, such as inline functions). The C++ Standard calls this the One Definition Rule.
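
To make the difference between a declaration and a definition concrete, here is a minimal single-file sketch. The helper sum_twice is a hypothetical name used only for illustration; in the example project, the declaration would live in add.hpp and the definition in add.cpp.

```cpp
// Declaration: tells the compiler the name, parameter types, and return type.
// In the example project this line lives in add.hpp.
int add(int a, int b);

// Repeating a matching declaration is allowed.
int add(int a, int b);

// Code can call add() knowing only its declaration.
// sum_twice is a hypothetical helper, not part of the original project.
int sum_twice(int a, int b) {
    return add(a, b) + add(a, b);
}

// Definition: the function body. A non-inline function like this one
// must be defined exactly once in the whole program; a second definition
// of add(int, int) would violate the One Definition Rule.
int add(int a, int b) {
    return a + b;
}
```

Calling sum_twice(2, 3) from main returns 10, even though the body of add appears only after the call site.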

Preprocessor

The first step of compiling our program is preprocessing. The preprocessor handles all the lines that start with # (e.g., #include, #if, or #define) and performs macro substitution.

To get the output of the preprocessor, we can execute:

g++ -E main.cpp

Here’s what we get on stdout:

# 1 "main.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "main.cpp"
# 1 "add.hpp" 1
int add(int a, int b);
# 2 "main.cpp" 2

int main() {
 add(2,3);
 return 0;
}

If we included some common library like iostream, our file would become huge after the preprocessing stage.

In this case, the preprocessor simply pasted the content of add.hpp directly into main.cpp.

Preprocessed C++ files conventionally have the .ii extension (.i is used for C). The preprocessed source is what the C++ Standard calls a translation unit. These files are not written to disk by default (g++ keeps them when given -save-temps), and we rarely need them.
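
Besides #include, the preprocessor also expands macros and selects code with conditionals. The sketch below shows both; all names in it (BUFFER_SIZE, SQUARE, ENABLE_LOGGING, buffer_area) are made up for this illustration. Running g++ -E on such a file would show the code after substitution.

```cpp
// Object-like macro: every later occurrence of BUFFER_SIZE is replaced by 64.
#define BUFFER_SIZE 64

// Function-like macro. The parentheses keep the expansion correct
// when the argument is itself an expression such as 1 + 2.
#define SQUARE(x) ((x) * (x))

// Conditional compilation: only one branch reaches the compiler.
// ENABLE_LOGGING is a hypothetical macro, e.g. set with g++ -DENABLE_LOGGING.
#ifdef ENABLE_LOGGING
const bool logging_enabled = true;
#else
const bool logging_enabled = false;
#endif

// After preprocessing, the body reads: return ((64) * (64));
int buffer_area() {
    return SQUARE(BUFFER_SIZE);
}
```

The compiler never sees the names BUFFER_SIZE or SQUARE; by the time it runs, they have already been replaced by plain text.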

Compiler

The result of preprocessing is handed over to the compiler. The compiler analyzes the text of the code and builds a tree representation of it (such as an AST, an Abstract Syntax Tree).
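
To give a feel for what such a tree looks like, here is a toy sketch of an AST for expressions such as 2 + 3. It is only an illustration: the types and function names are invented here, and GCC's internal representation is far richer.

```cpp
#include <memory>
#include <utility>

// A node is either a number literal or an addition of two sub-expressions.
struct Expr {
    enum class Kind { Number, Add };
    Kind kind = Kind::Number;
    int value = 0;                   // used when kind == Number
    std::unique_ptr<Expr> lhs, rhs;  // used when kind == Add
};

// Builds a leaf node holding a literal value.
std::unique_ptr<Expr> number(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

// Builds an inner node representing "l + r".
std::unique_ptr<Expr> add_expr(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Add;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Walks the tree recursively, the way later compiler stages traverse it.
int evaluate(const Expr& e) {
    if (e.kind == Expr::Kind::Number) {
        return e.value;
    }
    return evaluate(*e.lhs) + evaluate(*e.rhs);
}
```

For example, evaluate(*add_expr(number(2), number(3))) walks a three-node tree and returns 5. A real compiler does not evaluate the tree like this; it checks it and translates it into lower-level code.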

We can dump GCC's internal tree representations (which are derived from the AST) with:

g++ -fdump-tree-all-graph -g main.cpp add.cpp

The resulting .dot files can be viewed with Graphviz.

The next step of compilation is the generation of assembly code. Here’s how we can generate it:

g++ -S add.cpp

Here’s the resulting add.s file:

  .arch armv8-a
  .file  "add.cpp"
  .text
  .align  2
  .global  _Z3addii
  .type  _Z3addii, %function
_Z3addii:
.LFB0:
  .cfi_startproc
  sub  sp, sp, #16
  .cfi_def_cfa_offset 16
  str  w0, [sp, 12]
  str  w1, [sp, 8]
  ldr  w1, [sp, 12]
  ldr  w0, [sp, 8]
  add  w0, w1, w0
  add  sp, sp, 16
  .cfi_def_cfa_offset 0
  ret
  .cfi_endproc
.LFE0:
  .size  _Z3addii, .-_Z3addii
  .ident  "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
  .section  .note.GNU-stack,"",@progbits

Assembler

The assembler translates the assembly code into machine code, which is stored in object files. We can generate object files from our source code with:

g++ -c main.cpp
g++ -c add.cpp

This produces the main.o and add.o files. They are blobs of machine code that cannot be run on their own. They need to be joined together properly (by the linker) to form the final executable.

We can explore what’s inside of the .o files with the objdump command:

objdump -t add.o

The result is:

add.o:     file format elf64-littleaarch64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 add.cpp
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss  0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack  0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment  0000000000000000 .comment
0000000000000000 g     F .text  0000000000000020 _Z3addii

The last line lists our function add under its mangled name, _Z3addii.
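
The symbol _Z3addii is the name the compiler gives to add(int, int) so that overloads can be told apart. As a sketch, assuming GCC or Clang (which provide abi::__cxa_demangle in the cxxabi.h header), the readable name can be recovered like this. The wrapper function demangle is a name chosen for this example.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (available with GCC and Clang)
#include <cstdlib>   // std::free
#include <string>

// Returns the human-readable form of a mangled symbol name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // the buffer was allocated with malloc
    return result;
}
```

With this helper, demangle("_Z3addii") yields add(int, int). On the command line, the c++filt tool does the same job.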

In general, the .o files contain:

data - the actual machine instructions, parts of our program. It includes references to entities defined in other translation units; these are placeholders that the linker fills in.
metadata - information the linker needs to combine the object files into an actual executable. An example is the symbol table, which maps the names of symbols to their addresses.
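
In the objdump output above, the l and g flags mark local and global symbols. The sketch below shows how that distinction typically arises in source code. The function names are hypothetical, and whether the local symbol appears at all depends on the compiler and optimization level (it may be inlined away).

```cpp
// Internal linkage: visible only inside this translation unit.
// If the compiler emits it, objdump -t lists it with the "l" (local) flag.
static int clamp_to_zero(int x) {
    return x < 0 ? 0 : x;
}

// External linkage: other translation units can call this function
// after declaring it. objdump -t lists it with the "g" (global) flag,
// just like _Z3addii in the listing above.
int add_non_negative(int a, int b) {
    return clamp_to_zero(a) + clamp_to_zero(b);
}
```

Only global symbols are candidates for the linker when it fills in placeholders left by other object files.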

Linker

Here’s how to link the object files into an executable:

g++ main.o add.o -o program # results in an executable named program

The linker “glues” together the .o files. It can also link dynamic libraries (.so files) that come from outside our solution (such as the standard libraries). An example is the C++ standard library, which implements what the iostream header declares.

#include "add.hpp"
#include <iostream>

int main() {
    std::cout << add(2,3) << std::endl;
    return 0;
}

In this program, I’m including the iostream header to make use of the std::cout stream object.

After compilation, we can have a look at the dynamic libraries being linked to our program:

ldd a.out

The result is:

linux-vdso.so.1 (0x0000ffff8ff98000)
libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffff8fd71000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff8fbfe000)
libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffff8fb54000)
/lib/ld-linux-aarch64.so.1 (0x0000ffff8ff68000)
libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffff8fb30000)

When compiling programs, we can explicitly specify the libraries that we want to link with the -l flag of g++ (for example, -lm links libm).
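
As a small illustration of code that depends on an external library, the sketch below calls std::sqrt, whose implementation on Linux typically lives in libm. g++ adds -lm to the link step automatically, so nothing extra is needed here; a C program built with gcc would usually have to spell out -lm. The function name hypotenuse and the file name in the comment are assumptions made for this example.

```cpp
#include <cmath>

// Assumed build command, with a hypothetical file name:
//   g++ hypot_demo.cpp main.cpp
// This behaves like g++ hypot_demo.cpp main.cpp -lm,
// because g++ links libm by default.
double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}
```

For libraries that g++ does not link by default, the same pattern applies, except that the matching -l flag has to be passed explicitly.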
