Compilation
Our source files pass through the following tools, in this order:
- Preprocessor
- Compiler
- Assembler
- Linker
Let’s explain the process using a simple solution made of three files:
main.cpp:
add.cpp:
add.hpp:
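The contents of these files were not preserved. Here is a minimal sketch of what they might contain, assuming `add` takes and returns `int` (an assumption). It is collapsed into one block, with comments marking which part belongs to which file. `main.cpp` appears only as a comment, since all it needs is the header and a call to `add`.

```cpp
// ---- add.hpp (assumed contents): the declaration only ----
#ifndef ADD_HPP
#define ADD_HPP

int add(int a, int b);  // tells other files that add exists and how to call it

#endif  // ADD_HPP

// ---- add.cpp (assumed contents): the definition ----
// #include "add.hpp"  // the real add.cpp would include its own header here
int add(int a, int b) {
    return a + b;
}

// ---- main.cpp (assumed contents) would look roughly like: ----
// #include "add.hpp"
// int main() {
//     return add(2, 3);  // compiling this call needs only the declaration
// }
```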
We can compile the app with:
The result is an executable file, a.out.
Header Files
First, it’s good to understand the purpose of header files. They contain
declarations of various entities (such as functions or global variables), while
the actual implementations (definitions) go into the .cpp/.c files (header files
may also contain definitions; that’s not illegal). The implementation might
change over time, which requires recompilation. Header files are less likely to
change, since they mostly contain function signatures. Other files don’t rely on
the .cpp/.c files directly; they rely on the header files instead.
Header files thus act like interfaces that are expected to stay stable. We are supposed to rely on them instead of on the actual implementations.
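In our programs, we cannot refer to a name that has not been declared. The declaration can be in the current file or in another file (typically a header) included into it, while the definition may live in a different translation unit. Additionally, a given entity may be defined only once; this is called the One Definition Rule (ODR) in the C++ Standard. Some entities, such as classes and inline functions, may be defined in several translation units, as long as all those definitions are identical.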
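A small sketch of the difference between declaring and defining follows; `multiply` and `Point` are hypothetical names chosen for the example. The second guarded section simulates a header being included twice into the same translation unit, which an include guard makes harmless.

```cpp
// A declaration may appear any number of times...
int multiply(int a, int b);
int multiply(int a, int b);  // OK: redeclaration

// ...but a non-inline function may be defined only once in the whole program.
int multiply(int a, int b) {
    return a * b;
}
// int multiply(int a, int b) { return a * b; }  // error: redefinition

// A class may be defined only once per translation unit. A header defining one
// is usually wrapped in an include guard like this:
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };
#endif

// Simulated second inclusion of the same header: POINT_HPP is already defined,
// so the preprocessor drops the body and Point is not defined twice.
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };
#endif
```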
Preprocessor
The first step of compiling our program is preprocessing. The preprocessor
handles all lines that start with # (directives such as #include or #if) and
performs macro substitution.
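A short sketch of the directives mentioned above; the macro names `BUFFER_SIZE`, `SQUARE` and `BUILD_KIND` and the functions using them are made up for illustration. The comments show what the compiler sees once the preprocessor has done its textual substitution.

```cpp
// Object-like macro: every later occurrence of BUFFER_SIZE is replaced by 64.
#define BUFFER_SIZE 64

// Function-like macro: pure text substitution, so the parentheses matter.
#define SQUARE(x) ((x) * (x))

// Conditional compilation: only one of the two branches reaches the compiler.
#if defined(NDEBUG)
#define BUILD_KIND "release"
#else
#define BUILD_KIND "debug"
#endif

int buffer_size() { return BUFFER_SIZE; }                  // return 64;
int square_of_sum(int a, int b) { return SQUARE(a + b); }  // ((a + b) * (a + b))
const char* build_kind() { return BUILD_KIND; }            // "debug" or "release"
```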
To get the output of the preprocessor, we can execute:
Here’s what we get on stdout:
If we included a standard header such as iostream, the file would grow to thousands of lines after preprocessing.
In this case, the preprocessor simply pasted the contents of add.hpp directly
into main.cpp, in place of the #include line.
Preprocessed files conventionally have the .i extension (GCC uses .ii for
preprocessed C++ sources). The preprocessed form of a source file is called a
translation unit. These files are not kept by default, and we rarely need them.
Compiler
The result of preprocessing is handed over to the compiler, which parses the code and builds a tree representation of it, such as an AST (Abstract Syntax Tree).
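To make the idea of a tree concrete, here is a toy expression tree for `a + b * 2`. It is a hypothetical, minimal structure for illustration only, not how GCC or any other compiler actually represents its AST. The post-order walk in `evaluate` is the kind of traversal a code generator might perform over such a tree.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Leaves are integer literals or variable names; inner nodes are '+' or '*'.
struct Node {
    char op = 0;                    // '+', '*', or 0 for a leaf
    int value = 0;                  // used when the leaf is a literal
    std::string name;               // used when the leaf is a variable
    std::unique_ptr<Node> lhs, rhs; // children of an inner node
};

std::unique_ptr<Node> literal(int v) {
    auto n = std::make_unique<Node>();
    n->value = v;
    return n;
}

std::unique_ptr<Node> variable(std::string name) {
    auto n = std::make_unique<Node>();
    n->name = std::move(name);
    return n;
}

std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Post-order walk: evaluate both children first, then combine them.
int evaluate(const Node& n, const std::map<std::string, int>& env) {
    if (n.op == '+') return evaluate(*n.lhs, env) + evaluate(*n.rhs, env);
    if (n.op == '*') return evaluate(*n.lhs, env) * evaluate(*n.rhs, env);
    if (!n.name.empty()) return env.at(n.name);  // variable leaf
    return n.value;                              // literal leaf
}
```

Building `a + b * 2` as `binary('+', variable("a"), binary('*', variable("b"), literal(2)))` mirrors operator precedence: the multiplication sits deeper in the tree, so it is evaluated first.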
We can dump the AST with:
The resulting .dot files (Graphviz format) can be viewed with a Graphviz viewer.
The next step of compilation is generating assembly code. Here’s how we can do it:
Here’s the resulting add.s file:
Assembler
We can generate object files from our source code with:
It will produce main.o and add.o files. These contain machine code, but they
can’t run on their own: the linker has to combine them properly to produce the
final executable.
We can explore what’s inside the .o files with the objdump command:
The result is:
The last line lists our function add.
In general, the .o files contain:
- data - the actual machine instructions, i.e. parts of our program. They also contain references to entities defined in other translation units; these are placeholders that the linker fills in.
- metadata - information the linker needs to combine the object files into an actual executable. An example is the symbol table, which links symbol names to their locations, together with relocation entries that say which placeholders must be patched.
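The role of this metadata can be sketched with a toy model in which each object file only lists the symbols it defines and the symbols it needs. `ObjectFile`, `LinkResult` and `resolve_symbols` are hypothetical names, symbol names are kept unmangled for readability, and real linkers also relocate addresses, which this sketch leaves out.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified view of an object file's linker metadata.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;    // symbols whose definition lives here
    std::set<std::string> undefined;  // symbols used here but defined elsewhere
};

struct LinkResult {
    std::map<std::string, std::string> resolved;  // symbol -> file defining it
    std::vector<std::string> unresolved;          // like "undefined reference" errors
    std::vector<std::string> duplicates;          // like "multiple definition" errors
};

// Build one global symbol table from all definitions, then fill in every
// placeholder (undefined symbol) from that table.
LinkResult resolve_symbols(const std::vector<ObjectFile>& objects) {
    LinkResult result;
    std::map<std::string, std::string> symbol_table;
    for (const ObjectFile& obj : objects) {
        for (const std::string& symbol : obj.defined) {
            // emplace() inserts nothing if the symbol is already defined.
            if (!symbol_table.emplace(symbol, obj.name).second) {
                result.duplicates.push_back(symbol);
            }
        }
    }
    for (const ObjectFile& obj : objects) {
        for (const std::string& symbol : obj.undefined) {
            auto it = symbol_table.find(symbol);
            if (it != symbol_table.end()) {
                result.resolved[symbol] = it->second;
            } else {
                result.unresolved.push_back(symbol);
            }
        }
    }
    return result;
}
```

With `main.o` needing `add` and `add.o` defining it, linking both resolves `add` to `add.o`; linking `main.o` alone leaves `add` unresolved, and passing `add.o` twice reports `add` as a duplicate definition.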
Linker
Here’s how to link the object files into an executable:
The linker “glues” the .o files together. It can also link dynamic libraries
(.so files) that come from “outside” our solution, such as the C++ standard
library, which provides the implementation behind headers like “iostream”.
In this program, I’m including “iostream” to make use of the cout object.
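The program itself was not preserved. Here is a minimal sketch of the idea, assuming it prints the result of `add`. `print_sum` is a hypothetical helper that takes the stream as a parameter, so it can write to `std::cout` or, for checking, to a string stream.

```cpp
#include <iostream>
#include <sstream>  // lets the same function write into a string instead of the console

int add(int a, int b) { return a + b; }  // stand-in for add() from add.cpp (assumed signature)

// std::cout is a std::ostream object declared in <iostream>. Its implementation
// lives in the C++ standard library, which the linker pulls in (for g++ on
// Linux, typically as the shared library libstdc++.so).
void print_sum(std::ostream& out, int a, int b) {
    out << a << " + " << b << " = " << add(a, b) << '\n';
}
// In main() this would be called as: print_sum(std::cout, 2, 3);
```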
After compilation, we can have a look at the dynamic libraries being linked to our program:
The result is:
When compiling programs, we can explicitly specify the dynamic libraries we want
to link with g++’s -l flag: -lfoo makes the linker search for libfoo.so (or
libfoo.a).
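As a hedged example of a library that can be named with -l: `std::sqrt` comes from the math library, libm. The file name `hyp.cpp` and the function `hypotenuse` are hypothetical.

```cpp
#include <cmath>

// std::sqrt is declared in <cmath>; its code lives in the math library (libm).
// Linking that library explicitly would look like:  g++ hyp.cpp -lm
// (The g++ driver normally adds libm on its own, so -lm is often redundant
// for C++; for C programs built with gcc it is commonly required.)
double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}
```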