Fibers
Fibers are also called M:N
Threading, because M tasks will run on N OS
threads. Other names: green threads, stackful coroutines.
Green threads are all about moving the responsibility of managing threads from the kernel to user space. To do that, some kind of runtime needs to implement the functionality of a scheduler. Kernel’s scheduler switches between different threads, allowing each one to run its steps. The same thing needs to be done by the runtime now. Additionally, each task needs its own stack, and our runtime needs to load appropriate stacks between task switches. Normally, stack has a fixed size, fibers however often have dynamic/growable stacks.
Green threads cannot use blocking I/O operations, since they would block the entire process (assuming we’re using 1 native thread).
Implementation
To implement fibers, we need some way of modifying the Program Counter register.
This is necessary for our scheduler to be able to switch between different
tasks. Processes cannot write to that register, but with a bit of assembler
trickery, we’re able to set PC to our desired address via the RET
instruction.
When switching to a different task, we also need to switch the stack pointer to
the stack that belong to the new task.
This is a very simplified view on the matter. While switching the tasks, we also need to switch the registers state. The switching process must do pretty much the same context switch as the OS does when switching native threads.
Our runtime must also keep track of the state of each green thread. In the simplest case, a thread might be:
- available - thread awaiting for some code to run.
- running - the active thread
- ready - the task has some code to execute, and it’s awaiting for scheduler to let it continue to run.
The simplest scheduler implementation might look like this:
The actual switch
will have to prepare the registers for the new task to run:
Here, we can see an example of how to use fibers from the client side:
We can see that this implementation is cooperative. Threads need to call
yield_thread()
for other thread to be scheduled. Normally, the runtime should
provide libraries that yield whenever some I/O request is made. Then, using
event queues, the runtime can be notified when the thread
should be marked as ready to continue to run (because some I/O operation is
finalized).
spawn
will setup some available thread with proper context, and mark it as Ready
.
The scheduler (shown above) will switch to that thread at some point.
Summary
We can see that applications making use of green threading require some sorts of a runtime. That changes the model of how our code runs (compared to simplest compiled environments). Our program becomes kind of hosted within a runtime, which switches between threads in the background. For our program, the only thing that it “sees” is how it needs to yield control sometimes. Without deeper thought, we could even assume that those yields are pretty much the same thing as kernel doing preemption.
More genearally, I think we can list the following abstractions:
- embedded programming (think microcontrollers) - our programs truly run on bare metal. Of course, it is still some simplification, but our code has full control of the CPU
- OS-based systems - our programs are hosted within the OS, and kernel has control over the CPU. Our program cannot run without kernel’s “permission”. The kernel has to allow our programto run by scheduling its process/threads on some available CPU cores
- green threads - we’re moving even higher up. Not only the kernel controls when our process can execute, but additionally, within that process, there’s a thin layer (some runtime), which switches between our green threads.
The list above is not any official or formal categorization of how programs run. This is just a summary of how we can look at green threading in terms of abstraction and complexity that it introduces.