Asynchronous Programming
The frameworks and languages of today provide tools that allow us to easily incorporate async code into our programs. However, these are usually rather hight abstractions over the actual mechanisms used to implement concurrency of some form. These hight level abstractions sometimes “leak” giving us various restrictions on what we can do, and what we cannot do. Without deeper understanding of how asynchrony works under the hood, it’s difficult to understand how our programs actually execute. This page will cover some basics around concurrency.
Asynchronous Programming allows us to split our code into tasks. Each tasks might contain a bunch of instructions. The tasks may execute concurrently (although not necessarily in parallel).
Synchronous Programs
When running our synchronous code on top of modern operating system, there’s a high chance that this code will not be executed synchronously. The OS schedules processes on our machines, and switches between them constantly. Even though from the perspective of our program, it might be 100% synchronous, in reality the OS might split it into “chunks” interlaced with other processes executions in between.
Evolution of Multitasking
The following approaches emmerged as a means to achieve some form of concurrency:
- Non-preemptive multitasking - the responsibility to let other processes run was in full control of the programmer It was programmer’s responsibility to yield control to the OS, so that another process could be scheduled
- Preemptive multitasking - OS is responsible for stopping/starting processes, freeing programmers from this responsibility. 3, Pipelining - executing CPU instructions follows some well-defined process of steps. We can parallelize those. E.g., while some instruction is being decoded, we can already start to fetch the next one from memory.
- Hyper-threading - a single CPU core might be used to execute multiple operations in parallel, if different instructions require different parts of the CPU to be involved
- Multi-core Processors - allow us to run code in parallel on multiple CPUs.
Parallel vs Concurrent
Parallelism is about throwing more cores at a problem. Concurrency is about being efficient, trying to use the resources we have in the best way possible, e.g. by calculating something while waiting for some I/O operation to complete.
Concurrency will not make single task (synchonous set of instructions) go faster, parallelism will. It might make a set of tasks execute faster though.
In actual applications, it’s often best to mix the two approaches, i.e. execute tasks concurrently on multiple cores (multi-threading).
Operating System
Assuming, that our application targets some OS as its host, we can expect the following:
- OS will schedule our process to run on the CPU
- OS will interrupt our process whenever it sees fit to let other processes to run (pre-emption)
- OS allows us to create threads to run tasks concurrently
- OS exposes its API (system calls) allowing us to access I/O, hardware. Some system calls might be blocking, while others might be async, allowing us to poll the OS for results.
Asynchronous Abstractions
Abstractions may be categorized in various ways. Here’s one:
- cooperative - tasks yield control ti scheduler whenever
they cannot progress further because they’re awaiting
some data/operation. Examples:
async
/await
in .NET, JS, or Rust. - non-cooperative - it’s the scheduler that pre-empts a running task, no matter if it can continue to run or not. Usually, tasks can also yield control back to scheduler on their own. Examples: OS threads, Go’s goroutines. Examples: fibers
Here’s another:
- stackful - each task has its own stack. Tasks can be preempted at any time, their state is preserved on the stack.
- stackless - there’s one stack. Tasks cannot be preempted in the middle
of execution. Examples: Rust
async
/await
.
OS Threads
Threads can be understood as OS threads (managed by the kernel), or user-level threads, managed by some runtime/framework.
Using OS threading is called 1:1
Threading, because each task
is mapped to its own thread.
Characteristics:
- creating threads takes time, so it’s not going to scale well when the number of tasks grows.
- each thread has its own stack - stack has a fixed size, so many threads will use much memory
- synchronization might be required to circumvent data races
- switching between threads (context switch) can be costly
- parallelism
- all threads are equal for the scheduler
- unavailable on OS-less systems
OS Syscalls
Modern OSs give us 3 kinds of I/O syscalls:
- blocking
- non-blocking
- event-queues (Linux
epoll
, macOSkqueue
, WindowsIOCP
)
Most async runtimes use OS-backed queues for high performance I/O. It’s an alternative to blocking I/O syscalls that put a thread on hold until result is ready to consume.
Non-blocking syscall lets us poll to check if result is ready via a handle given to us by the OS.
Event queue allows us to throw there multiple handles, and:
- regularly check if any of the handles is ready
- ask the OS to wake us up when any handle becomes ready
It is a kind of a hybrid between blocking and non-blocking I/O syscalls.
This is pretty similar to how async/await works in various runtimes.
Only when we’ve done all the work and need he results, we await
and
let the runtime preempt the task until the awaited
thing is ready.
Coroutines (futures)
It’s another example of M:N
threading. Each task is
represented by a state machine.
There are two kinds of coroutines:
- symmetric - yields a specific destination (like another coroutine)
- asymmetric - yields to a scheduler
Whenever we place async
keyword on a function, the
compiler rewrites it as a state machine. This is problematic
since it will change stack traces, making debugging
more difficult.
Whenever we place await
keyword, we yield control back
to the runtime scheduler. The task gets suspended
until the awaited future/promise/task gets resolved.
Compared to fibers, it’s impossible to preempt task at any point in time, it will happen only at specific points (awaits). The compiler can also “artificially” insert some preemption points.
Characteristics:
- code looks like synchronous
- no context switching (stackless)
- memory-efficient
- tasks cannot be suspended as freely as with fibers
- difficult debugging