Asynchronous Programming

The frameworks and languages of today provide tools that allow us to easily incorporate async code into our programs. However, these are usually rather hight abstractions over the actual mechanisms used to implement concurrency of some form. These hight level abstractions sometimes “leak” giving us various restrictions on what we can do, and what we cannot do. Without deeper understanding of how asynchrony works under the hood, it’s difficult to understand how our programs actually execute. This page will cover some basics around concurrency.

Asynchronous Programming allows us to split our code into tasks. Each tasks might contain a bunch of instructions. The tasks may execute concurrently (although not necessarily in parallel).

Synchronous Programs

When running our synchronous code on top of modern operating system, there’s a high chance that this code will not be executed synchronously. The OS schedules processes on our machines, and switches between them constantly. Even though from the perspective of our program, it might be 100% synchronous, in reality the OS might split it into “chunks” interlaced with other processes executions in between.

Evolution of Multitasking

The following approaches emmerged as a means to achieve some form of concurrency:

Non-preemptive multitasking - the responsibility to let other processes run was in full control of the programmer It was programmer’s responsibility to yield control to the OS, so that another process could be scheduled
Preemptive multitasking - OS is responsible for stopping/starting processes, freeing programmers from this responsibility. 3, Pipelining - executing CPU instructions follows some well-defined process of steps. We can parallelize those. E.g., while some instruction is being decoded, we can already start to fetch the next one from memory.
Hyper-threading - a single CPU core might be used to execute multiple operations in parallel, if different instructions require different parts of the CPU to be involved
Multi-core Processors - allow us to run code in parallel on multiple CPUs.

Parallel vs Concurrent

Parallelism is about throwing more cores at a problem. Concurrency is about being efficient, trying to use the resources we have in the best way possible, e.g. by calculating something while waiting for some I/O operation to complete.

Concurrency will not make single task (synchonous set of instructions) go faster, parallelism will. It might make a set of tasks execute faster though.

In actual applications, it’s often best to mix the two approaches, i.e. execute tasks concurrently on multiple cores (multi-threading).

Operating System

Assuming, that our application targets some OS as its host, we can expect the following:

OS will schedule our process to run on the CPU
OS will interrupt our process whenever it sees fit to let other processes to run (pre-emption)
OS allows us to create threads to run tasks concurrently
OS exposes its API (system calls) allowing us to access I/O, hardware. Some system calls might be blocking, while others might be async, allowing us to poll the OS for results.

Asynchronous Abstractions

Abstractions may be categorized in various ways. Here’s one:

cooperative - tasks yield control ti scheduler whenever they cannot progress further because they’re awaiting some data/operation. Examples: async/await in .NET, JS, or Rust.
non-cooperative - it’s the scheduler that pre-empts a running task, no matter if it can continue to run or not. Usually, tasks can also yield control back to scheduler on their own. Examples: OS threads, Go’s goroutines. Examples: fibers

Here’s another:

stackful - each task has its own stack. Tasks can be preempted at any time, their state is preserved on the stack.
stackless - there’s one stack. Tasks cannot be preempted in the middle of execution. Examples: Rust async/await.

OS Threads

Threads can be understood as OS threads (managed by the kernel), or user-level threads, managed by some runtime/framework.

Using OS threading is called 1:1 Threading, because each task is mapped to its own thread.

Characteristics:

creating threads takes time, so it’s not going to scale well when the number of tasks grows.
each thread has its own stack - stack has a fixed size, so many threads will use much memory
synchronization might be required to circumvent data races
switching between threads (context switch) can be costly
parallelism
all threads are equal for the scheduler
unavailable on OS-less systems

OS Syscalls

Modern OSs give us 3 kinds of I/O syscalls:

blocking
non-blocking
event-queues (Linux epoll, macOS kqueue, Windows IOCP)

Most async runtimes use OS-backed queues for high performance I/O. It’s an alternative to blocking I/O syscalls that put a thread on hold until result is ready to consume.

Non-blocking syscall lets us poll to check if result is ready via a handle given to us by the OS.

Event queue allows us to throw there multiple handles, and:

regularly check if any of the handles is ready
ask the OS to wake us up when any handle becomes ready

It is a kind of a hybrid between blocking and non-blocking I/O syscalls.

This is pretty similar to how async/await works in various runtimes. Only when we’ve done all the work and need he results, we await and let the runtime preempt the task until the awaited thing is ready.

Coroutines (futures)

It’s another example of M:N threading. Each task is represented by a state machine.

There are two kinds of coroutines:

symmetric - yields a specific destination (like another coroutine)
asymmetric - yields to a scheduler

Whenever we place async keyword on a function, the compiler rewrites it as a state machine. This is problematic since it will change stack traces, making debugging more difficult. Whenever we place await keyword, we yield control back to the runtime scheduler. The task gets suspended until the awaited future/promise/task gets resolved.

async RunAsync()
{
    await RunTask1(); // yields control to scheduler
    await RunTask2(); // yields control to scheduler
}

Compared to fibers, it’s impossible to preempt task at any point in time, it will happen only at specific points (awaits). The compiler can also “artificially” insert some preemption points.

Characteristics:

code looks like synchronous
no context switching (stackless)
memory-efficient
tasks cannot be suspended as freely as with fibers
difficult debugging

Event Queues →