Memory Management

.NET uses Garbage Collection to free up memory from unused objects, that is, objects that are unreachable from the application’s roots.

Types of Data

Value Types

Value types are not part of the Garbage Collection cleanup process. They live on the stack, and they’re naturally added to and removed from it.

Benefits of value types:

  • less memory - data is stored directly, without a pointer (actually without two extra fields, because reference types carry both a header and a Method Table pointer)
  • faster access to fields, thanks to direct access
  • avoids allocations (unless boxed)
  • no inheritance, so no virtual dispatch overhead and no need for devirtualization
  • stored on the stack or in a register (or on the heap if boxed or if part of something else that lives on the heap)

Boxing

Every value type has a corresponding reference type. E.g., for int, it’d be System.Int32. When boxing, the runtime first creates a new instance of the reference type (as usual, with a header and a Method Table pointer), and then it copies the data from the value type.

Unboxing is cheaper: it’s just a copy of data from the heap to the stack.
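For illustration (the IL even has dedicated box and unbox instructions for this):

int number = 42;

// Boxing: the runtime allocates a heap object (header + Method Table
// pointer) and copies the value into it.
object boxed = number;

// Unboxing: a type check plus a copy of the data back to the stack.
int unboxed = (int)boxed;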

Reference Types

They are composed of two parts:

  • reference - an address to the data. The reference itself is copied by value, just like a value type.
  • data - the memory region pointed to by the reference.

A reference is like a “classic” pointer, with some runtime safety on top. A reference type is beneficial when particular data outlives a stack frame; then heap allocation is actually useful.

The .NET runtime can infer that a given instance of a reference type does not “escape” its stack frame, and it might place it on the stack, like a value type instance! However, it will be stored exactly like it would be stored on the heap - with a Method Table pointer and a header.

Reference Type Structure

Each object on the heap has:

  • header - stores any metadata that might be attached to a given object. E.g., a lock taken on the object, a cached value of GetHashCode, some GC info.

  • Method Table pointer - an address of the entry in the Method Table structure describing the object’s type. It holds information such as a listing of all methods, data for reflection, and more. A reference to an object actually points at the Method Table pointer!

    ┌────────┬──────────────┬──────┬──────┬──────┐
    │ Header │ Method Table │ Data │ Data │ Data │ ...
    │        │   Pointer    │      │      │      │
    └────────┴──────────────┴──────┴──────┴──────┘

    Therefore, we sometimes say that the Header lies at a negative offset, because to get to it, you have to subtract from the reference’s pointer.

  • Data - the actual values (or references) that are part of the class. Even for empty classes (without fields) there will be one slot (8 bytes), because the GC expects it.


The smallest object will take 24 bytes (on a 64-bit system) or 12 bytes (on a 32-bit system). On 64-bit, that is:

  • 8 bytes for a Header
  • 8 bytes for the Method Table pointer
  • 8 bytes for Data

A struct instance with a 1-byte field will take 1 byte of space (or a bit more due to alignment).

A class with a 1-byte field will take 24 bytes of space!
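A quick sketch of that difference (type names are made up; exact numbers depend on the runtime and platform):

// 8 B header + 8 B Method Table pointer + 8 B minimal data slot = 24 B
// per instance on a 64-bit runtime, even with zero fields.
class EmptyClass { }

// Just the field itself (plus possible alignment padding).
struct OneByte { public byte Value; }

// System.Runtime.CompilerServices.Unsafe.SizeOf<OneByte>() returns 1.
// Note: for a class type, Unsafe.SizeOf returns the reference size
// (8 on 64-bit), not the size of the heap object itself.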

ref

For value types, this keyword allows us to pass them by reference (so, the same way reference types are passed by default).

For reference types, it acts like a pointer to a pointer. We can change the object that the passed reference variable points to!
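A small sketch of both cases (names are illustrative):

void Increment(ref int number)
{
    number++; // modifies the caller's variable directly
}

void Replace(ref object target)
{
    // Without `ref`, this would only rebind a local copy of the reference;
    // with `ref`, the caller's variable now points at a new object.
    target = new object();
}

int x = 1;
Increment(ref x); // x == 2

object obj = new object();
Replace(ref obj); // obj now references a different instance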

null

It’s a reference that points at address 0. When we try to access that memory, the OS raises an exception, which the CLR catches and rethrows as a NullReferenceException. The whole first page of virtual memory is an illegal access space.

Data Locality

Structs are better for data locality. An array of structs has each item laid out one next to the other. For classes, only the references to items are next to each other; accessing consecutive items requires:

  1. Moving to the next reference
  2. Following that reference’s pointer to get to the actual item

When loading the array into a cache line, in the struct case we load the actual data. In the class case, we load references to items into the cache line; each item can live in a completely different place in memory. The program will be slower due to more cache-line loads (plus the dereferencing).
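A sketch of the difference (a made-up Point type; actual timings depend on hardware):

struct PointStruct { public int X, Y; }
class PointClass { public int X, Y; }

var structs = new PointStruct[1_000_000]; // one contiguous block of data
var classes = new PointClass[1_000_000];  // contiguous block of references
for (int i = 0; i < classes.Length; i++)
    classes[i] = new PointClass();        // each instance may land anywhere

long sum = 0;
// Streams through memory sequentially - cache- and prefetcher-friendly.
foreach (var p in structs) sum += p.X;
// Chases a pointer per element, to potentially distant heap locations.
foreach (var c in classes) sum += c.X;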

Garbage Collector

High Level Overview

When garbage collection gets triggered, the following occurs (assumes Workstation, non-concurrent):

  1. Suspend all managed threads.
  2. The thread that triggered the GC starts to execute GC code.
  3. GC decides which generation to condemn (the condemned generation and all generations below it will be collected)
  4. Mark Phase
  5. Plan Phase
  6. Collection (Sweep or Compaction)
  7. Resume managed threads.

Regions

Memory is not stored in one huge bag. Instead, GC has a few regions and objects fall into them based on:

  • their size
  • their lifetime

Various GCs choose different parameters for memory partitioning, like object mutability, type, and others. .NET uses mostly just size and lifetime. Regarding size, the GC heap is split into 2 regions:

  • SOH - Small Object Heap - objects smaller than 85,000 bytes
  • LOH - Large Object Heap - objects larger than or equal to 85,000 bytes.

The 85,000-byte threshold can be increased via configuration.
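For example, in runtimeconfig.json (the System.GC.LOHThreshold setting; the value is in bytes):

{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.LOHThreshold": 120000
    }
  }
}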

SOH uses compacting, because moving small objects around is not that hard/expensive. LOH uses sweep collection by default. However, we can invoke LOH compaction as well. The 85,000-byte threshold is computed with the shallow object size. Therefore, LOH mostly contains large arrays and strings. An object with an array field (even a huge one) stores just a reference to that array, so the array’s size is not included in the object’s size by the GC.

Typically, SOH contains many more objects than LOH, because 85,000 bytes is a rather big value. LOH gets collected only when Gen 2 is collected.

On the 32-bit runtime, LOH will also contain arrays of doubles with at least 1000 elements (8000 bytes of size). It’s due to alignment issues: 32-bit has 4-byte alignment, and a double takes 8 bytes. LOH has 8-byte alignment on the 32-bit runtime.

Generations

SOH has lots of objects normally, so it’s split further by lifetime of these objects, into generations.

  • SOH
    • Gen 0 - for ephemeral objects, like objects allocated in some method that are no longer in scope after that method’s execution is over. Gen 0 deallocation is the cheapest; objects basically land on top of the heap and are removed from there shortly after they become unreachable.
    • Gen 1 - also for ephemeral objects. Objects that stay alive after at least one Gen 0 pass become Gen 1. This could be an object assigned to some property of a class instance.
    • Gen 2 - for long-lived objects that have stayed alive through a few Gen 1 passes. It could be a property of a static class, or some other object that lives through the whole lifetime of the app.
  • LOH (Large Object Heap) - objects of 85,000 bytes or more.
  • POH - Pinned Object Heap, a place for objects that should not be moved (i.e., not compacted), for various reasons, but mostly when these objects are exposed to unmanaged code.

Moving up the generations, the cost of GC becomes higher (Gen 0 is the cheapest). Therefore, whenever possible, it’s good to keep our objects in the Gen0/1 range, or, even better, use value types.

Layout

In .NET 7+, memory is split into Regions. In earlier versions, it was split into Segments. Each SOH generation has its own region. LOH, POH, and NGCH (the Non-GC Heap) have their own regions as well.

During runtime startup the following occurs:

  1. The runtime tries to allocate a continuous area of memory (if possible).
  2. The runtime creates regions for each part of managed memory. POH and LOH regions are 8 times larger than a SOH generation region. (NGCH is created later, on demand, in another part of the address space.)
  3. One page of each region is committed.

In Computer Science, the following observations have been made:

  • weak generational hypothesis - most young objects die young
  • the longer the object lives, the more likely it is to continue living

Therefore, it makes sense to collect young objects more frequently than older ones. Objects move up the generations via promotion. Different generations can be handled in various ways. For example, there could be separate memory regions for different generations; any time a promotion occurs, an object would have to be copied to a different area of memory. That would be quite expensive though. Another approach is to have one memory region, but with dynamic thresholds that separate generations. Each time a promotion occurs, objects that are close to each other are promoted together (because they are placed in memory “chronologically”, one next to the other). The threshold is moved “to the right” while objects stay in place. This simple description assumes no compaction, just sweep.

.NET uses a variant of the second approach described above. There are generations 0, 1, and 2. Each collection survival is a promotion. There are exceptions to that: sometimes promotion does not occur (it’s called a demotion then; although objects stay in the same generation, they never go to a younger generation).

Mono uses the first approach - it copies objects to a different memory area. Also, it has just 2 generations.

GC.GetGeneration() allows us to get the generation of any object. For LOH and POH objects, it returns 2. We can track generation sizes with dotnet-counters. Gen 0 size, for legacy reasons, is not really the Gen 0 size, but rather the allocation budget for Gen 0.
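For instance:

var small = new byte[100];     // lands in Gen 0 (SOH)
var large = new byte[200_000]; // >= 85,000 B, goes to LOH

Console.WriteLine(GC.GetGeneration(small)); // 0
Console.WriteLine(GC.GetGeneration(large)); // 2 (LOH reports as Gen 2)

GC.Collect();                               // survivors get promoted
Console.WriteLine(GC.GetGeneration(small)); // 1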

To find memory leaks, it’s best to look at Gen 2 size. Objects that stay in memory for too long will eventually go to Gen 2 and its size will keep on increasing. An ever increasing size of Gen 2 is a sign of a potential problem.

Control Values

GC has various config values that allow it to decide on various options, like whether to use a sweep or compacting collection, or even how often to trigger collections.

Allocation Budget

The Allocation Budget is the amount of memory that may be allocated before a GC of a given generation occurs. Each generation has a different setting for it. For Gen 1 and 2, promotion from lower generations is treated the same as allocation (there’s no way to allocate directly in Gen 1 or 2, so promotion is the only way to “allocate” there).

Fragmentation is the total size occupied by “free objects”.

The Allocation Budget changes between GCs. In general, the more objects survive a collection, the higher the new allocation budget. The reasoning is: if in a given generation not much had to be cleaned up, then we should probably wait longer until more unreachable objects accumulate.

The Performance Counter for Gen 0 size is actually not the Gen 0 size; rather, it’s its allocation budget (for legacy reasons).

GC Triggers

Garbage Collection may be triggered in various ways:

  • when there’s not enough space in some generation for a new allocation. This is related to the allocation budget as well.
  • explicit call to GC.Collect(…).
  • low memory notification (Windows-only) - the OS might inform apps that memory is low. Apps might ignore it or act on it; .NET triggers a collection.
  • various internal events
  • explicit call to GC.TryStartNoGCRegion - this method actually allows us to NOT run GCs during a given region of code execution. However, by default, the runtime will trigger a GC just before entering the no-GC region (this can be disabled). See the sketch below.
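A minimal sketch of a no-GC region (the requested budget must fit in the ephemeral memory, otherwise the call throws; DoLatencyCriticalWork is a made-up placeholder):

// Reserve up to 16 MB of allocations with no GC in between.
if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
{
    try
    {
        DoLatencyCriticalWork(); // hypothetical; must stay within the budget
    }
    finally
    {
        GC.EndNoGCRegion();
    }
}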

When a GC of a given generation gets triggered in any of the ways above, the runtime might decide to also collect a higher generation. This might occur when:

  • some higher gen has exceeded its allocation budget
  • a long time has passed since some higher gen was last collected. For Gen 1 it’s 10 seconds, for Gen 2 it’s 100 seconds. This happens only in Workstation mode.
  • low Card Table efficiency
  • big fragmentation
  • before OutOfMemoryException

Stop the World

When the GC pauses all managed threads to do its job, it’s sometimes called “Stop the World”. It’s also called “Execution Engine (EE) Suspension”.

Managed threads are stopped because the GC can do its thing only when it’s safe to do so. A given thread may be in one of two states:

  • cooperative - the runtime must cooperate with the thread to bring it to a safe point, where it can switch to preemptive
  • preemptive - the thread is executing code that does not touch any GC-managed references (like unmanaged code).

EE Suspension forces all threads into the preemptive state. When a thread reaches the preemptive state, its state is stored in the GC Info structure. Thanks to that, when the resume occurs, each thread continues right where it left off.

Safe points occur during calls to other methods. There is actually a special GC poll method that forces a safe point to occur. It’s used in various strategic places in runtime code, but it might also get placed in app code by the JIT, especially in loops with long or unknown iteration counts. The JIT might also emit code that is “fully interruptible” - in such code, not only method calls are safe; every instruction is a safe point. That’s pretty expensive though, because every instruction needs its own GC Info.

Mark Phase

As described in the Garbage Collection article, part of a GC is the Mark Phase. This is when the GC looks for objects that are still alive. The Mark Phase cannot just traverse the whole graph of objects, because that would invalidate the whole idea of generations. Younger generations should be collected more often than older ones, as we established before. Traversing the whole graph would take us through a mix of generations. Objects from younger generations might point to older-generation objects, and vice versa. If we marked only objects that are reachable from young generations, we would end up deleting objects that are still needed (but are pointed to only from older-generation objects).

So, in addition to traversing from young objects, GC keeps a RememberedSet. RememberedSet contains references from older to younger generations. When collecting younger generation, both traversal in young generation and RememberedSet are checked.

For young-to-old references, a similar issue would occur: traversing objects in Gen 1 would not mark objects that are referenced only from Gen 0. Another RememberedSet would be needed. Maintaining it is non-trivial, so instead the following rule has been agreed on:

Collection in gen N collects also all previous generations! So, for example, a Gen 2 collection would also collect 0 and 1.

The following collections are possible then:

  • only 0
  • 0 and 1
  • 0, 1, 2, LOH (full GC)

The RememberedSet is updated anytime some older-generation object gets a reference to a younger-generation object written into it. It’d be a huge overhead to check such a condition on every write, so it’s heavily optimized and emitted only where needed by the JIT compiler. This check is done in the mutator’s Write Barrier.

The RememberedSet is implemented in a bit more convoluted way than just a plain collection of data. It’s implemented with the help of Card Tables. That’s because managing references per every single older-generation object would be overkill in apps with lots of objects. Instead, the memory of older generations is divided into 256-byte-wide regions (on the 64-bit runtime), and every region might contain some objects (0 or more). Anytime a reference is written into some object in an older generation, its region (card) is marked dirty. When a collection of a younger generation occurs, both the young-gen traversal and the dirty cards are checked to find all reachable objects. One card takes 1 bit (dirty/clean) in memory.

Management of the Card Table happens during the Write Barrier. There are a few implementations of the Write Barrier, and the runtime will use one of them depending on various conditions.

With the Card Table technique, the GC tracks groups of objects instead of tracking them individually. Still, there will be lots of cards in apps that use lots of memory. Therefore, the runtime does not check cards individually. Instead, it groups them into words (and even bundles) of cards. Each bit in a bundle represents 1024 cards. So, if a given card bundle is clean, we can skip 1024 × 256 bytes of memory from reference scanning! That saves a lot of time, which is important, because young-generation collections must be fast.

Some objects (e.g., arrays) might be bigger than a single card (256 bytes). If such a huge object has a reference to a younger-gen object, during the Mark Phase only part of that object will be scanned (the part that belongs to a dirty card).

When the Mark Phase starts, the GC already “knows” which generations will be collected; otherwise it would have to go back to the Mark Phase after deciding to also collect some higher generation.

Implementation Details

Objects get marked (as reachable) by modifying the Method Table pointer. The space for the pointer is actually bigger than necessary, and the two lowest bits are unused, so it’s safe to modify 1 bit of it. The modified bit gets reset during the Plan Phase (to prepare it for a potential next Mark Phase).

Roots

Object marking starts from the roots.

Local Variable Roots

Local variables are one of the most common roots. Some of them are short-lived (like variables within some short method), others long-lived (like variables created in Main(...)). Anytime we create an object in some method, we create a new Local Variable Root for the Mark Phase.

Local Variable Roots may be stored on the stack, or directly in a CPU register (just the reference, of course; the referenced data lives on the heap).

The lifetime of local variables is controlled by the GC Info structure. It contains a listing of all local variables that should stay alive at a given instruction.

Scope

Intuitively, we might think that a given method’s variables are “safe” from being collected while the method is executing. However, the runtime might collect objects pointed to by local variables as soon as they stop being used.

Example:

public MyObject DoSomething()
{
    var myObj = new MyObject();
    myObj.Value = 38;
    DoSomethingElse();
    // myObj is not needed anymore, it might get GC'ed at this point!
    var anotherObj = new MyObject();
    return anotherObj;
}

myObj might be collected at the “safe point” after its last use. That safe point is the call to DoSomethingElse(). Assuming the typical case of method calls being safe points, that’s where a GC might occur. Such behavior is called Eager Root Collection.

Debug Configuration

Eager Root Collection occurs only in the Release configuration (and only once Tier 1 compilation is reached). Debug mode keeps all local variables alive for the whole method execution time, for the needs of debugging. It’s convenient to be able to see all variables of a given lexical scope during a debug session.

This difference between Debug and Release might, in rare cases, result in a program executing differently in these two configurations! We could imagine a case where some timer gets collected before its callback triggers. In a well-designed API this wouldn’t occur: such a timer would probably implement IDisposable, suggesting a Dispose call when the timer is no longer needed. Such a timer.Dispose() call somewhere at the end of the method would “show” the runtime that the timer is in use.

There’s also a way to keep a given object alive explicitly, via GC.KeepAlive(obj). It’s an empty method; its only function is to “pretend” that a given object is being utilized at some point of code execution, so that the object does not get collected prior to that.
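A sketch with a timer (DoLongRunningWork is a made-up placeholder):

void RunWithTimer()
{
    var timer = new System.Threading.Timer(
        _ => Console.WriteLine("tick"), null, dueTime: 0, period: 1000);

    DoLongRunningWork(); // hypothetical; `timer` is not referenced below...

    // ...so without this call, the timer could be collected (and go silent)
    // while DoLongRunningWork is still executing.
    GC.KeepAlive(timer);
}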

Even this (the current instance) might get collected while one of its methods is still executing!

As .NET source says:

The JIT is very aggressive about keeping an object’s lifetime to as small a window as possible (…)

Cross-Generation References Roots

Another kind of root is references from older to younger generation(s). While marking, the GC goes through the previously described RememberedSet and Card Tables.

GC Handle Roots

Strong and Pinned handles are another kind of root. They are created explicitly via GCHandle.Alloc(...).
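For example (the handle roots the object - and, for Pinned, keeps it in place - until it’s freed):

using System.Runtime.InteropServices;

var buffer = new byte[1024];

// A pinned handle: the object becomes a GC root and cannot be moved.
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
    IntPtr address = handle.AddrOfPinnedObject(); // stable pointer for native code
    // ... pass `address` to unmanaged code ...
}
finally
{
    handle.Free(); // otherwise the object stays rooted forever
}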

Plan Phase

It figures out whether the GC should compact or if a sweep is good enough. While doing so, it actually simulates the compaction scenario to see if it would yield big enough benefits. The results of that simulation are not wasted; both the compaction and sweep approaches use them to do their job.

Simplifying a bit, the planner looks at the managed heap(s) and marks groups of objects as either plugs or gaps:

  • plug - a grouping of neighbouring objects that should be retained after GC
  • gap - a grouping of neighbouring objects that should get collected

Then, it simulates the move operations needed to relocate plugs into free spots left by gaps (or other holes that are already there). It takes into account pinned objects, which have to stay in place.

The final result creates an alternating gap/plug pattern:

┌───────┬────────┬───────┬────────┬───────┬────────┬───────┐
│ Gap 1 │ Plug 1 │ Gap 2 │ Plug 2 │ Gap 3 │ Plug 3 │ Gap 4 │
└───────┴────────┴───────┴────────┴───────┴────────┴───────┘

Each plug has its own plan of where it’ll be moved. The plans are stored within the gaps that precede each plug! It’s efficient, because no additional memory gets allocated. Additionally, it optimizes CPU cache use, since the runtime operates on the same part of memory while working on plugs/gaps.

First Plug

To make sure that every plug has its plan, every Region starts with 24 bytes of empty space, to be used as a gap.

A plan for one plug takes 24 bytes of data. It consists of:

  • gap size
  • plug relocation offset
  • left and right child offsets (for Binary Search Tree)

The Plan Phase and its strategy of using gaps to store plans is the exact reason why reference types always consume some memory (24 bytes), even when they are seemingly empty (have 0 fields). A single object might become a gap, after all.

Plans are organized into a Binary Search Tree; that’s why each plan contains offsets to the left and right children of a given “plan node”. In practice, there are many plug trees - one global tree would be too big for efficient traversal. Each tree covers 4 kB of memory; each such region is called a Brick. Bricks are put into the Brick Table with all the plug trees (SOH-only).

Pinned Objects

Pinned Objects introduce complications in the compaction planning. Here are just some of the problematic scenarios:

  • a pinned plug might come right before some “normal” plug - there’s no gap to store the plan in! As a compromise, the pinned plug gets extended to also cover the non-pinned plug.
  • a pinned plug might come right after some “normal” plug - the plan of the pinned plug is then stored onto the plug before it, overwriting some (24) bytes there. The overwritten data gets stored in a remote location - a pinned plug queue - to be restored before the thread(s) get resumed.

It’s worth mentioning that pinned plugs’ plans contain an offset of 0, for obvious reasons.

Demotion

We could imagine another case with pinned objects. After compaction, we could end up with a situation like this:

|                   GENERATION 0                   |
+---+-------+----+----------------------+----------+
|###|///////|$$$$|                      |**********|
+---+-------+----+----------------------+----------+

#, /, $ - various compacted objects
* - pinned object

Generation 0 got collected, with compaction. Normally, the objects would now all get promoted to Generation 1. However, there’s a lot of empty space, which would make Generation 1 fragmented.

Therefore, in cases like that, a Demotion might occur.

|  GENERATION 1  |           GENERATION 0          |
+---+-------+----+----------------------+----------+
|###|///////|$$$$|                      |**********|
+---+-------+----+----------------------+----------+

It is also possible for a given object to be moved to a lower generation!

LOH and POH

The described Brick Table approach applies only to SOH. In LOH, it’s simplified. First of all, LOH is almost never compacted. By design, LOH objects are huge, and moving them around is an expensive operation. However, when compaction does occur, the following facts are worth noting:

  • each object becomes a separate plug - objects are big enough to be considered individually
  • every object in LOH has 24 bytes of space prior to it - a space for plug plan

LOH compaction will occur when:

  • GC is caused by the risk of OOM
  • it’s called explicitly by user code
  • not enough memory for allocation is available
  • fragmentation is high
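The explicit-user-code case looks like this (GCSettings lives in System.Runtime):

using System.Runtime;

// Request a one-time LOH compaction with the next full, blocking GC.
GCSettings.LargeObjectHeapCompactionMode =
    GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // the LOH gets compacted here; the mode then resets to Default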

Sweep and Compact Phase

This phase is the heaviest part of the GC cycle.

During a Sweep, gaps are turned into Free Objects. Any regions that become empty get deleted. For POH and LOH, objects are scanned directly, without going through the Brick Table.

During Compaction, objects are moved around, and references to them get updated.

Allocation

In non-managed environments (like C/C++), a request to create an object goes directly to the OS to get memory. In .NET, the runtime sits in between. The runtime allocates a big chunk of memory from the OS before the app even needs it. That speeds up the process when the memory is actually needed. This is one of the “tricks” that managed environments use to be faster than non-managed ones.

.NET has two ways of allocating memory:

  • bump pointer allocation (faster)
  • free-list

Allocation of new objects happens only in:

  • SOH - only Gen 0
  • LOH
  • POH

Bump Pointer

  Objects      Zeroed memory      Reserved       Non-reserved
┌────────────┬────────────────┬─────────────┬────────────────
│████████████│                │░░░░░░░░░░░░░│▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ...
└────────────┴────────────────┴─────────────┴────────────────
└─────────Committed───────────┘
             ↑
         Allocation
          pointer

When an object needs to be allocated, the allocation pointer gets moved “to the right”, and the space to the left of it becomes the newly allocated space. The runtime zeroes memory even before the allocation, to speed it up; if some memory is not zeroed yet, that happens ad hoc. The whole committed space that is ready for allocations is called an allocation context. When we need to grow the allocation context, we grow it by an allocation quantum (8 kB on SOH). Each managed thread has its own allocation context; thanks to that, creating objects does not require synchronization (and is faster).

These multiple allocation contexts might live in one region. When a sweeping collection runs, we end up with lots of holes of free space before the allocation pointer(s). Compaction would resolve that, but it’s expensive. Instead, .NET creates allocation contexts in those holes! Compaction still occurs, but less often.

Free-list

With the free-list, we have to keep track of the free spaces in memory. We do that in buckets. Each bucket is a threshold “less than or equal to x bytes”. Each “hole” in memory is assigned to some bucket. When allocating an object, we find the first bucket whose spaces are big enough to fit it.

The free spaces managed by the free-list are actually a kind of object themselves. They have a Method Table pointer that points to the “free object” type entry.

As mentioned, allocations of new objects happen only in Gen 0, LOH, and POH. However, we also need to manage promotions between generations. That’s very similar to allocating new objects, so Gen 1 and Gen 2 have their own free-list buckets as well. Gen 0 and Gen 1 of SOH each have just one bucket; Gen 2 has 12 of them; LOH has 7; POH has 19.

Zeroing of memory is needed only for Gen 0 allocations. Higher generations copy “filled” objects, so zeroing them would be redundant.


SOH uses the bump pointer technique first, and falls back to the free-list if the first method fails. If there’s not enough memory, GC gets invoked. If there’s still not enough memory, an OutOfMemoryException gets thrown. LOH and POH use only the free-list (the GC and the exception also occur there).

When experiencing an OOM exception, adding some explicit GC calls is not a solution. As the paragraph above explains, GC will happen anyway (a few times, even) before OOM is thrown. An explicit GC might help for LOH allocations though.

Pinning

Pinning objects means making their location constant and disallowing the runtime from moving them to some other location. Pinned objects fragment the heap, because they cannot be compacted. Pinning is useful mostly for P/Invoke scenarios, where some unmanaged code operates on pointers.

There is also the POH (since .NET 5). We can create arrays there (only arrays of structs without any references inside, which simplifies GC by avoiding traversal of the POH during the Mark Phase).
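Two common ways to pin (the fixed variant additionally requires AllowUnsafeBlocks):

// POH allocation (.NET 5+): the array never moves, no handle bookkeeping.
byte[] pohBuffer = GC.AllocateArray<byte>(4096, pinned: true);

// Scoped pinning: the buffer is pinned only within the fixed block.
byte[] buffer = new byte[4096];
unsafe
{
    fixed (byte* ptr = buffer)
    {
        // safe to hand `ptr` to unmanaged code within this block
    }
}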

GC Modes

.NET garbage collector may operate in one of two modes.

Workstation Mode

Gen 0 and Gen 1 are collected on the foreground thread, pausing user code. It’s blocking, but it’s relatively cheap (since it’s Gen 0/1). Collections are more frequent, but shorter (because the more often you GC, the less memory there is to collect). There’s just one managed heap.

It’s good for desktop apps, where GCs should be short to avoid blocking the UI. However, in some cases where an app relies heavily on parallelism, Server Mode might be a better fit.

Server Mode

It is ideal for server-side apps. Each CPU core gets its own heap, and each core has its own GC thread running in the background. It’s non-blocking, but uses more memory (30-40% more) and CPU. It can be enabled via csproj (see below). It can give you a huge performance boost in scenarios where throughput is really important. Collections are less frequent, so they can take more time, resulting in longer pauses. However, considering that the GC runs in parallel on a few threads, a pause might actually be shorter than in Workstation mode.
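The csproj switch:

<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>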


Submodes

Each of the two modes above can also be configured to work in one of the two submodes below. Therefore, we could say that there are 4 ways of GC operation in .NET.

  • Non-concurrent - the whole GC operation happens while app threads are paused.
  • Concurrent - part of the GC work happens concurrently with app threads, on a dedicated GC thread. Compaction is never concurrent though.

The Concurrent submode is sometimes called Background Mode. So, for example, we could be using “Background Server Mode”.

While Background GC is executing, a Foreground GC might still occur! It takes precedence over Background GC, pausing it.


The simplest GC mode of operation is Non-concurrent Workstation mode.

Collection

There are two main ways in which memory gets collected:

  • Sweep - objects that are considered dead are removed. The “holes” created by that are potential places for new objects to be allocated in the future. However, when lots of small holes stay in memory, it’s difficult to allocate anything there. This leads to fragmentation, where even though we might have lots of free space, it’s impossible to put any objects there. It’s fast though.
  • Compacting - live objects are moved together, removing any holes between them. It’s slower, because moving objects requires updating all references to them. However, it leads to better memory usage and less fragmentation.

The decision whether to use one or the other is made during the Plan Phase.

Ideally, Sweep + Compaction would result in the best memory utilization.

SOH may use Sweep or Compacting; the GC has various heuristics to decide which one to use in a given case. We might also manually force either one of them.

LOH uses Sweep by default, but it will also Compact when memory is low. Both can be invoked manually. POH uses only Sweep since, by design, Compaction would invalidate pointers to POH objects.

Algorithm

Collection is triggered on a given generation. When a generation N GC is invoked, generation N and all generations below it will be collected.

The GC first marks reachable objects, then it decides whether it will use Sweep or Compaction.

After a collection of Gen 0, it will be empty, since objects either get removed or promoted to a higher generation.

There are exceptions to that though: some objects might get “demoted”, meaning they will stay in the same generation.

Higher generations get scanned using Card Tables to find references to lower generations.

When collecting Gen 1, it might actually grow in size after all. That’s due to the potential number of objects being promoted from Gen 0 to Gen 1.

Regions

With the Regions layout, each generation has its own “region” of memory. To promote some generation fully, a given region just becomes a higher-generation region. A new region then needs to be created for the collected generation. That’s Sweep though; with Compaction, objects might be moved around, and a given region might stay in the same generation.

Segments

When Segments are used, the simple act of moving the bump pointer promotes a given set of objects to a higher gen. Some objects might also be moved to fill holes in a higher generation (that would happen only during Compaction though).

Tips

Pooling Objects

When dealing with lots of potentially expensive object creations, it could be a better idea to reuse pooled objects instead. That way, the GC has less work to do, since there are fewer objects to deallocate. Microsoft provides ObjectPool for this (and ArrayPool specifically for arrays).

Good candidates for pooling are:

  • StringBuilder
  • byte array

So, instead of creating an object every time, you create a pool of them (like a hundred, or a thousand, depending on your needs), and request/return objects from/to the pool during the lifetime of the app.

Object Pool is also useful if object creation is expensive, since you can create a pool of reusable objects on startup and just use them afterwards.
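A typical ArrayPool pattern (from System.Buffers):

using System.Buffers;

// Rent returns an array of *at least* the requested length, possibly reused.
byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
try
{
    // ... use the buffer (only the requested length is guaranteed) ...
}
finally
{
    // Return it so the next Rent can reuse it instead of allocating.
    ArrayPool<byte>.Shared.Return(buffer);
}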

Empty Collections

Whenever an empty collection is needed, do not create a new object; use Array.Empty<T>() instead, which skips the unnecessary allocation.

Stackalloc

We are able to allocate buffers on the stack instead of the heap:

Span<byte> buffer = stackalloc byte[500];

A few facts to keep in mind:

  • can’t be used in async methods
  • shouldn’t be used to allocate too much data, otherwise a stack overflow may occur
  • it avoids heap allocation and GC entirely, so it’s typically faster than allocating on the heap
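A slightly bigger sketch (Span<T> makes the stack buffer safe to work with):

int ParseDigits(ReadOnlySpan<char> input)
{
    // Small, short-lived scratch buffer: no heap allocation, no GC pressure.
    Span<char> digits = stackalloc char[64];
    int count = 0;
    foreach (char c in input)
    {
        if (char.IsDigit(c) && count < digits.Length)
            digits[count++] = c;
    }
    return count == 0 ? 0 : int.Parse(digits[..count]);
}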