During the last years Paolo and Mark have been developing a generational garbage collector for Mono called "Simple Generational GC" or SGen-GC, the properties of this garbage collector are:
- Two generations.
- Mostly precise scanning (stacks and registers are scanned conservatively).
- Copying minor collector.
- Two major collectors: Copying and Mark&Sweep.
- Per-thread fragments for fast per-thread allocation.
- Uses write barriers to minimize the work done on minor collections.
The code is still experimental and should not be used in any kind of production environment. In SVN Mono, a separate mono executable, named "mono-sgen", is built, that uses the SGen garbage collector.
Table of Contents
3.1 Nursery Collection
3.2 Major Collection
3.3 Stopping the world.
3.4 Roots
3.5 Inside the Nursery
3.6 Remembered Sets and Write Barriers
3.7 Flagging Objects
3.8 Conservative Scanning
3.9 Scanning Objects
3.10 Dray Gray Stack
3.11 Pinned Objects
3.12 Finalizers
Copying Collection
A copying garbage collector improves data locality and helps reducing the memory fragmentation of long-running applications or applications whose allocation patterns might lead to memory fragmentation in non-moving collectors.
Mono has historically used the Boehm-Demers-Wiser Conservative Garbage Collector (http://www.hpl.hp.com/personal/Hans_Boehm/gc/) which provides a very simple and non-intrusive interface for providing applications with garbage collection.
Fragmentation of the heap is a common problem faced by C and C++ applications and it happens under certain allocation patterns that sometimes are outside the developer control. For example, consider the following allocation pattern:
- Allocate an object which is 4096 bytes long (a).
- Allocate an object which is 1024 bytes long (b).
- Allocate an object which is 1024 bytes long (c).
- Release (a)
- Allocate another 1024 byte sized object (d).
- Release (c)
- Allocate an object which is 4096 long. (e).
Here is how the heap would look like during each step, notice that although there are 4096 bytes available to allocate, the memory is not contiguous and the heap has to be extended.
Some memory allocators are able to return the "holes" back to the operating system if these holes happen to match the operating system minimum allocation sizes. For most collectors, the complication is that these holes can be created due to smaller allocations in the address space.
A compacting collector would be able to move the data and not grow the heap:
This represents an ideal situation, but there are some cases when moving the objects is not possible. In the CIL universe this can be caused because objects have been "pinned" which prevent objects from being moved.
In addition, SGen-GC also handles large objects specially as moving large objects can easily hurt performance. The SGen-GC allocates large objects directly on pages managed by the operating system, this allows the collector to release the memory back to the operating system when those blocks are no longer in use.
Implementation Details
Note: This is an evolving piece of SGen-GC.
Managaging these objects differently can fragment the continuity of the address space, but since the memory is returned to the operating system the memory usage pattern is different than an allocator that would mix large and small allocations as these large objects are always allocated independently.
This reflects SGen-GC strategy as of May 29th, 2006, but it is an area that we will tune to reflect usage scenarios of Mono applications and will likely change to take into consideration the age of the large objects and possible perform copying/moving to a special Large Object section at some point.
Interface
The GC exposes two interfaces, a public interface which is limited to a few methods in the metadata/mono-gc.h header file.
And a more extensive internal interface consumed by the JIT engine, this one is declared in metadata/gc-internal.h
This GC is also different from the Boehm GC in the fact that only objects are GC-tracked: it is not possible to allocate blobs of memory and have them collected by the GC.
The entry points for allocating objects are:
void *mono_gc_alloc_obj (MonoVTable *vtable, size_t size); void *mono_gc_alloc_pinned_obj (MonoVTable *vtable, size_t size);
mono_gc_alloc_pinned_obj must be used for objects that can't be moved in memory (like interned strings).
The GC also provides the following convenience function to allocate memory and register the area as a root with a specified GC descriptor. This block of memory must be explicitly freed with a call to mono_gc_free_fixed().
void *mono_gc_alloc_fixed (size_t size, void *descr);
mono_gc_alloc_obj is used from within the Mono runtime through the following macros: ALLOC_PTRFREE, ALLOC_OBJECT, ALLOC_TYPED.
Collection
SGen-GC manages memory in four groups:
- The nursery, where new small objects are allocated.
- Large object store, where large objects are allocated.
- Old generations, where small objects are copied to.
- Pinned chunks, for objects that have been pinned-allocated.
The garbage collector will trigger a collection when there is no free memory. The collector distinguishes between two kinds of collections: major collections and nursery-collections which only operate on the current nursery.
The principle behind this two-stage collection (major and nursery) is based on the observation that many programs create a lot of short-lived objects that can be quickly collected. All new objects are allocated in the nursery and during a collection they are moved to the older generation memory block. Nursery collections reduce the amount of work required by the collector by limiting the work done by the collector.
When the nursery is full, the collector will trigger a nursery collection, this is performed by the copying GC. When the old generation is full as well, a major collection is performed.
The garbage collector is complicated by a number of issues:
- Must work with multi-threaded applications.
- The collection can be triggered by any thread.
- We need to track which objects are pinned, as we can not move those objects.
- Large objects are too expensive to move, so they are dealt with separately using a mark/sweep during a major collection.
- Some kinds of objects can not be moved either (interned strings, type handles) and are also dealt with using mark/sweep.
Before doing a collection (minor or major), the collector must stop all running threads as objects might move.
Nursery Collection
To optimize the speed of collections of short-lived objects, SGen-GC performs nursery collections, these are collections that do not have to scan all of the allocated heap, but instead limits the collection to the youngest generation, also known as the nursery.
The nursery is made up of new objects, to survive these objects must be referenced by a root either one that has been explicitly registered, indirectly due to the conservative scan of stacks or if an object in the old generation points to an object in the nursery.
During this collection all live objects are copied into the free space from the old generation, or if the old generation does not have enough space to hold it, into a new generation space:
The dark areas represent pinned objects, objects that could not be moved. This fragments the nursery. To cope with this fragmentation (and also as an optimization for fast thread allocations) the nursery uses fragments, these fragments describe the individual gaps that memory can be allocated from.
After copying the live objects from the nursery into the older generation, the nursery will look like this:
Since the purpose of a nursery collection is to avoid scanning all the allocated memory, the collector and the VM keep track of any pointers that have changed in the old generations which might point to objects in the nursery (as this would determine objects that must be kept alive as well). These changes are known as the remembered set.
A nursery collection is perfomed like this:
- Pinned objects (implicit and explict) are identified and flagged.
- Objects tracked by the remembered sets are copied.
- All pinned objects are scanned.
- All registered roots are scanned but its scope is limited to the nursery.
- Objects referenced by the roots and pinned objects are now copied.
- Nursery fragment are reconstructed.
Major Collection
Major collections look at all the objects allocated by the application: they look at the old generations and the nursery. In addition to performing the copying and compacting of most objects, a mark/sweep is performed for pinned chunks and for large objects as well.
During a major collection, the following steps take place:
- Pinned objects (implicit and explicit) are identified and flagged.
- A new section of memory for the generation is allocated.
- Pinned objects are scanned for references to other objects.
- All roots are traversed and scanned for references to other objects.
- Objects referenced by the roots and pinned objects are now copied
- Sweep of large objects is performed.
- Sweep of pinned objects is performed.
- Any unused sections are released.
- Nursery fragments are reconstructed.
The various stages are described in more detail in the following sections.
Stopping the world.
To perform the garbage collection, it is important that all the running threads are stopped, this is called "stopping the world". This guarantees that no changes are happening behind the GC's back and also, the compacting collector will need to move the objects, and update all pointers to the objects to point to their new locations.
Stopping threads and resuming execution is done by the stop_world() and restart_world() routines, which are invoked before a major collection or a nursery collection are performed.
To actually implement the stop and restore, a couple of Unix signals are used: one signal is used to to notify a thread that it should stop, and another is used to notify a thread that it can resume execution. These signals are setup when the GC is initialized.
Stopping is done by calling the thread_handshake routine, which sends the stop signal to all known threads (using pthread_kill). The signal handler for the thread will in turn sleep until a signal is received to resume execution.
The Mono runtime will automatically register all threads that are created from the managed world with the garbage collector. But it is important that any threads that will communicate with the Mono runtime to register these threads with the garbage collector, so they can be stopped during a collection, developers embedding the Mono runtime must call the mono_gc_register_thread.
The information about each registered thread is tracked in the thread_table hash table, a hash table that contains nodes of type SgenThreadInfo. The SgenThreadInfo contains among other things the RememberedSet for the thread as well as the end of the stack.
Another important side effect of suspending the threads with a signal handler is that the values held on the registers are stored on the stack of each thread. The signal handler will update the stack_start value in the current thread information. The GC will use this to conservatively scan the memory between stack_start and stack_end.
The dark areas represent the blocks of memory that the SgenThreadInfo will track for each thread in the running application and contain the live data on each of the thread stacks, in addition to the register values.
Roots
Roots are the initial set of object references from which all of the other objects in a program are referenced. These root object references are made up of all static field references, CPU registers referencing objects, variables stored on all of the thread stacks, and any internal object references that the runtime holds.
Objects that are no longer reachable from the roots are garbage collected.
The dark regions are the root objects, the gray regions are the objects that are referenced by any root objects or other referenced objects and the white regions represent the garbage.
Internally, the collector tracks two kinds of records, conservative root records and precise root records.
Both records are created through the mono_gc_register_root, its prototype is:
int mono_gc_register_root (char *start, size_t size, void *descr)
The start and size arguments are used to specify the block in memory that represents the root.
The descr argument is a root descriptor used to specify the contents of the block. The ROOT_DESC_CONSERVATIVE descriptor (which happens to be NULL) means that the block will be scanned conservatively for roots. Otherwise the descr represents a descriptor that specifies the object references in the specified block.
Currently the stack and register contents are scanned conservatively (the CPU registers are copied into a small buffer of memory that has been registered as a root block of memory that is to be scanned conservatively).
The collector tracks each of these registered roots in the roots_hash hash table, a hash table that contains node of type RootRecord. Each RootRecord tracks a block of memory (start and end addresses) with the given descriptor as a root.
Inside the Nursery
Objects are initially allocated in a nursery using a fast bump-pointer technique. When the nursery is full we start a nursery collection: this is performed with a copying GC.
When the old generation is full, we also perform a collection of the old generation
There are a number of complications over a simple copy collection:
- Pinned objects.
- Large objects (these are too expensive to handle with a garbage collection).
The nursery is a contiguous region of memory where all new objects are allocated, the region is delimited by two global variables nursery_start and nursery_real_end, the nursery_next pointer points to the location where the next object will be allocated.
Implementation Details
Segments of the nursery memory are delimited by the nursery_next and nursery_temp_end. This serves a number of purposes:
- It allows allocations to be done inside fragmented areas of the nursery (due to pinned objects).
- FIXME: scan starts
- Allows for fast, per-thread allocation without any locking to take place.
In the implementation as of May 2006, the nursery_next and nursery_temp_end are global, but we will eventually turn them into per-thread variables to allow thread to do allocations without having to take the GC-lock.
TODO: Pinned objects in the nursery.
Remembered Sets and Write Barriers
To reduce the time spent on a collection, the collector differentiates between a major collection which will scan the entire heap of allocated objects and a nursery collection which only collects objects in the newer generation. By making this distinction the runtime reduces the time spent during collection as it will focus its attention on the nursery as opposed to the whole heap.
There are two ways in which an object in an older generation might point to objects in the newer generation:
- During code execution when a field is set.
- During the copy phase, when an object that might contain a pointer to the new generation is moved to the old generation.
To track field-settings, the JIT uses a per-thread RememberedSet which is stored in the thread-local remembered_set variable, this is done in cooperation with the JIT which instruments all object stores to call some helper routines in the GC that implement the write barrier, the mono_gc_wbarrier_* functions.
The second kind, the ones generated during object copying, are tracked in the global RememberedSet, in the global_remset variable.
The per-thread remembered set is maintained as code executes and by being a per-thread variable, it is lock-free. The global remembered set is maintained when the world is stopped and the GC lock is held.
When we collect the whole heap the root set is basically thread stacks and static fields.
When we collect the young generation we use the remembered sets as roots so that objects in the young generation that are referenced from objects in the old one are kept alive.
Flagging Objects
During the copying phase the collector needs to track some information efficiently: whether a given object is pinned, and whether a given object has been copied to an older generation (forwarded).
To avoid creating new lists or scanning arrays, during the collection the two lower bits of the vtable of the object are used to store this information. Every object in Mono is represented by the following structure:
typedef struct { MonoVTable *vtable; MonoThreadsSync *synchronisation; } MonoObject;
As the vtable is a pointer, it is guaranteed to always be aligned to 4-bytes the two lower bits of the vtable are always zero. During the collection the collector uses these two bits to store two flags: PINNED and FORWARDED. When the collection is complete these bits are cleared to restore the original value of the vtable pointer.
Conservative Scanning
Before the actual collection takes place, it is important to find all the objects that can not be moved (the pinned objects), there are two kinds of pinned objects, those that were explicitly pinned by the runtime or the programmer (using the fixed expression or object references that are passed to unmanaged code with P/Invoke) and those which are identified by a conservative scan of the stacks and thread registers.
The thread stacks and the registers must be conservatively scanned, which means that the memory regions are scanned for pointer-sized patterns that might point to objects allocated and the heap, and if such a match is found, the object is considered to be pinned.
This is necessary since Mono does not currently track object usage on the stack and the current registers of a routine. This means that Mono does not know which values on the stack are actual object references, nor which values on the registers at the point that a thread was stopped points to objects. Since the GC would be unable to make the distinction between a reference to an object and a variable which holds a value that just happens to look like a pointer, the GC must err on the side of safety and assume that the object is pinned. By assuming that the object is pinned, the collector does not move these objects, nor does it have to update the stack contents and the register values to contain this information.
A future version of Mono might track as code is generated where object references are tracked on the registers and where they are stored on the stack. Only then it would be possible to scan the thread stack and registers precisely instead of conservatively.
At the implementation level, a pinnect object is merely an object that the copying phase of the collector will ignore and will not move to the older generation.
To do this efficiently the collector flags these objects with the pinned flag on its vtable.
The pinning process has two stages:
- Identifying pinned objects explicit and implicit (conservative scanning).
- Flagging the objects individually as pinned.
Once the objects have been flagged as pinned, the copying collection can begin.
The identification of pinned objects is done by the pin_from_roots function. This functions scans all of the pinned locations: the registered pinned roots (see "Pinned Objects") as well as all of the thread stacks, which at this point also contain the contents of the registers in use in each thread.
The routine constructs a list of all pinned objects in the pin_queue array. The contents of this array are then optimized by the optimize_pin_queue which sorts the contents and removes any duplicates from it.
Once the list of unique addresses has been created, the collector has to flag the object before the collection can start. This is done by pin_objects_from_addresses.
pin_object_from_addresses has to cope with one problem, the addresses in the pin_queue do not necessarily point to the beginning of the object, these pointers might very well be pointing somewhere in the middle of the object, so its necessary to map the address in the pin_queue to the object start. Since objects are allocate sequentially in the memory section, it is possible to do this mapping by scanning from the beginning of the memory section, but this could be very time consuming.
This process is optimized by using the scan_start array of pointers from the GCMemSection:
Every 4k to 8k of objects sequentially allocated an entry in the scan_starts array is allocated. This way we only have to scan between 4096 and 8096 bytes of memory to find the header of the object to pin.
The contents of the scan_starts array is maintained in the mono_gc_alloc_obj and the copy_object functions.
Scanning Objects
TODO: describe scanning.
The compacting collector needs a mechanism to move objects (from the nursery to the old generation for example). When an object is copied, it is important that all references pointing to it are updated to point to the new location.
All objects that have are currently referenced are copied to the older generation, this is done by the copy_object routine which does a bit copy of all the object contents.
Once an object has been copied to its new destination, in the old copy of the object the "forwarded" bit is turned on in the vtable field and the synchronization pointer is updated with the new location of the object. This is safe to do, since the object at this point is no longer in use at the old address.
This information is used to update all the existing references to the object to point to the new location.
Dray Gray Stack
Pinned Objects
The garbage collector also supports "pinned" objects, these are objects that for some reason should not be moved. For example, if an object reference is passed to an unmanaged function through P/Invoke, the object is pinned, as unmanaged code can not cope with objects that move behind its back. Objects are also pinned when developers use the fixed statement in C#, like this:
fixed (MyObject *p = &object){ ... }
Notice that `p' might point to the beginning of the object, or through pointer manipulation anywhere inside the object or even outside the object, the GC must account for this.
Mono also pins objects when conservatively scanning the thread stacks and the registers (more on this shortly).
Pinned objects also have the potential of fragmenting the heap and they reduce the effectiveness of the GC as these objects become obstacles for the compacting process.
Pinned objects can be registered with the garbage collector by calling the mono_gc_register_root function with a NULL descriptor, which has the following prototype:
mono_gc_register_root (char *start, size_t size, void *descr);
If the descriptor is NULL, the GC will treat the region starting at "start" for "size" bytes as a block of memory that contains pointers to objects. Any objects in that range will be pinned.
A convenience macro is available from C to register a variable as a pinned root, you can use the MONO_GC_REGISTER_ROOT macro to register a variable as a root.
Compacting collectors have some challenges to solve. For instance, when an object is moved, all of the pointers that refer to it must be updated to reflect the new location of the object. Although it is relatively simple to update the pointers in known managed locations, there are some locations that are hard to track: references that might be held on the stack of a thread, or in the registers of a thread.
Today Mono does not track the object references in the thread stacks or in the registers of a thread, instead Mono conservatively scans the thread stacks and the registers for each thread and assumes that anything in those spaces that might be a pointer into the GC-controlled heap is a valid pointer into the GC-controlled heap.
This means that the collector treats any object that might be referenced from this areas as a pinned object.
Finalizers
Implementation Details
Low-level Memory Allocation
Memory inside the GC is allocated from two sources:
- get_internal_mem(size_t size)
- get_os_memory(size_t size, int activate)
get_internal_mem is used for small allocations, and is very similar to malloc, but memory is obtained from the internal chunks created by the GC.
For large allocations, memory is obtained from the operating system using the get_os_memory. The returned blocks of memory returned from get_os_memory are aligned to the operating system page alignment. get_os_memory allocates memory directly from the operating system circumventing the C runtime malloc-based system and it is implemented in Unix using mmap(2) over /dev/zero.
GCMemSections
Major blocks of memory managed by the GC are tracked in the GCMemSection structure. This structure tracks the memory allocated from the operating system and objects are allocated back-to-back in the chunk of memory obtained from the operating system.
The structure looks like this:
The grey area represents the pieces of this particular section that have been used. The next field is used to keep track of all the GCMemSection structures allocated in a linked list.
The scan_starts array is used as an optimization to minimize the memory that must be scanned for pinned objects, this is discussed in the Conservative Scanning section.
When a nursery collection happens, the live and movable objects from the nursery are copied into a new GCMemSection.
Large Object Allocation
Note: SGen-GC's Large Object Allocation strategy is an evolving one. The description in this paragraph reflects the implementation as of May 29th, 2006 and will likely change as the GC is tuned to cope with most common usage patterns.
Since moving large objects is an expensive operation, the GC treats any objects above 64k specially (the actual size is MAX_SMALL_OBJ_SIZE, a value defined in the sgen-gc.c file). These objects are allocated using the get_os_memory routine.
These objects do not participate in the copying/compacting process and are not stored in the memory managed by GCMemSection. Instead these objects are managed by a mark and sweep algorithm during a major collection and are kept track in a global list of large objects (in the structure LOSObject and the los_object_list linked list).
When an object has been determined to be unused, the memory is returned to the operating system immediately (this is possible because the memory is allocated using get_os_memory as opposed to the regular malloc from the C runtime).
Fragments
Descriptors
Creating Root Descriptors
To create such a descriptor, the following routine is used:
void *mono_gc_make_descr_from_bitmap (gsize *bitmap, int numbits);
This routine will return a descriptor that encodes the object references as specified in the given bitmap (one bit per pointer). Internally the routine will find the best possible descriptor encoding for the given bitmap, as of this writing the routine will create a bitmap descriptor, provided that the bitmap fits on a pointer, otherwise it will return a ROOT_DESC_CONSERVATIVE descriptor.
The code should eventually support other descriptor encodings like ROOT_DESC_RUN_LEN and ROOT_DESC_LARGE_BITMAP in addition to the two existing ones.
GCVTable
Runtime Interface
The runtime will sometimes need to keep pointers to managed objects. In the past the runtime used GHashTable and GList for tracking objects in hashtables and lists.
With the moving collector, it is necessary to use data structures that the garbage collector is aware of. To this end, you must use the MonoGHashTable and the MonoMList.
MonoMList
MonoMList is a single linked list that can store references to managed objects. This list is mirrored in the managed world in the internal corlib class System.MonoListItem







