Mono:Runtime

The Mono runtime

The Mono runtime implements the virtual machine defined by the ECMA Common Language Infrastructure (CLI) standard (http://www.ecma-international.org/publications/standards/Ecma-335.htm).

The Mono runtime engine provides a Just-in-Time compiler (JIT), an Ahead-of-Time compiler (AOT), a library loader, the garbage collector, a threading system and interoperability functionality.

We currently support two execution engines:

  • mono: Our Just-in-Time and Ahead-of-Time code generator for maximum performance.
  • mint: The Mono interpreter. This is an easy-to-port runtime engine.

We are using the Boehm conservative garbage collector.

The Mono runtime can be used as a stand-alone process, or it can be embedded into applications.

Embedding the Mono runtime allows applications to be extended in C# while reusing all of the existing C and C++ code. For more details, see the Embedding Mono page and the Scripting With Mono page.

Supported Platforms

Mono supports both 32-bit and 64-bit systems on a number of architectures and operating systems.

Supported Operating Systems

Supported Architectures

Mono has both an optimizing just-in-time (JIT) runtime and an interpreter runtime. The interpreter is far less complex and is primarily used in the early stages of a port, before a JIT for that architecture has been written. The interpreter is not supported on architectures to which the JIT has been ported.

| Architecture | Runtime | Operating system |
|---|---|---|
| s390, s390x (32 and 64 bit) | JIT | Linux |
| SPARC (32 bit) | JIT | Solaris, Linux |
| PowerPC | JIT | Linux, Mac OS X |
| x86 | JIT | Linux, FreeBSD, OpenBSD, NetBSD, Microsoft Windows, Solaris, Mac OS X |
| x86-64: AMD64 and EM64T (64 bit) | JIT | Linux, Solaris |
| IA64 Itanium2 (64 bit) | JIT | Linux |
| ARM: little and big endian | JIT | Linux (both the old and the new ABI) |
| Alpha | JIT | Linux |
| MIPS | JIT | Linux |
| HPPA | JIT | Linux |

Note that the Alpha, MIPS, ARM big-endian and HPPA architectures are community-supported and may not be as complete as the other architectures.

SPARC64 is supported by older versions of Mono, but not by recent ones.


Packages for most platforms are available from the Downloads page.

Embedded systems

To make Mono more suitable for the architectures used in embedded systems, see the Small footprint page.

Compilation Engine

Paolo Molaro gave a presentation on the current JIT engine and the new JIT engine. You can find his slides here (http://primates.ximian.com/~lupus/slides/jit).

We have rewritten our JIT compiler to provide a number of features that were missing: ahead-of-time compilation, simpler porting, and a solid foundation for compiler optimizations.

Ahead-of-time compilation

The idea is to allow developers to pre-compile their code to native code. This reduces both startup time and the working set that the just-in-time compiler uses at runtime.

Although this has not been a visible problem in Mono, we wanted to address it proactively.

When an assembly (a Mono/.NET executable) is installed on the system, it can then be pre-compiled, with the JIT compiler tuning the generated code to the particular CPU on which the software is installed.

This is done in the Microsoft .NET world with a tool called ngen.exe.

The code produced by Mono's ahead-of-time compiler is Position Independent Code (PIC), which tends to be a bit slower than regular JITed code. However, what you lose in performance with PIC you gain by being able to use all the available optimizations.

To compile your assemblies with this, just run this command:

 $ mono -O=all --aot program.exe

The above command turns on all optimizations (-O=all) and instructs Mono to compile the code to native code.

Further optimizations are being planned; see OptimizingAOT.

Bundles

Mono also offers bundles. Bundles merge your application, the libraries it uses and the Mono runtime into a single executable image. You can think of bundles as "statically linking mono" into your application.

To do this, you use the "mkbundle" command (see the man page distributed with Mono for more details).

For example, to create a fully static and bundled version of Mono for "hello world", you would:

bash$ mcs hello.cs
bash$ mkbundle --static hello.exe -o hello
 OS is: Linux
 Note that statically linking the LGPL Mono runtime has more licensing restrictions than dynamically linking.
 See http://www.mono-project.com/Licensing for details on licensing.
 Sources: 1 Auto-dependencies: False
   embedding: /tmp/hello.exe
 Compiling:
 as -o temp.o temp.s
 cc -o hello -Wall `pkg-config --cflags mono` temp.c  `pkg-config --libs-only-L mono` -Wl,-Bstatic -lmono -Wl,-Bdynamic `pkg-config --libs-only-l mono | sed -e "s/\-lmono //"` temp.o
 Done
bash$

Of course, you can also just embed the libraries, without the actual Mono runtime, by removing the --static flag.

A downside of the --static flag is that it triggers the LGPL license requirements of the runtime. If you are planning to use this feature as an obfuscation technique, you must obtain a commercial license of Mono by emailing mono@novell.com. Otherwise, you should distribute with your bundle all the components that are necessary to comply with the LGPL.

Platform for Code Optimizations

Beyond using the Mono VM as a Just-in-Time compiler, we need to make Mono's code generation as efficient as possible.

The design called for a good architecture that would enable various levels of optimizations: some optimizations are better performed on high-level intermediate representations, some on medium-level and some at low-level representations.

Also it should be possible to conditionally turn these on or off. Some optimizations are too expensive to be used in just-in-time compilation scenarios, but these expensive optimizations can be turned on for ahead-of-time compilations or when using profile-guided optimizations on a subset of the executed methods.
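As an illustration of how such switches can be organized, here is a minimal C sketch in which each pass is a bit in a flag set. The expensive passes are only enabled for ahead-of-time compilation or for methods that a profile marks as hot. The flag names are loosely modeled on the passes listed under Optimizations. Both the names and the cheap/expensive split are assumptions made for this example; they are not Mono's actual option table.

```c
#include <stdbool.h>

/* Hypothetical optimization flags: one bit per pass. */
enum opt_flag {
    OPT_INLINE    = 1 << 0, /* method inlining */
    OPT_CONSTFOLD = 1 << 1, /* constant folding, copy propagation */
    OPT_LINEARS   = 1 << 2, /* linear scan register allocation */
    OPT_SSAPRE    = 1 << 3, /* SSA-based partial redundancy elimination */
    OPT_ABCREM    = 1 << 4  /* array bounds check removal */
};

/* Assumed split: cheap passes run on every JIT compilation,
   expensive ones only when compile time matters less. */
#define OPT_CHEAP     (OPT_INLINE | OPT_CONSTFOLD | OPT_LINEARS)
#define OPT_EXPENSIVE (OPT_SSAPRE | OPT_ABCREM)

/* Choose the passes for one method: expensive passes are enabled when
   compiling ahead of time, or when a profile says the method is hot. */
unsigned select_opts(bool aot, bool hot_in_profile)
{
    unsigned opts = OPT_CHEAP;
    if (aot || hot_in_profile)
        opts |= OPT_EXPENSIVE;
    return opts;
}

/* Test whether a given pass is switched on in a flag set. */
bool opt_enabled(unsigned opts, enum opt_flag flag)
{
    return (opts & (unsigned)flag) != 0;
}
```

With this setup, `select_opts(true, false)` (an AOT compilation) includes `OPT_SSAPRE`, while an ordinary JIT compilation of a cold method does not.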

Simplify Porting

We wanted to reduce the effort required to port the Mono code generator to new architectures.

For Mono to gain wide adoption in the UNIX world, the JIT engine must work on most of today's commercial hardware platforms.

The new Mono engine now supports both 32 and 64 bit systems and various architectures (See Supported Platforms).

Profiling and Code Coverage

Mono provides a number of profiling tools and code coverage tools.

See the Performance Tips page for details on using the profiler, and the Code Coverage page for information on how to use the code coverage functionality with your application and your test suites.

Versioning

Mono supports a Global Assembly Cache, or GAC. The GAC is used to share libraries between different applications, to keep multiple versions of the same library installed at once, and to avoid conflicts over library names. It also plays an important role in trust and security.

See the Assemblies_and_the_GAC document for more details.

Garbage Collection

Mono today uses Boehm's GC as its Garbage Collection engine. We are also working on a precise and compacting GC engine specific to Mono.

The GC interface is being isolated to allow for more than one GC engine to be used or for the GC to be tuned for specific tasks.

Mono's use of Boehm GC

We are using the Boehm conservative GC in precise mode.

There are a few areas that the GC scans for pointers to managed objects:

  1. The heap (where other managed objects are allocated)
  2. Thread stacks and registers
  3. The static data area
  4. Data structures allocated by the runtime

(1) is currently handled in mostly precise mode: almost always, the GC considers only the memory words that actually hold references into the heap, so there is very little chance of pointer misidentification and the memory retention it causes. The new GC requires a fully precise mode here, so it will improve things only marginally. The "mostly" qualifier has to do with large objects that have sparse reference bitmaps, and with handling multiple appdomains safely.

(2) is always scanned conservatively. This will be true for the new GC, too, at least for the first versions, where I'll have my own share of fun at tracking the bugs that a moving generational GC will expose. Later we'll conservatively scan only the unmanaged part of the stacks.

(3) We already optimized this both with Boehm and the current GC to work in precise mode.

(4) I already optimized this to work in mostly precise mode (i.e. some data structures are dealt with precisely, others not yet). I'll need to do more work in this area, especially for the new GC, where pinned objects can be a significant source of pain.
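The difference between conservative and precise scanning can be sketched in a few lines of C. This is a simplified model, not Boehm's or Mono's code. The "heap" is assumed to be a single address range, and the per-object reference bitmap is a hypothetical stand-in for the type information a precise scan relies on.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified heap: one contiguous address range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} heap_range;

static int in_heap(uintptr_t w, heap_range heap)
{
    return w >= heap.start && w < heap.end;
}

/* Conservative scan: every word whose value falls inside the heap is
   treated as a possible reference, even if it is really an integer.
   Returns how many words would keep heap memory alive. */
size_t scan_conservative(const uintptr_t *words, size_t n, heap_range heap)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (in_heap(words[i], heap))
            hits++;
    return hits;
}

/* Precise scan: only words flagged in the reference bitmap
   (bit i set = word i holds a reference) are examined; null or
   out-of-heap values are skipped. */
size_t scan_precise(const uintptr_t *words, size_t n, uint32_t ref_bitmap,
                    heap_range heap)
{
    size_t hits = 0;
    for (size_t i = 0; i < n && i < 32; i++)
        if (((ref_bitmap >> i) & 1u) && in_heap(words[i], heap))
            hits++;
    return hits;
}
```

Consider a heap at [0x1000, 0x2000) and an object whose words are {0x1010, 0x1500, 42}, where only word 0 is a real reference. The conservative scan counts two candidate pointers, and the precise scan with bitmap 0x1 counts one. The extra hit is an integer that merely looks like a pointer, which is exactly the misidentification that leads to memory retention.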

Compacting GC

A new generational, precise and compacting GC is being developed and is currently available from Mono's SVN repository. This new GC was implemented to work around some of the limitations of a purely conservative collector, specifically the memory consumption caused by heap fragmentation.

Although this GC is available from SVN, it is not a supported configuration and will not be one for the Mono 1.2 release. It is available for developers who are interested in testing their applications with it or who want to work on it.

The implementation details are available on our Compacting GC page.

IO and threading

The ECMA runtime and the .NET runtime assume IO and threading models that are very similar to those of the Win32 API.

Dick Porter has developed WAPI, the Mono abstraction layer that allows our runtime to execute code that depends on this behaviour. It is called the `io-layer' in the Mono source code distribution.

This io-layer offers services like named mutexes that can be shared across multiple processes.

To achieve this, the io-layer tracks the state that must be shared between Mono applications in a file mapping, stored in the ~/.wapi directory, that all Mono processes share.
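The general technique of keeping state in a file that several processes map into memory can be sketched with the POSIX `open`, `ftruncate` and `mmap` calls. This is only an illustration under stated assumptions. The table layout, the `shared_table`/`register_name` names and the file path are hypothetical and do not reflect the io-layer's real on-disk format. A real implementation would also need cross-process locking around updates, for example a process-shared mutex stored inside the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_NAMES 16
#define NAME_LEN  32

/* Hypothetical layout of the shared file: a fixed table of names. */
typedef struct {
    int  count;
    char names[MAX_NAMES][NAME_LEN];
} shared_table;

/* Map the table from `path`, creating and sizing the file if needed.
   Every process that maps the same file sees the same table. */
shared_table *map_table(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(shared_table)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(shared_table), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (shared_table *)p;
}

/* Return the slot for `name`, adding it if absent (-1 if the table is
   full). No locking: concurrent writers in this sketch would race. */
int register_name(shared_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strncmp(t->names[i], name, NAME_LEN) == 0)
            return i;
    if (t->count >= MAX_NAMES)
        return -1;
    strncpy(t->names[t->count], name, NAME_LEN - 1);
    t->names[t->count][NAME_LEN - 1] = '\0';
    return t->count++;
}
```

Because the mapping is `MAP_SHARED`, a name registered through one mapping is immediately visible through any other mapping of the same file, whether in the same process or in another one.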

Optimizations

The JIT engine implements a number of optimizations:

  • Opcode cost estimates (our architecture allows us to generate different code paths depending on the target CPU dynamically).
  • Inlining.
  • Constant folding, copy propagation, dead code elimination.
    Although compilers typically do constant folding, the combination of inlining with constant folding gives some very good results.
  • Linear scan register allocation.
  • SSA-based framework. Various optimizations are implemented on top of this framework.
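To see why inlining and constant folding combine well, here is a toy C sketch over a tiny expression tree. The `node` type, `fold` and `inline_const_args` are hypothetical and are not Mono's IR. Folding a callee's body on its own achieves little when its operands are parameters. Once a call with constant arguments is inlined, however, the whole body can collapse to a constant.

```c
#include <stdlib.h>

typedef enum { N_CONST, N_ARG, N_ADD, N_MUL } node_kind;

typedef struct node {
    node_kind kind;
    int value;              /* N_CONST: the constant; N_ARG: argument index */
    struct node *lhs, *rhs; /* operands of N_ADD and N_MUL */
} node;

/* Allocate a node (allocation failures are ignored in this sketch). */
node *mk(node_kind kind, int value, node *lhs, node *rhs)
{
    node *n = malloc(sizeof *n);
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Bottom-up constant folding: fold the operands first, then replace an
   operator whose operands are both constants by a single constant. */
node *fold(node *n)
{
    if (n->kind != N_ADD && n->kind != N_MUL)
        return n;
    n->lhs = fold(n->lhs);
    n->rhs = fold(n->rhs);
    if (n->lhs->kind == N_CONST && n->rhs->kind == N_CONST) {
        int v = n->kind == N_ADD ? n->lhs->value + n->rhs->value
                                 : n->lhs->value * n->rhs->value;
        free(n->lhs);
        free(n->rhs);
        n->kind = N_CONST;
        n->value = v;
        n->lhs = n->rhs = NULL;
    }
    return n;
}

/* Inline a callee body at a call site whose actual arguments are
   constants: copy the body, turning argument i into the constant args[i]. */
node *inline_const_args(const node *body, const int *args)
{
    switch (body->kind) {
    case N_ARG:
        return mk(N_CONST, args[body->value], NULL, NULL);
    case N_CONST:
        return mk(N_CONST, body->value, NULL, NULL);
    default:
        return mk(body->kind, 0, inline_const_args(body->lhs, args),
                  inline_const_args(body->rhs, args));
    }
}
```

Take a callee `int f(int a, int b) { return a * (b + 1); }`. Folding its body alone leaves the multiplication in place. Inlining the call `f(6, 4)` and then folding yields the constant 30.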

We continue to improve our engine, but many optimizations cannot be performed effectively by the runtime. You will get very good results if you profile your application and study its patterns. See our Performance Tips article for various ideas on how to tune your software.

PRE: Partial Redundancy Elimination

Massimiliano implemented our SSAPRE.

Partial Redundancy Elimination (or PRE) is an optimization that (guess what?) tries to remove redundant computations. It achieves this by saving the result of "not redundant" evaluations of expressions into specially created temporary variables, so that "redundant" evaluations can be replaced by a load from the appropriate variable.

Of course, on register-starved architectures such as x86, a temporary could cost more than the evaluation itself. PRE guarantees that the live ranges of the introduced variables are as short as possible, but the added pressure on the register allocator can still be an issue.

The nice thing about PRE is that it not only removes "full" redundancies, but also "partial" ones. A full redundancy is easy to spot, and straightforward to handle, like in the following example (in every example here, the "expression" is "a + b"):

int FullRedundancy1 (int a, int b) {
     int v1 = a + b;
     int v2 = a + b;
     return v1 + v2;
}

PRE would transform it like this:

int FullRedundancy1 (int a, int b) {
     int t = a + b;
     int v1 = t;
     int v2 = t;
     return v1 + v2;
}

Of course, either a copy propagation pass or a register allocator smart enough to remove unneeded variables would be necessary afterwards.
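For instance, once copy propagation replaces v1 and v2 with t, the transformed FullRedundancy1 could shrink to something like the following. This is a hand-written illustration of the expected end result, not actual JIT output.

```c
int FullRedundancy1 (int a, int b) {
     int t = a + b; // the single evaluation of the expression
     return t + t;  // v1 and v2 were plain copies of t
}
```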

Another example of full redundancy is the following:

int FullRedundancy2 (int a, int b) {
     int v1;
     
     if (a >= 0) {
          v1 = a + b; // BB1
     } else {
          a = -a; // BB2
          v1 = a + b;
     }
     
     int v2 = a + b; // BB3
     return v1 + v2;
}

Here the two expressions in BB1 and BB2 are *not* the same thing (a is modified in BB2), but both are redundant with the expression in BB3, so the code can be transformed like this:

int FullRedundancy2 (int a, int b) {
     int v1;
     int t;
     
     if (a >= 0) {
          t = a + b; // BB1
          v1 = t;
     } else {
          a = -a; // BB2
          t = a + b;
          v1 = t;
     }
     
     int v2 = t; // BB3
     return v1 + v2;
}

Note that there are still two occurrences of the expression, while it is easy to see that one (at the beginning of BB3) would suffice. This, however, is not a redundancy for PRE, because there is no path in the CFG along which the expression is evaluated twice. This other kind of redundancy (which affects code size, not the computations that are actually performed) might be eliminated by code hoisting, but I should check that; in any case, it is not a PRE-related issue.

An example of partial redundancy, on the other hand, is the following:

int PartialRedundancy (int a, int b) {
     int v1;
     
     if (a >= 0) {
          v1 = a + b; // BB1
     } else {
          v1 = 0; // BB2
     }
     
     int v2 = a + b; // BB3
     return v1 + v2;
}

The redundancy is partial because the expression is computed more than once along some path in the CFG, not all paths. In fact, on the path BB1 - BB3 the expression is computed twice, but on the path BB2 - BB3 it is computed only once. In this case, PRE must insert new occurrences of the expression in order to obtain a full redundancy, and then use temporary variables as before. Adding a computation in BB2 would do the job.
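Applying that insertion by hand, in the same style as the earlier examples, gives the transformed function below. On each path through the CFG, a + b is now evaluated exactly once.

```c
int PartialRedundancy (int a, int b) {
     int v1;
     int t;

     if (a >= 0) {
          t = a + b; // BB1
          v1 = t;
     } else {
          t = a + b; // BB2: computation inserted by PRE
          v1 = 0;
     }

     int v2 = t; // BB3: the evaluation is now fully redundant
     return v1 + v2;
}
```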

One nice thing about PRE is that loop invariants can be seen as partial redundancies. The idea is that you can enter the loop body from two places: from before the loop (at the first iteration) and from inside the loop itself (at every other iteration). If a computation inside the loop is in fact loop invariant, PRE will spot this and treat the block before the loop as the place to insert a new computation that makes the redundancy full. At that point, the computation inside the loop is replaced by a use of the temporary stored before the loop, effectively performing "loop invariant code motion".
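A small C example of the loop case follows; the function names are made up for illustration. The expression `a + b` does not change inside the loop, so the block before the loop becomes the insertion point and the loop body ends up reading a temporary. The loop is written in guarded (rotated) form so that the inserted computation runs only when the body would have run at least once. PRE only inserts a computation at a point where the expression would be evaluated anyway on every path that follows.

```c
/* Before: a + b is recomputed on every iteration. */
int LoopInvariant (int a, int b, int n) {
     int sum = 0;
     for (int i = 0; i < n; i++)
          sum += a + b;
     return sum;
}

/* After PRE, with the loop in guarded form: the computation inserted
   before the loop body makes the one inside it fully redundant. */
int LoopInvariantPRE (int a, int b, int n) {
     int sum = 0;
     if (n > 0) {
          int t = a + b; // inserted before the loop
          int i = 0;
          do {
               sum += t; // was: sum += a + b
               i++;
          } while (i < n);
     }
     return sum;
}
```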

To learn more about how this actually works, see Massi's blog entry (http://primates.ximian.com/~massi/blog/archive/2004/Sep-14.html)

Arrays Bounds Check Removal

This optimization, also built by Massimiliano, allows the JIT to remove the array bounds checks that are automatically generated on every array access when it can determine that the access will always be within the bounds of the array.

Typically for code like this:

 for (int i = 0; i < a.Length; i++) {
    a[i] = i;
 }

The JIT has to generate something like this:

 for (int i = 0; i < a.Length; i++) {
    if (i < 0 || i >= a.Length)
        throw new IndexOutOfRangeException ();
    a[i] = i;
 }

This is so that programmers do not accidentally write outside of the boundary of the array.

With this optimization, the JIT compiler can infer from the loop that the variable 'i' will always be within the boundaries of the array, so the code produced becomes:

 for (int i = 0; i < a.Length; i++) {
    a[i] = i;
 }
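Here is a C rendering of the two versions, assuming a hypothetical `int_array` struct that carries its length the way a CLI array does. The first function checks every access, like the JIT-generated code above. The second shows what remains once the facts that `i` starts at 0, only increases, and is kept below the length by the loop condition prove every index is in range.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array object: a length plus its elements. */
typedef struct {
    size_t length;
    int *data;
} int_array;

/* Checked version: every store is guarded. With an unsigned index, a
   single comparison covers both "below 0" and "past the end". Returns
   false where managed code would throw IndexOutOfRangeException. */
bool fill_checked(int_array *a)
{
    for (size_t i = 0; i < a->length; i++) {
        if (i >= a->length)
            return false;
        a->data[i] = (int)i;
    }
    return true;
}

/* After bounds check removal: i starts at 0, only increases and is
   kept below a->length by the loop condition, so the guard is dropped. */
void fill_unchecked(int_array *a)
{
    for (size_t i = 0; i < a->length; i++)
        a->data[i] = (int)i;
}
```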

Tree Pattern Matching

The new JIT engine uses three intermediate representations. The source is CIL, which is transformed into a forest of trees; these trees are fed into a BURS (Bottom-Up Rewrite System) instruction selector that generates the final low-level intermediate representation.

There are a couple of books that deal with this technique: "A Retargetable C Compiler" and "Advanced Compiler Design and Implementation" are good references.
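The core of a BURS-style selector is a bottom-up pass that labels each tree node with the cheapest way to cover it, followed by a top-down pass that applies the chosen rules. The toy C sketch below shows that dynamic programming for a single `reg` nonterminal, computing costs on the fly in the style of iburg-like selectors (table-driven generators precompute this work). The opcodes, rules and costs are invented for illustration and are not Mono's machine description. Note how LOAD(ADD(reg, CONST)) is covered by one load-with-displacement instruction rather than an add followed by a load.

```c
#include <limits.h>
#include <stddef.h>

typedef enum { OP_REG, OP_CONST, OP_ADD, OP_LOAD } opcode;

/* Hypothetical rules for a single nonterminal "reg"; each emitted
   instruction costs 1. */
typedef enum {
    R_NONE,
    R_REG,      /* reg <- REG                    (already in a register, cost 0) */
    R_CONST,    /* reg <- CONST                  mov  $c, r */
    R_ADD,      /* reg <- ADD(reg, reg)          add  r, r */
    R_ADDI,     /* reg <- ADD(reg, CONST)        add  $c, r */
    R_LOAD,     /* reg <- LOAD(reg)              load (r), r */
    R_LOADDISP  /* reg <- LOAD(ADD(reg, CONST))  load c(r), r */
} rule_id;

typedef struct tree {
    opcode op;
    struct tree *kids[2];
    int cost;     /* cheapest cost of computing this subtree into a reg */
    rule_id rule; /* rule that achieves that cost */
} tree;

static void consider(tree *t, rule_id r, int cost)
{
    if (cost < t->cost) {
        t->cost = cost;
        t->rule = r;
    }
}

/* Bottom-up labeling: label the children, then try every rule whose
   pattern matches at t and keep the cheapest (dynamic programming). */
void label(tree *t)
{
    for (int i = 0; i < 2; i++)
        if (t->kids[i])
            label(t->kids[i]);
    t->cost = INT_MAX;
    t->rule = R_NONE;
    switch (t->op) {
    case OP_REG:
        consider(t, R_REG, 0);
        break;
    case OP_CONST:
        consider(t, R_CONST, 1);
        break;
    case OP_ADD:
        consider(t, R_ADD, 1 + t->kids[0]->cost + t->kids[1]->cost);
        if (t->kids[1]->op == OP_CONST)
            consider(t, R_ADDI, 1 + t->kids[0]->cost);
        break;
    case OP_LOAD:
        consider(t, R_LOAD, 1 + t->kids[0]->cost);
        if (t->kids[0]->op == OP_ADD && t->kids[0]->kids[1]->op == OP_CONST)
            consider(t, R_LOADDISP, 1 + t->kids[0]->kids[0]->cost);
        break;
    }
}

/* Top-down reduction: follow the chosen rules and count the instructions
   a real selector would emit at each step. */
int reduce(const tree *t)
{
    switch (t->rule) {
    case R_REG:      return 0;
    case R_CONST:    return 1;
    case R_ADD:      return 1 + reduce(t->kids[0]) + reduce(t->kids[1]);
    case R_ADDI:     return 1 + reduce(t->kids[0]);
    case R_LOAD:     return 1 + reduce(t->kids[0]);
    case R_LOADDISP: return 1 + reduce(t->kids[0]->kids[0]);
    default:         return -1; /* unlabeled tree */
    }
}
```

For LOAD(ADD(REG, CONST)), the plain load rule would cost 2 (an immediate add plus a load), so labeling picks `R_LOADDISP` at cost 1. For ADD(CONST, REG), the immediate form does not match and the result is `R_ADD` at cost 2.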

Useful links

See our Papers section for various articles describing virtual machines and JIT compilers.

Porting

See the Porting page for more details on porting Mono to a new platform.

Projects Under Development

A number of runtime projects are being developed in branches or in separate trees:

  • Compacting GC: A generational, compacting GC for Mono.
  • Linear: An update to the JIT's internal representation (IR).
  • JIT Regalloc: A new register allocation framework.
  • SafeHandles: Support for 2.0 SafeHandles.

COM and XPCOM

Mono's COM support can be used to interop with system COM components in Windows and in Linux (if you use a COM implementation). Additionally, Mono's COM support can be used to access software based on Mozilla's XPCOM.

For details on the progress, see the COM Interop page.