Alexander Kyte runtime

By virtue of using LLVM, Mono has access to a wide suite of tools and optimization backends, and a lot of active research uses LLVM IR. One such research project, Souper, uses a brute-force search to find missed optimizations in the code we emit. The .NET community may have software projects that would benefit from using Souper directly to generate code, rather than waiting for us to find ways to automate those optimizations ourselves. This approach can generate code that would be very challenging for a traditional compiler to find.

The Mono .NET VM is a rather nimble beast. Rather than requiring all users to live with the performance characteristics of a given policy, we often choose to create multiple backends and bindings that exploit what’s best of the native platform while presenting a common interface. Part of this is the choice between using an interpreter, a Just-In-Time compiler, or an Ahead-Of-Time compiler.

AOT compilation is attractive to some projects because it combines optimized code with low start-up time, the classic advantage of native code over code from a JIT or an interpreter. However, AOT code is often much worse than code from a JIT, because code that references objects in run-time memory needs an extra level of indirection. It's important for AOT code to exploit every possible optimization to make up for this disadvantage. For this, we increasingly rely on optimizations performed by LLVM.

LLVM's optimization passes can analyze a program globally. They are able to see through layers of abstraction and identify repeated or needless operations in a program's global flow. Likewise, they can examine the operations in a small segment of code and optimize them with respect to one another. Sometimes, though, our code is not optimized. Classic compilers work by analyzing the control flow and dataflow of a program and matching on specific patterns, such as stores to variables that are never read later, or constants that are stored to variables rather than being propagated everywhere they can be. If the pattern matches, the transformation can take place. Sometimes the code we feed into LLVM does not match the patterns of inefficiency that it looks for, and we don't get an optimization.
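As a rough illustration of the kind of pattern such a pass matches, here is a minimal sketch of dead-store elimination in C over a hypothetical straight-line IR. The Instr type, MAX_VARS and eliminate_dead_stores are invented for this example; LLVM's real passes work on SSA form and handle control flow, so treat this as a toy rather than as how LLVM does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 32   /* hypothetical limit on variables in the toy IR */

/* Hypothetical straight-line IR: each instruction stores into `dst` and
 * reads up to two variables (-1 means "no operand"). */
typedef struct {
    int dst;
    int src1;
    int src2;
    bool dead;   /* filled in by the pass */
} Instr;

/* Walk backwards tracking which variables are still needed ("live").
 * A store to a variable that nothing reads afterwards is dead.
 * live_out[v] is true if variable v is needed after the block.
 * Returns the number of instructions marked dead. */
static size_t eliminate_dead_stores(Instr *code, size_t n, const bool *live_out)
{
    bool live[MAX_VARS];
    size_t removed = 0;
    for (int v = 0; v < MAX_VARS; v++)
        live[v] = live_out[v];

    for (size_t i = n; i-- > 0;) {
        Instr *in = &code[i];
        assert(in->dst >= 0 && in->dst < MAX_VARS);
        if (!live[in->dst]) {
            in->dead = true;          /* nobody reads this value: drop it */
            removed++;
            continue;                 /* its operands are not needed either */
        }
        in->dead = false;
        live[in->dst] = false;        /* this store satisfies the later reads */
        if (in->src1 >= 0) live[in->src1] = true;
        if (in->src2 >= 0) live[in->src2] = true;
    }
    return removed;
}
```

When the IR we hand to the optimizer is shaped differently from what a pattern like this expects, for instance because the store goes through an indirection the pass cannot see through, the pattern simply never fires.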

What's worse is that we don't know when we hit one of these optimization blockers. We usually don't know what to expect from a piece of code until it becomes a problem and we spend real time optimizing it. Spotting trends in generated machine code across thousands of methods is incredibly labor-intensive, so often only really bad code that runs many, many times catches anyone's attention. Finding every missed optimization and fixing every missed optimization becomes a chicken-and-egg problem.

The solution to some manifestations of this problem is the use of superoptimizers. Superoptimization is a very old academic discipline. The idea is to treat the code that was written as a specification rather than as the code to emit. The superoptimizer generates a large number of candidate native code sequences and checks whether each one behaves differently from the written code. If it finds a sequence that is faster than what the compiler generated while keeping the behavior exactly the same, it wins.
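The generate-and-check loop can be illustrated with a deliberately tiny toy in C. This is not how Souper works (Souper checks equivalence with an SMT solver such as Z3); the sketch instead tests every candidate exhaustively over all 256 possible 8-bit inputs, which is only feasible because the domain is so small. The operation table, the costs and the spec() function are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Candidate instructions of the form `x OP c`, with 8-bit wrap-around. */
typedef enum { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR, OP_SHL, OP_MUL, OP_COUNT } Op;

/* Hypothetical costs: multiplication is assumed to be more expensive. */
static const int op_cost[OP_COUNT] = {1, 1, 1, 1, 1, 1, 3};

static uint8_t apply(Op op, uint8_t x, uint8_t c)
{
    switch (op) {
    case OP_ADD: return (uint8_t)(x + c);
    case OP_SUB: return (uint8_t)(x - c);
    case OP_AND: return (uint8_t)(x & c);
    case OP_OR:  return (uint8_t)(x | c);
    case OP_XOR: return (uint8_t)(x ^ c);
    case OP_SHL: return c < 8 ? (uint8_t)(x << c) : 0;
    case OP_MUL: return (uint8_t)(x * c);
    default:     assert(0 && "unknown op"); return 0;
    }
}

/* The "written code" acting as the specification: two dependent additions. */
static uint8_t spec(uint8_t x)
{
    uint8_t t = (uint8_t)(x + x);
    return (uint8_t)(t + t);
}

/* Enumerate candidates from cheapest to most expensive and keep the first
 * one that agrees with spec() on every input. Returns 1 on success. */
static int superoptimize(Op *best_op, uint8_t *best_c)
{
    for (int cost = 1; cost <= 3; cost++)
        for (int op = 0; op < OP_COUNT; op++) {
            if (op_cost[op] != cost)
                continue;
            for (int c = 0; c < 256; c++) {
                int ok = 1;
                for (int x = 0; x < 256 && ok; x++)
                    ok = apply((Op)op, (uint8_t)x, (uint8_t)c) == spec((uint8_t)x);
                if (ok) {
                    *best_op = (Op)op;
                    *best_c = (uint8_t)c;
                    return 1;
                }
            }
        }
    return 0;
}
```

Here the search replaces the two additions with a single shift left by 2. The inner "check every input" loop is the part that explodes for 32- or 64-bit values and longer sequences, which is why modern tools rely on solvers instead.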

This “exactly the same” part can be incredibly expensive if not done carefully, and the computational effort involved has historically kept superoptimization from being used very often. In recent decades, it has become a lot easier to run computationally intensive jobs: computer hardware has become orders of magnitude more powerful, and advances in equivalence checking and control-flow representations brought stronger results and algorithms with better running times. We are therefore seeing superoptimization research reemerge.

One superoptimizer in particular, named Souper, has reached maturity while interoperating with the industry-standard code generator (LLVM) and the industry-standard SMT solver (Z3). It has renewed researchers' faith that superoptimization is a practical approach. It can take the LLVM IR that a compiler was going to feed into LLVM and compute better IR. This can sometimes take a lot of time, and the emitted code is the result of a process that isn't auditable: the pipeline places total faith in Souper for the correctness of the generated code.

Souper is mostly useful for compiler engineers: it tells them which optimizations were missed and helps them work out how to implement those optimizations using conventional pattern matching over the program's control-flow and dataflow graphs. That said, Souper can also act as a drop-in replacement for clang and generate the code that is actually run. Some projects are eager to make any acceptable trade-off for performance. Other projects may want to get a feel for how fast they could run if they were to invest in making sure Mono generates good code. If the increase in compile time doesn't discourage them, many projects may find some benefit in such an optimizing compiler.

I recommend that curious readers install Z3, get a checkout of Souper, and complete the compilation process described in its documentation.

When AOT-compiling code with Mono, pass the command-line flags named there through the --aot=llvmopts= argument.

At the time of writing, that is:

llvmopts="-load /path/to/ -souper -z3-path=/usr/bin/z3" 

Mono will then let Souper step in during the middle of the LLVM compilation and try its best at brute-forcing better code. If a short, fast sequence within Souper's search space does the job better, Souper can find it.

It is frankly amazing that Mono can get such extensive optimizations simply by compiling to LLVM IR. Without changing a single line of Mono's source, we changed our compilation pipeline in truly dramatic ways. This shows how few assumptions Mono makes about the layout of our generated code, and it shows off the flexibility of LLVM as a code generation framework and of Mono as an embedded runtime. Embedders using Mono should consider using our LLVM backend with this and other third-party LLVM optimization passes. Feedback about the impact of our research on real-world programs will help us decide what we should be using by default.

Ludovic Henry, Miguel de Icaza, Aleksey Kliger, Bernhard Urban and Ming Zhou runtime

During the 2018 Microsoft Hack Week, members of the Mono team explored the idea of replacing Mono's code generation engine, written in C, with a code generation engine written in C#.

In this blog post we describe our motivation, the interface between the native Mono runtime and the managed compiler and how we implemented the new managed compiler in C#.


Mono's runtime and JIT compiler are entirely written in C, a highly portable language that has served the project well. Yet we feel jealous of our own users, who get to write code in a high-level language and enjoy its safety and luxury, while the Mono runtime continues to be written in C.

We decided to explore whether we could make Mono's compilation engine pluggable and then plug in a code generator written entirely in C#. If this were to work, we could more easily prototype and write new optimizations, and make it simpler for developers to safely try changes in the JIT.

This idea has been explored by research projects like JikesRVM, Maxine and Graal for Java. In the .NET world, the Unity team wrote an IL-to-C++ compiler called il2cpp. They also experimented with a managed JIT recently.

In this blog post, we discuss the prototype that we built. The code mentioned in this blog post can be found here:

Interfacing with the Mono Runtime

The Mono runtime provides various services: just-in-time compilation, assembly loading, an I/O interface, thread management and debugging capabilities. The code generation engine in Mono is called mini and is used both for static compilation and for just-in-time compilation.

Mono’s code generation has a number of dimensions:

  • Code can be either interpreted, or compiled to native code
  • When compiling to native code, this can be done just-in-time, or it can be batch compiled, also known as ahead-of-time compilation.
  • Mono today has two code generators: the light and fast mini JIT engine, and the heavy-duty engine based on the LLVM optimizing compiler. The two are not completely independent of each other; Mono's LLVM support reuses many parts of the mini engine.

This project started with a desire to make this division even clearer, and to swap out the native code generation engine in ‘mini’ for one that could be implemented completely in a .NET language. In our prototype we used C#, but other languages like F# or IronPython could be used as well.

To move the JIT to the managed world, we introduced the ICompiler interface which must be implemented by your compilation engine, and it is invoked on demand when a specific method needs to be compiled.

This is the interface that you must implement:

interface ICompiler {
    CompilationResult CompileMethod (IRuntimeInformation runtimeInfo,
                                     MethodInfo methodInfo,
                                     CompilationFlags flags,
                                     out NativeCodeHandle nativeCode);

    string Name { get; }
}

CompileMethod () receives an IRuntimeInformation reference, which provides services to the compiler, as well as a MethodInfo that represents the method to be compiled. It is expected to set the nativeCode parameter to the generated code information.

The NativeCodeHandle merely represents the generated code address and its length.

This is the IRuntimeInformation definition, which shows the methods available to CompileMethod to perform its work:

interface IRuntimeInformation {
    InstalledRuntimeCode InstallCompilationResult (CompilationResult result,
                                                   MethodInfo methodInfo,
                                                   NativeCodeHandle codeHandle);

    object ExecuteInstalledMethod (InstalledRuntimeCode irc,
                                   params object[] args);

    ClassInfo GetClassInfoFor (string className);

    MethodInfo GetMethodInfoFor (ClassInfo classInfo, string methodName);

    FieldInfo GetFieldInfoForToken (MethodInfo mi, int token);

    IntPtr ComputeFieldAddress (FieldInfo fi);

    /// For a given array type, get the offset of the vector relative to the base address.
    uint GetArrayBaseOffset (ClrType type);
}

We currently have one implementation of ICompiler, which we call the “BigStep” compiler. When wired up, this is what the process looks like when we compile a method with it:

Managed JIT overview

The mini runtime can call into managed code via CompileMethod upon a compilation request. For the code generator to do its work, it needs to obtain some information about the current environment. This information is surfaced by the IRuntimeInformation interface. Once the compilation is done, it will return a blob of native instructions to the runtime. The returned code is then “installed” in your application.

Now for a trick question: who is going to compile the compiler?

The compiler written in C# is initially executed with one of the built-in engines (either the interpreter, or the JIT engine).

The BigStep Compiler

Our first ICompiler implementation is called the BigStep compiler.

This compiler was designed and implemented by a developer (Ming Zhou) not affiliated with the Mono runtime team. It is a perfect showcase of how the work presented in this project can quickly enable a third party to build their own compiler without much hassle interacting with the runtime internals.

The BigStep compiler implements an IL to LLVM compiler. This was convenient for building the proof of concept and ensuring that the design was sound, while delegating all the hard compilation work to the LLVM compiler engine.

A lot can be said when it comes to the design and architecture of a compiler, but our main point here is to emphasize how easy it can be, with what we have just introduced to Mono runtime, to bridge IL code with a customized backend.

The IL code is streamed into the compiler interface through an iterator, with information such as the opcode, index and parameters immediately available to the user. See the appendix below for more details about the prototype.

Hosted Compiler

Another beauty of moving parts of the runtime to the managed side is that we can test the JIT compiler without recompiling the native runtime, so developing the compiler is essentially like developing a normal C# application.

InstallCompilationResult () can be used to register a compiled method with the runtime, and ExecuteInstalledMethod () can be used to invoke a method with the provided arguments.

Here is an example of how this is used:

public static int AddMethod (int a, int b) {
  return a + b;
}

[Test]
public void TestAddMethod ()
{
  // runtimeInfo (IRuntimeInformation) and compiler (ICompiler) are assumed to be fields of the test fixture.
  ClassInfo ci = runtimeInfo.GetClassInfoFor (typeof (ICompilerTests).AssemblyQualifiedName);
  MethodInfo mi = runtimeInfo.GetMethodInfoFor (ci, "AddMethod");
  NativeCodeHandle nativeCode;

  CompilationResult result = compiler.CompileMethod (runtimeInfo, mi, CompilationFlags.None, out nativeCode);
  InstalledRuntimeCode irc = runtimeInfo.InstallCompilationResult (result, mi, nativeCode);

  int addition = (int) runtimeInfo.ExecuteInstalledMethod (irc, 1, 2);
  Assert.AreEqual (3, addition);
}

We can ask the host VM for the actual result, assuming it’s our gold standard:

int mjitResult = (int) runtimeInfo.ExecuteInstalledMethod (irc, 666, 1337);
int hostedResult = AddMethod (666, 1337);
Assert.AreEqual (mjitResult, hostedResult);

This eases development of a compiler tremendously.

We don’t need to eat our own dog food during debugging, but when we feel ready we can flip a switch and use the compiler as our system compiler. This is actually what happens if you run make -C mcs/class/Mono.Compiler run-test in the mjit branch: We use this API to test the managed compiler while running on the regular Mini JIT.

Native to Managed to Native: Wrapping Mini JIT into ICompiler

As part of this effort, we also wrapped Mono’s JIT in the ICompiler interface.


The resulting MiniCompiler calls back into native code and invokes the regular Mini JIT. It works surprisingly well; however, there is a caveat: once back in the native world, the Mini JIT doesn't need to go through IRuntimeInformation and just uses its old ways to retrieve runtime details. We can now turn this into an incremental process, though: we can identify those parts, add them to IRuntimeInformation, and change the Mini JIT so that it uses the new API.


We strongly believe in the long-term value of this project. A code base in managed code is more approachable for developers and thus easier to extend and maintain. Even if we never see this work upstream, it helped us better understand the boundary between runtime and JIT compiler, and who knows, it might help us integrate RyuJIT into Mono one day 😉

We should also note that IRuntimeInformation can be implemented by any other .NET VM: Hello CoreCLR folks 👋

If you are curious about this project, ping us on our Gitter channel.

Appendix: Converting Stack-Based OpCodes into Register Operations

Since the target language was LLVM IR, we had to build a translator that converted the stack-based operations from IL into the register-based operations of LLVM.

Since many potential targets are register-based, we designed a framework that makes the part that interprets the IL logic reusable. To this end, we implemented an engine that turns stack-based operations into register operations.

Consider the ADD operation in IL. This operation pops two operands from the stack, performs the addition and pushes the result back onto the stack. This is documented in ECMA 335 as follows:

  Stack Transition:
      ..., value1, value2 -> ..., result

The actual kind of addition that is performed depends on the types of the values in the stack. If the values are integers, the addition is an integer addition. If the values are floating point values, then the operation is a floating point addition.

To re-interpret this in a register-based semantics, we treat each pushed frame in the stack as a different temporary value. This means if a frame is popped out and a new one comes in, although it has the same stack depth as the previous one, it’s a new temporary value.

Each temporary value is assigned a unique name. An IL instruction can then be represented unambiguously using temporary names instead of stack changes. For example, the ADD operation becomes

Temp3 := ADD Temp1 Temp2
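A minimal sketch of this renaming in C, assuming a hypothetical three-opcode subset (push constant, load argument, add). A compile-time stack holds temp numbers instead of values; every push allocates a fresh temp, and ADD pops two temps and emits one three-address line. For simplicity, every push here (including loading an argument or a constant) gets its own temp, whereas the prototype keeps arguments and constants as distinct operand kinds. The names ILOp, ILInstr and to_temps are illustrative, not the prototype's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { IL_LDC, IL_LDARG, IL_ADD } ILOp;   /* hypothetical subset */
typedef struct { ILOp op; int operand; } ILInstr;

#define STACK_MAX 16

/* Translate stack code into "TempN := ..." lines written to `out`.
 * Returns the number of lines, or -1 on stack underflow/overflow or if
 * `out` is too small. */
static int to_temps(const ILInstr *code, int n, char out[][32], int max_lines)
{
    int stack[STACK_MAX];   /* holds temp numbers, not runtime values */
    int sp = 0, next_temp = 1, lines = 0;

    for (int i = 0; i < n; i++) {
        if (lines >= max_lines)
            return -1;
        switch (code[i].op) {
        case IL_LDC:
        case IL_LDARG:
            if (sp == STACK_MAX)
                return -1;
            snprintf(out[lines++], 32, "Temp%d := %s %d", next_temp,
                     code[i].op == IL_LDC ? "CONST" : "ARG", code[i].operand);
            stack[sp++] = next_temp++;
            break;
        case IL_ADD: {
            if (sp < 2)
                return -1;
            int rhs = stack[--sp];   /* value2 is on top of the stack */
            int lhs = stack[--sp];   /* value1 sits below it */
            snprintf(out[lines++], 32, "Temp%d := ADD Temp%d Temp%d",
                     next_temp, lhs, rhs);
            /* The result is a new temp, even though it occupies the same
             * stack depth that value1 used to occupy. */
            stack[sp++] = next_temp++;
            break;
        }
        }
    }
    return lines;
}
```

For the sequence ldarg.0, ldc 5, add this produces "Temp1 := ARG 0", "Temp2 := CONST 5" and "Temp3 := ADD Temp1 Temp2".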

Besides the stack, there are other sources of data during evaluation: local variables, arguments, constants and instruction offsets (used for branching). These sources are typed differently from the stack temporaries, so that the downstream processor (described below) can properly map them into its context.

A third problem that might be common among those target languages is the jump target for branching operations. IL's conditional branch operations assume an implicit target for when the branch is not taken: the next instruction. But conditional branch operations in LLVM IR must explicitly declare the targets for both the taken and the not-taken paths. To make this possible, the engine performs a pre-pass before the actual execution, during which it gathers all the explicit and implicit targets. In the actual execution, it then emits branch instructions with both targets.
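The pre-pass can be sketched in C roughly as follows, assuming a hypothetical instruction record where each branch stores the index of its explicit target. A conditional branch contributes two targets: its operand and the following instruction (the fall-through). With both known, the emitter can produce an LLVM conditional branch of the form br i1 %cond, label %taken, label %not_taken. The Kind, Insn and find_branch_targets names are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction kinds: plain, unconditional branch,
 * conditional branch (like IL brtrue) and return. */
typedef enum { K_PLAIN, K_BR, K_BRTRUE, K_RET } Kind;
typedef struct { Kind kind; int target; } Insn;   /* target: instruction index */

/* Mark every instruction that some branch can reach. Explicit targets come
 * from the operand; a conditional branch also has an implicit target, the
 * next instruction, which runs when the branch is not taken. */
static void find_branch_targets(const Insn *code, int n, bool *is_target)
{
    for (int i = 0; i < n; i++)
        is_target[i] = false;

    for (int i = 0; i < n; i++) {
        if (code[i].kind == K_BR || code[i].kind == K_BRTRUE) {
            assert(code[i].target >= 0 && code[i].target < n);
            is_target[code[i].target] = true;   /* explicit target */
        }
        if (code[i].kind == K_BRTRUE && i + 1 < n)
            is_target[i + 1] = true;            /* implicit fall-through */
    }
}
```

Each marked index would then start its own basic block, so both labels exist by the time the branch itself is emitted.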

As we mentioned earlier, the execution engine is a common layer that merely translates each instruction into a more generic form. It then sends each instruction to IOperationProcessor, an interface that performs the actual translation. Compared to the instruction received from ICompiler, the representation here, OperationInfo, is much more consumable: in addition to the opcode, it has an array of input operands and a result operand:

public class OperationInfo {
  // ... (other members elided)
  internal IOperand[] Operands { get; set; }
  internal TempOperand Result { get; set; }
  // ...
}

There are several types of operands: ArgumentOperand, LocalOperand, ConstOperand, TempOperand, BranchTargetOperand, etc. Note that the result, if it exists, is always a TempOperand. The most important property on IOperand is its Name, which unambiguously identifies the source of data in the IL runtime. If an operand with the same name appears in another operation, it unquestionably tells us that the very same data location is targeted again. It is paramount for the processor to accurately map each name to its own storage.

The processor handles each operand according to its type. For example, if it’s an argument operand, we might consider retrieving the value from the corresponding argument. An x86 processor may map this to a register. In the case of LLVM, we simply go to fetch it from a named value that is pre-allocated at the beginning of method construction. The resolution strategy is similar for other operands:

  • LocalOperand: fetch the value from pre-allocated address
  • ConstOperand: use the const value carried by the operand
  • BranchTargetOperand: use the index carried by the operand

Since a temp value uniquely represents an expression stack frame from the CLR runtime, it will be mapped to a register. Luckily for us, LLVM allows an unlimited number of virtual registers, so we simply name a new one for each different temp operand. If a temp operand is reused, however, the very same register must be reused as well.
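A rough C sketch of this per-kind resolution, assuming a hypothetical Operand record and rendering results as LLVM-like text instead of building real LLVM values (the prototype goes through LLVMSharp). The interesting case is the temp: the first time a temp number is seen it gets a fresh virtual register, and every later appearance of that same temp resolves to the same register. OperandKind, Resolver and resolve are illustrative names only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical operand kinds mirroring the ones listed above. */
typedef enum { OPND_ARG, OPND_LOCAL, OPND_CONST, OPND_TEMP, OPND_BRANCH } OperandKind;
typedef struct { OperandKind kind; int index; long value; } Operand;

#define MAX_TEMPS 64

/* Per-method state: temp N always maps to the same virtual register. */
typedef struct { int temp_reg[MAX_TEMPS]; int next_reg; } Resolver;

static void resolver_init(Resolver *r)
{
    for (int i = 0; i < MAX_TEMPS; i++)
        r->temp_reg[i] = -1;   /* -1: no register assigned yet */
    r->next_reg = 0;
}

/* Render an operand as LLVM-like text (not exact LLVM syntax) into buf. */
static void resolve(Resolver *r, const Operand *o, char *buf, size_t size)
{
    switch (o->kind) {
    case OPND_ARG:      /* named value created at method entry */
        snprintf(buf, size, "%%arg%d", o->index);
        break;
    case OPND_LOCAL:    /* fetched from a pre-allocated slot */
        snprintf(buf, size, "load %%local%d", o->index);
        break;
    case OPND_CONST:    /* the constant carried by the operand */
        snprintf(buf, size, "%ld", o->value);
        break;
    case OPND_BRANCH:   /* the instruction index carried by the operand */
        snprintf(buf, size, "label %%bb%d", o->index);
        break;
    case OPND_TEMP:
        assert(o->index >= 0 && o->index < MAX_TEMPS);
        if (r->temp_reg[o->index] < 0)
            r->temp_reg[o->index] = r->next_reg++;   /* first sight: new register */
        snprintf(buf, size, "%%r%d", r->temp_reg[o->index]);
        break;
    }
}
```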

We use the LLVMSharp bindings to communicate with LLVM.

Calvin Buckley porting

Note: This is a guest post by Calvin Buckley (@NattyNarwhal on GitHub) introducing the community port of Mono to IBM AIX and IBM i. If you’d like to help with this community port please contact the maintainers on Gitter.

C# REPL running under IBM i

As you might have noticed in the Mono 5.12 release notes, Mono now includes support for IBM AIX and IBM i: two very different yet (mostly!) compatible operating systems. This post should serve as an introduction to this port.

What does it take to port Mono?

Porting Mono to a new operating system is not as hard as you might think! Pretty much the entire world is POSIX compliant these days, and Mono is a large yet manageable codebase due to a low number of dependencies, use of plain C99, and an emphasis on portability. Most common processor architectures in use are supported by the code generator, though more obscure ISAs will have some caveats.

Pretty much all of the work you do will be twiddling #ifdefs to accommodate the target platform's quirks: missing or different preprocessor definitions and functions, adding the platform to definitions so it is supported by core functionality, and occasionally tweaking the runtime or build system to handle cases where the system does something completely differently from others. In the case of AIX and IBM i, I had to do all of these things.

Where would I be without IBM?

To understand what needed to happen, let's start with some background on our target platforms.

Both of our targets run on 64-bit PowerPC processors in big endian mode. Mono does support PowerPC, and Bernhard Urban maintains it. What is odd about the calling conventions on AIX (shared occasionally by Linux) is the use of function descriptors, which means that pointers to functions do not point to code, but instead point to metadata about them. This can cause bugs in the JIT if you are not careful to consume or produce function descriptors instead of raw pointers when needed. Because the runtime is better tested on 64-bit PowerPC, and machines are fast enough that the extra overhead is not significant, we always build a 64-bit runtime.
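A rough model in C of what this means, assuming the usual 64-bit AIX descriptor layout of three pointer-sized words: the code entry address, the TOC (table of contents) pointer, and an environment pointer. FunctionDescriptor and code_address_of are hypothetical names, and a real runtime would choose the behaviour with a compile-time #ifdef; a runtime flag is used here only so both cases can be exercised. The point is that a JIT wanting the first instruction of a function must read through the descriptor, and a JIT handing code to C must wrap it in a descriptor rather than pass a raw code pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an AIX-style function descriptor. */
typedef struct {
    void *entry;   /* address of the first machine instruction */
    void *toc;     /* TOC pointer loaded into r2 before the call */
    void *env;     /* environment pointer (unused by C) */
} FunctionDescriptor;

/* Given the value of a function pointer, return the address of the code.
 * With descriptors, the pointer refers to the descriptor; without them,
 * it already points at code. */
static void *code_address_of(void *fnptr, int uses_descriptors)
{
    assert(fnptr != NULL);
    return uses_descriptors ? ((FunctionDescriptor *)fnptr)->entry : fnptr;
}
```

Mixing the two up in either direction (treating a descriptor as code, or code as a descriptor) is exactly the kind of bug that makes generated code jump into data.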

In addition to a strange calling convention, AIX also has a different binary format, which means that the ahead-of-time compiler currently does not work. While most Unix-like operating systems use ELF, AIX (and by extension, IBM i for the purposes of this port) uses XCOFF, an extended form of the older COFF format from which Windows PE is also derived.

AIX is a Unix (descended from the System V rather than the BSD side of the family) that runs on PowerPC systems. Despite being a Unix, it has some quirks of its own, that I will describe in this article.

Unix? What’s a Unix?

IBM i (formerly known as i5/OS or OS/400) is decidedly not a Unix. Unlike Unix, it has an object-based filesystem where all objects are mapped into a single humongous address space backed by disk, known as single-level storage: real main storage (RAM) holds pages of objects “in use” and acts as a cache for objects that reside permanently on disk. Instead of flat files, IBM i uses database tables as the means to store data. (On IBM i, all files are database tables, and a file is just one of the “object types” supported by IBM i; others include libraries and programs.) Programs on IBM i are not simple native binaries, but “encapsulated” objects that contain an intermediate form called Machine Interface (MI) instructions (similar to MSIL/CIL), which is translated and optimized for the native hardware ahead of time (or upon first use). This also provides part of the security model, and has allowed users to transition from custom CISC CPUs to enhanced PowerPC variants without having to recompile their programs from the original source code.

This sounds similar to running inside of WebAssembly rather than any kind of Unix – So, then, how do you port programs dependent on POSIX? IBM i provides an environment called PASE (Portable Application Solutions Environment) that provides binary compatibility for AIX executables, for a large subset of the AIX ABI, within the IBM i. But Unix and IBM i are totally different; Unix has files and per-process address spaces, and IBM i normally does not, so how do you make these incongruent systems work?

To try to bridge the gap, IBM i also has an “Integrated File System” that supports byte-stream file objects in a true hierarchical directory structure. For running Unix programs that expect their own address space, IBM i provides something called “teraspace” that provides a large private address space per process or job. This requires IBM i to completely change the MMU mode and do a cache/TLB flush every time it enters and exits the Unix world, making system calls somewhat expensive, in particular forking and I/O. While some system calls are not implemented, there are more than enough to port non-trivial AIX programs to the PASE environment, even with its quirks and performance limitations. You could even build them entirely inside of the PASE environment.

A port to the native IBM i environment, with the ahead-of-time compiler outputting MI code, has been considered, but it would take a lot of work to write an MI backend for the JIT, use the native APIs in the runtime, and handle how the environment differs from anything else Mono runs on. As such, I instead targeted PASE and AIX, for the ease of porting existing POSIX-compatible code.

What happened to port it?

The port came out of some IBM i users expressing an interest in running .NET programs on their systems. A friend of mine involved in the IBM i community had noticed I was working on a (mostly complete, but not fully working) Haiku port, and approached me to see if it could be done. Considering that I now had experience with porting Mono to new platforms, and there was already a PowerPC JIT, I decided to take the challenge.

The primary porting target was IBM i, with AIX support being a by-product. Starting by building on IBM i, I set up a chroot environment to work in (chroot support was added to PASE fairly recently) and set up a toolchain with AIX packages. Initial bring-up of the port happened on IBM i, up to the point where the runtime was built but generated code was not yet executing. One problem with building on IBM i, however, is that the performance limitations really start to show. While building took the same amount of time as on AIX on the system I had access to (dual POWER6, roughly 30 minutes to build the runtime), since it is mostly computation, the configure script was severely impacted because it does many small reads and writes with lots of forking. Whereas it took AIX 5 minutes and Linux 2 minutes to run through the configure script, it took IBM i well over an hour. (Ouch!)

At this point, I submitted the initial branch as a pull request for review. A lot of back and forth went on to work on the underlying bugs as well as following proper style and practices for Mono. I set up an AIX VM on the machine, and switched to cross-compiling from AIX to IBM i; targeting both platforms with the same source and binary. Because I was not building on IBM i any longer, I had to periodically copy binaries over to IBM i, to check if Mono was using missing libc functions or system calls, or if I had tripped on some behaviour that PASE exhibits differently from AIX. With the improved iteration time, I could start working on the actual porting work much more quickly.

To help with matters where I was unsure exactly how AIX worked, David Edelsohn from IBM helped by explaining how AIX handles things like calling conventions, libraries, issues with GCC, and best practices for dealing with porting things to AIX.

What needed to change?

There are some unique aspects of AIX and the subset that PASE provides, beyond the usual #ifdef handling.

What did we start with?

One annoyance I had was how poor the GNU tools are on AIX. GNU binutils are effectively useless on AIX, so I had to explicitly use IBM’s binutils, and deal with some small problems related to autotools with environment variables and assumption of GNU ld features in makefiles. I had also dealt with some issues in older versions of GCC (which is actually fairly well supported on AIX, all things considered) that made me upgrade to a newer version. However, GCC’s “fixincludes” tool to try to mend GCC compatibility issues in system header files in fact mangled them, causing them to be missing some definitions found in libraries. (Sometimes they were in libc, but never defined in the headers in the first place!)

Improper use of function pointers was sometimes a problem. With Bernhard's advice, I found a problem with the function descriptor #ifdefs, which had caused function pointers to be mixed up and interpreted as code. Once that had been fixed, Mono was running generated code on AIX for the first time – quite a sight to behold!

What’s a “naidnE?”

One particularly nerve-racking issue that bugged me while trying to bootstrap was the Decimal type returning a completely bogus value when dividing, causing a nonsensical overflow condition. Because of constant inlining, this occurred when building the BCL, so it was hard to put off. With some careful debugging by my friend, comparing the variable state between x86 and PPC when dividing a decimal, we determined exactly where the incorrect endianness handling had taken place, and I came up with a fix.

While Mono has historically handled different endianness just fine, Mono has started to replace portions of its own home-grown BCL with CoreFX, (the open-source Microsoft BCL) and it did not have the same rigor towards endianness issues. Mono does patch CoreFX code, but it sometimes pulls in new code that has not had endianness (or other such possible compatibility issues) worked out yet and thus requires further patching. In this case, the code had already been fixed for big endian before, but pulling in updated code from CoreFX had created a new problem with endianness.
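The general shape of this class of bug can be sketched in C. This is not the CoreFX Decimal code; it only contrasts a load that reinterprets bytes in host order (and therefore gives different answers on little-endian x86 and big-endian PowerPC) with an endianness-independent load that assembles the value byte by byte. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Endianness-dependent: reinterprets the bytes in host order. */
static uint32_t load_native32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Endianness-independent: p[0] is always the least significant byte. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 on a big-endian host such as the PowerPC targets above. */
static int host_is_big_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 0;
}
```

Code written and tested only on x86 can use the first style by accident and still pass every test, which is why such bugs tend to surface only when a big-endian port comes along.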


On AIX, there are two ways to handle libraries. One is your typical System V style linking with .so libraries; this isn’t used by default, but can be forced. The other way is the “native” way to do it, where objects are stored in an archive (.a) typically used for holding objects used for static linking. Because AIX always uses position-independent code, multiple objects are combined into a single object and then inserted into the archive. You can then access the library like normal. Using this technique, you can even fit multiple shared objects of the same version into a single archive! This took only minimal changes to support; I only had to adjust the dynamic library loader to tell it to look inside of archive files, and some build system tweaks to point it to the proper archive and objects to look for. (Annoyingly, we have to hardcode some version names of library objects. Even then, the build system still needs revision for cases when it assumes that library names are just the name and an extension.)
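A minimal sketch of what the loader change amounts to, assuming AIX's dlopen syntax "archive(member)" together with the RTLD_MEMBER flag, which tells dlopen to look inside the archive. The helper names archive_member_path and open_archive_member are hypothetical, and the dlopen part is compiled only on AIX; libc.a(shr_64.o) is used purely as an example of the naming pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef _AIX
#include <dlfcn.h>
#endif

/* Build "archive(member)", e.g. "libc.a(shr_64.o)".
 * Returns 0 on success, -1 if the buffer is too small. */
static int archive_member_path(char *buf, size_t size,
                               const char *archive, const char *member)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "%s(%s)", archive, member);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

#ifdef _AIX
/* RTLD_MEMBER makes AIX's dlopen parse the "archive(member)" syntax. */
static void *open_archive_member(const char *archive, const char *member)
{
    char path[1024];
    if (archive_member_path(path, sizeof path, archive, member) != 0)
        return NULL;
    return dlopen(path, RTLD_NOW | RTLD_MEMBER);
}
#endif
```

The hardcoded version names mentioned above come in through the member argument: the loader has to know which object inside the archive to ask for.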

What’s “undefined behaviour?”

When Mono tries to access an object reference, and the reference (a pointer) is null, (that is, zero) Mono does not normally check to see if the pointer is null. On most operating systems, when a process accesses invalid memory such as a null pointer, it sends the process a signal (such as SIGSEGV) and if the program does not handle that signal, it will terminate the program. Normally, Mono registers a signal handler, and instead of checking for null, it would just try to dereference a null pointer anyways to let the signal handler interrupt and return an exception to managed code instead. AIX doesn’t do that – it lets programs dereference null pointers anyway! What gives?

Accessing memory via a null pointer is not actually defined by the ANSI C standards – this is a case of dreaded undefined behaviour. Mono relied on the assumption that most operating systems handle it in the typical way, by sending a signal to the process. What AIX does instead is map a “null page” at 0x0 that accepts reads and writes. (You could also execute from it, but since all zeroes is an invalid opcode on PowerPC, this does not do much but raise an illegal instruction signal.) This is a historical decision, relating back to optimizations in older IBM compilers from the 1980s that used speculative execution in compiler-generated code to improve performance when evaluating complex logical expressions. Because we cannot rely on handling a signal to catch the null dereference, we instead turn on explicit null checks, which are normally reserved for runtime debugging, all the time.
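The difference between the two strategies can be sketched in C, assuming a hypothetical object layout with a single length field. The implicit version relies on the operating system to fault on a null dereference; the explicit version tests first and reports an error where the runtime would throw a NullReferenceException. This only illustrates the idea; the JIT emits the equivalent checks directly in machine code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int length; } Obj;   /* hypothetical managed object layout */

/* Implicit check: just dereference. On most systems a NULL `o` raises
 * SIGSEGV, which the runtime turns into a NullReferenceException. On AIX,
 * reading page 0 quietly succeeds, so the error would go unnoticed. */
static int load_length_implicit(const Obj *o)
{
    return o->length;
}

/* Explicit check: test before dereferencing. Returns 0 and stores the
 * value on success, or -1 where the runtime would throw. */
static int load_length_explicit(const Obj *o, int *out)
{
    assert(out != NULL);
    if (o == NULL)
        return -1;
    *out = o->length;
    return 0;
}
```

The explicit form costs a compare and branch on every dereference, which is why it is normally only a debugging aid, but it is the only portable option when the OS will not fault.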

What’s so boring about TLS?

BoringSSL is required for the modern TLS versions that newer websites demand. Its build system is based on CMake rather than autotools and make. Luckily, this worked fine on AIX, though I had to apply some massaging to get it to do 64-bit library mangling. For a while, I was stumped by an illegal instruction error, which turned out to be caused by not linking the library against pthread, with nothing warning about it.

It turns out that even though BoringSSL was now working, one elliptic curve (secp256r1) was not, so sites using it were broken. To test it, I had gone “yak shaving” to build what the README said the test harness needed: Ninja and Go. I had a heck of a time trying to build Go on a PPC Linux system to triage, but as it turned out, I did not actually need it: Mono had tweaked the build system so that Go was not required after all, and I just had to flip a CMake flag to build the tests and run them manually. Once I figured out what exactly was wrong, it turned out to be an endianness issue in an optimized code path. A fix was attempted, but in the end only disabling that path worked and let the curve work fine on big-endian PowerPC. Since the code came from Google and has since been rewritten in both BoringSSL’s and OpenSSL’s latest upstream sources, it is due to be replaced the next time Mono’s BoringSSL fork gets updated.

What else?

I ran into spurious and strange I/O problems related to threading: threads would complain about an unexpected errno of 0 (indicating success). It turned out that AIX does not assume that all programs are thread-safe by default, so errno was not thread-local. One small #define later, and that was fixed. (Miguel de Icaza was amused that some operating systems still consider thread safety to be an advanced feature. 🙂)

We also found a cosmetic issue with uname. Most Unices put their version in the release field of the uname structure, and things like the kernel type in the version field. AIX and PASE however, put the major version in the version field, and the minor version in the release field. A simple sprintf for the AIX case was enough to fix this.

PASE has many quirks, which required patches to work around deficiencies ranging from bugs to unimplemented functions. I aim to target IBM i 7.1 or newer, so I also worked around some bugs that have been fixed in later versions. I cleaned up a lot of this with additional preprocessor definitions.

What’s next?

Now that Mono runs on these platforms, there’s still a lot of work to be done. The ahead-of-time compiler needs to be reworked to emit XCOFF-compatible code, libgdiplus needs to be ported, Roslyn is broken on ppc64be, continuous integration would be useful to detect build failures, the build system is still a bit weird regarding AIX libraries, and there is plenty more where that came from. Despite all this, the fact that the port already works well enough in its current state should provide a solid foundation going forward.

Laurent Sansonetti runtime

As you may know, we have been working on bringing Mono to the WebAssembly platform. As part of the effort, we have been pursuing two strategies: one that uses the new Mono IL interpreter to run managed code at runtime, and one that uses full static (AOT) compilation to create a single .wasm file that can be executed natively by the browser.

We intend the former to be used for quickly reloading C# code and prototyping and the latter for publishing your final application, with all the optimizations enabled. The interpreter work has now been integrated into Mono’s source code and we are using it to develop, port and tune the managed libraries to work on WebAssembly.

This post is about the progress that we have been making on doing static compilation of .NET code to run on WebAssembly.

mono-wasm in action

WebAssembly static compilation in Mono is orchestrated with the mono-wasm command-line tool. This program takes IL assemblies as input and generates a series of files in an output directory, notably an index.wasm file containing the WebAssembly code for your assemblies as well as all other dependencies (the Mono runtime, the C library and the mscorlib.dll library).

$ cat hello.cs
class Hello {
  static int Main(string[] args) {
    System.Console.WriteLine("hello world!");
    return 0;
  }
}
$ mcs -nostdlib -noconfig -r:../../dist/lib/mscorlib.dll hello.cs -out:hello.exe
$ mono-wasm -i hello.exe -o output
$ ls output
hello.exe        index.html        index.js        index.wasm        mscorlib.dll

mono-wasm uses a version of the Mono compiler that, given .NET assemblies, generates LLVM bitcode suitable to be passed to the LLVM WebAssembly backend. Similarly, we have been building the Mono runtime and a C library with a version of clang that also generates LLVM WebAssembly bitcode.

Until recently, mono-wasm linked all the bitcode into a single LLVM module and then performed WebAssembly code generation on it. While this produced a functional .wasm file, it had the downside of taking a significant amount of time (half a minute on a recent MacBook Pro) every time we built a project, because so much code was involved. Some of that code, the runtime bits and the mscorlib.dll library, never changed, yet it was still run through WebAssembly code generation every time.

We were thrilled to hear in late November of last year that the LLVM linker (lld) was getting WebAssembly support.

Since then, we changed our mono-wasm tool to perform incremental compilation of project dependencies into separate .wasm files, and we integrated lld’s new WebAssembly driver in the tool. Thanks to this approach, we now perform WebAssembly code generation only when required, and in our testing builds now complete in less than a second once the dependencies (runtime bits and mscorlib.dll) have already been compiled into WebAssembly.

mono-wasm's new linking phase

Additionally, mono-wasm used to use the LLVM WebAssembly target to create textual assembly files that were then passed to the Binaryen toolchain to produce the .wasm code. We have been testing the backend’s ability to generate .wasm object files directly (with the wasm32-unknown-unknown-wasm triple), and so far it seems promising enough that we changed mono-wasm accordingly. We also noticed a slight decrease in build time.

Scenario                 Old toolchain   New toolchain (first compile)   New toolchain (rebuild)
Full application build   ~40s            ~30s                            <1s
Hello World program      ~40s            <1s                             <1s

There is still a lot of work to do on bringing C# to WebAssembly, but we are happy with this new approach and the progress we are making. Feel free to watch this space for further updates. You can also track the work on the mono-wasm GitHub repository.

For those of you who want to take this for a spin, you can download a preview release, unzip it and run “make” in the samples. This currently requires macOS High Sierra.

Miguel de Icaza runtime

Mono is complementing its Just-in-Time compiler and its static compiler with a .NET interpreter allowing a few new ways of running your code.

In 2001 when the Mono project started, we wrote an interpreter for the .NET instruction set and we used this to bootstrap a self-hosted .NET development environment on Linux.

At the time we considered the interpreter a temporary tool that we could use while we built a Just-in-Time (JIT) compiler. The interpreter (mint) and the JIT engine (mono) existed side-by-side until we could port the JIT engine to all the platforms that we supported.

When generics were introduced, the engineering cost of maintaining both the interpreter and the JIT engine was no longer worth it, so we removed the interpreter.

We later introduced full static compilation of .NET code. This is a technology that we introduced to target platforms that do not allow for dynamic code generation. iOS was the main driver for this, but it opened the doors to allow Mono to run on gaming consoles like the PlayStation and the Xbox.

The main downside of full static compilation is that a completely new executable has to be created every time you update your code. This is a slow process, and unsuitable for the interactive style of development that some developers practice.

For example, some game developers like to adjust and tweak their game code, without having to trigger a full recompilation. The static compilation makes this scenario impractical, so they resort to embedding a scripting language into their game code to quickly iterate and tune their projects.

This lack of .NET dynamic capabilities also prevented many interesting uses of .NET as a teaching or prototyping tool in these environments. Things like Xamarin Workbooks, or simple scripting could not use .NET languages and had to resort to other solutions on these platforms.

Frank Krueger, while building his Continuous IDE, needed such an environment on iOS so badly that he wrote his own .NET interpreter in F# to realize his vision of a complete development environment for .NET on the iPad.

To address these issues, and to support some internal Microsoft products, we brought Mono’s interpreter back to life, and it is back with a twist.

New Mono Interpreter

We resuscitated Mono’s old interpreter and upgraded its .NET support, adding support for generics and bringing it up to date with .NET as it exists in 2017. Next is adding support for mixed-mode execution.

It is, for example, one of the ways that Mono runs on WebAssembly today (the other being static compilation using LLVM).

The interpreter is now part of mainline Mono and passes a large part of our extensive test suites. You can use it today when building Mono from source code, like this:

$ mono --interpreter yourassembly.exe

Mixed Mode Execution

While the interpreter alone is now in great shape, we are currently working on a configuration that will allow us to mix interpreted code with statically compiled or Just-in-Time compiled code. We call this mixed mode execution.

For platforms like iOS, PlayStation and Xbox, this means that you can precompile your core libraries or core application and still support loading and executing code dynamically. You gain the benefits of having all your core libraries optimized with LLVM, while keeping the flexibility of running some dynamic code.

This will allow game developers to prototype, experiment and tweak their games using .NET languages on their system without having to recompile their applications.

It will open the doors for scriptable applications on device using .NET languages as well.

Future work

We are extending the capabilities of the interpreter to handle various interesting scenarios. These are some of the projects ahead of us:

Improvements for Statically Compiled Mono

The full ahead-of-time compilation versions of Mono (iOS, Consoles) do not ship with an implementation of System.Reflection.Emit. This made sense as the capability could not be supported, but now that we have an interpreter, we can.

There are several uses for this.

One of them is the System.Linq.Expressions API, which is used extensively in advanced scenarios like Entity Framework, or by users who leverage the C# compiler to parse expressions into expression trees. You have probably seen code like this:

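using System;
using System.Linq.Expressions;

Expression<Func<int, int, int>> sum = (a, b) => a + b;
var adder = sum.Compile ();
Console.WriteLine (adder (1, 2));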

In Full AOT scenarios, the way that we made Entity Framework and the code above work was to ship an interpreter for the Expression classes. This expression interpreter has limitations, and it is also large.

By enabling System.Reflection.Emit powered by the interpreter we can remove a lot of code.

This will also allow scripting languages built for .NET, like IronPython, IronRuby and IronScheme, to work in statically compiled environments.

To allow this, we are completing the work for mixed-mode execution. That means that the interpreted code complements existing statically compiled .NET code.

Better Isolation

Earlier in this post, I mentioned that one of the idioms we previously failed to address was hot-reloading: developers who have deployed their app want to tweak their game code (or any code, for that matter) live.

We are completing our support for AppDomains to enable this scenario.

Researching Mixed Mode Options

The interpreter is a lighter option to run some code. We found that certain programs can run faster by being interpreted than being executed with the JIT engine.

We intend to explore a mixed mode of execution, sometimes called tiered compilation.

We could instruct the interpreter to execute code that is known not to be performance sensitive, for example static constructors or other initialization code that only runs once, to reduce memory usage, the amount of generated code, and execution time.

Another option is to run code in interpreted mode and, once it exceeds some threshold, switch to a JIT-compiled implementation of the method, or to use attributes to mark which methods are worth the trouble of optimizing and which are not.