LLVM Backend

Mono includes a backend which compiles methods to native code using LLVM instead of the built-in JIT.

Usage

The backend requires our fork of LLVM; see the ‘The LLVM Mono Branch’ section below.

The LLVM backend is enabled by passing --enable-llvm=yes or --with-llvm=<llvm prefix> to configure.

Platform support

LLVM is currently supported on x86, amd64, arm and arm64.

Architecture

The backend works as follows:

  • first, normal mono JIT IR is generated from the IL code
  • the IR is transformed to SSA form
  • the IR is converted to the LLVM IR
  • the LLVM IR is compiled by LLVM into native code

LLVM is accessed through the LLVM C binding.

The backend doesn’t currently support all IL features, such as vararg calls. Methods using such features are compiled using the normal Mono JIT, so LLVM-compiled and JIT-compiled code can coexist in the same process.
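The fallback decision can be pictured as a per-method check made before a code generator is chosen. This is an illustrative C sketch only: `MethodInfo`, the feature flags, `llvm_can_compile` and `compile_method` are hypothetical names and do not correspond to Mono's internal API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags describing IL constructs a method uses. */
enum {
    FEATURE_VARARG_CALL = 1u << 0,
    FEATURE_NESTED_EH   = 1u << 1,
};

/* Hypothetical per-method summary; not a real Mono structure. */
typedef struct {
    const char *name;
    unsigned features;
} MethodInfo;

typedef enum { BACKEND_LLVM, BACKEND_JIT } Backend;

/* Constructs the (hypothetical) LLVM backend cannot compile. */
static const unsigned LLVM_UNSUPPORTED = FEATURE_VARARG_CALL | FEATURE_NESTED_EH;

static bool llvm_can_compile(const MethodInfo *m)
{
    return (m->features & LLVM_UNSUPPORTED) == 0;
}

/* Prefer LLVM; fall back to the normal JIT for unsupported methods.
 * Code produced by either backend then lives in the same process. */
static Backend compile_method(const MethodInfo *m)
{
    assert(m != NULL);
    return llvm_can_compile(m) ? BACKEND_LLVM : BACKEND_JIT;
}
```

Keeping the decision per method, rather than per assembly, is what lets a single unsupported construct cost only that one method its LLVM compilation.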

Sources

The backend is in the files mini-llvm.c and mini-llvm-cpp.cpp. The former contains the bulk of the backend, while the latter contains C++ code that is needed because of deficiencies in the LLVM C binding.

The LLVM Mono Branch

We maintain a fork/branch of LLVM with various changes to enable better integration with mono. The repo is at:

https://github.com/dotnet/llvm-project

The LLVM backend is currently only supported when using this version of LLVM. When using this version, it can compile about 99% of mscorlib methods.

Changes relative to stock LLVM

The branch currently contains the following changes:

  • additional Mono-specific calling conventions.
  • support for loads/stores that can fault, implemented as LLVM intrinsics.
  • support for saving the stack locations of some variables into the exception handling info emitted by LLVM.
  • support for stores into TLS on x86.
  • the LLVM version string is changed to signal that this is a branch, e.g. it looks like “2.8svn-mono”.
  • workarounds to force LLVM to generate direct calls on amd64.
  • support for passing a blockaddress value as a parameter.
  • emission of EH/unwind info in a Mono-specific compact format.

The changes consist of about 1.5k lines of code. The majority of this is the EH table emission.

Branches

  • release/6.x and release/9.x contain our changes

Maintaining the repository

The release/* branches are maintained by regularly rebasing them on top of upstream, which makes our changes easier to examine. To merge changes from upstream into this repo, do:

git remote add upstream https://github.com/llvm/llvm-project.git
git fetch upstream
git checkout <branch>
git rebase upstream/<target branch>
<fix conflicts, git add the files, git rebase --continue>
git push --force-with-lease origin <branch>

Due to the rapid pace of development and the frequent reorganization/refactoring of LLVM code, merge conflicts are common, so maintaining our fork is time-consuming. A subset of our changes could probably be submitted to upstream LLVM, but it would require some effort to clean them up, document them, etc.

Restrictions

There are a number of constructs that are not supported by the LLVM backend. In those cases the Mono code generation engine will fall back to Mono’s default compilation engine.

Exception Handlers

Nested exception handlers are not supported because of the differences in semantics between Mono’s exception handling and the C++ ABI-based exception handling used by LLVM.

Varargs

Vararg calls are implemented in Mono using a special calling convention: a hidden ‘signature cookie’ argument is passed, and all vararg arguments are passed on the stack. LLVM doesn’t support this calling convention.

It might be possible to support this using the LLVM vararg intrinsics.
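For comparison, the C sketch below passes an explicit cookie describing the variadic part of the call and walks the arguments with stdarg.h; clang lowers va_start to LLVM's llvm.va_start intrinsic. Note how it differs from Mono's convention: here the cookie is an ordinary first argument and the variadic arguments follow the platform C calling convention rather than all going on the stack. `SigCookie` and `sum_varargs` are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical signature cookie: describes the vararg part of the call.
 * Here it only records how many int arguments follow. */
typedef struct {
    int nargs;
} SigCookie;

/* The callee uses the cookie to know how many variadic arguments to read. */
static int sum_varargs(const SigCookie *cookie, ...)
{
    va_list ap;
    int total = 0;

    assert(cookie != NULL);
    va_start(ap, cookie);
    for (int i = 0; i < cookie->nargs; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```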

save_lmf

Wrapper methods which have method->save_lmf set are not yet supported.

Calling conventions

Some complicated parameter passing conventions might not be supported on some platforms.

Implementation details

Virtual calls

The problem here is that the trampoline handling virtual calls needs to be able to obtain the vtable address and the offset. This is currently done by an arch-specific function named mono_arch_get_vcall_slot_addr (), which disassembles the calling code to find out which register contains the vtable address. This doesn’t work for LLVM since we can’t control the format of the generated code, so disassembly would be very hard. Also, the code generated by LLVM sometimes makes it impossible to obtain the vtable address at all, e.g.:

 mov <offset>(%rax), %rax
 call *%rax

To work around these problems, we use a separate vtable trampoline for each vtable slot index. The trampoline obtains the ‘this’ argument from the registers/stack, whose location is dictated by the calling convention. The ‘this’ argument plus the slot index can be used to compute the vtable slot and the called method.
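A C model of the per-slot scheme: the slot index is a constant baked into each generated trampoline, and only ‘this’ has to be found at run time. The `Object` layout (vtable pointer as first field), `vcall_slot_address` and the `DEFINE_VCALL_TRAMPOLINE` macro are assumptions for illustration; the real trampolines are generated machine code that reads ‘this’ from the register or stack slot fixed by the calling convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method pointer type: every toy method receives 'this'. */
typedef int (*Method)(void *self);

/* Hypothetical object layout: the first field points to the vtable. */
typedef struct {
    Method *vtable;
} Object;

/* Compute the address of vtable slot 'slot' from the 'this' argument. */
static Method *vcall_slot_address(Object *this_obj, size_t slot)
{
    assert(this_obj != NULL && this_obj->vtable != NULL);
    return &this_obj->vtable[slot];
}

/* One trampoline per slot index. 'this' is simply the first C parameter
 * here. A real runtime would compile the target and patch the slot
 * before jumping to it. */
#define DEFINE_VCALL_TRAMPOLINE(N)                              \
    static int vcall_trampoline_##N(Object *this_obj)           \
    {                                                           \
        Method *slot_addr = vcall_slot_address(this_obj, (N));  \
        return (*slot_addr)(this_obj);                          \
    }

DEFINE_VCALL_TRAMPOLINE(0)
DEFINE_VCALL_TRAMPOLINE(1)

/* Demo methods used to fill a vtable. */
static int method_a(void *self) { (void)self; return 1; }
static int method_b(void *self) { (void)self; return 2; }
```

Because the slot index is part of the trampoline's identity, no disassembly of the call site is needed.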

Interface calls

The problem here is that these calls receive a hidden argument, the IMT argument, which the JIT passes in a non-ABI register; this cannot be done with LLVM. So we call a trampoline instead, which sets the IMT argument and then makes the virtual call.
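A simplified C model of the interface-call trampoline: the hypothetical `imt_lookup` plays the role of the IMT dispatch (which in the JIT receives the IMT argument in a non-ABI register), and each generated trampoline supplies the IMT argument for its call site before making the call. The linear table, the integer method ids and all names here are assumptions for illustration; Mono's real IMT layout is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*Method)(void *self);

/* Hypothetical IMT entry: interface method id -> implementation. */
typedef struct {
    int iface_method_id;
    Method impl;
} ImtEntry;

typedef struct {
    const ImtEntry *imt;
    size_t imt_len;
} Object;

/* IMT dispatch: imt_arg would arrive in a non-ABI register in the JIT;
 * here it is an ordinary parameter. */
static Method imt_lookup(const Object *obj, int imt_arg)
{
    for (size_t i = 0; i < obj->imt_len; i++)
        if (obj->imt[i].iface_method_id == imt_arg)
            return obj->imt[i].impl;
    return NULL; /* a real runtime would throw here */
}

/* Per-call-site trampoline: sets the IMT argument (a constant for this
 * call site), then makes the call through the IMT. */
#define DEFINE_IMT_TRAMPOLINE(NAME, IMT_ARG)          \
    static int NAME(Object *this_obj)                 \
    {                                                 \
        Method m = imt_lookup(this_obj, (IMT_ARG));   \
        assert(m != NULL);                            \
        return m(this_obj);                           \
    }

/* Demo implementations and trampolines for two interface methods. */
static int impl_get_count(void *self) { (void)self; return 3; }
static int impl_is_empty(void *self) { (void)self; return 0; }
DEFINE_IMT_TRAMPOLINE(call_get_count, 100)
DEFINE_IMT_TRAMPOLINE(call_is_empty, 101)
```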

Unwind info

The JIT needs unwind info to unwind through LLVM generated methods. This is solved by obtaining the exception handling info generated by LLVM, then extracting the unwind info from it.

Exception Handling

Methods with exception clauses are supported, although there are some corner cases in the class library tests which still fail when run with LLVM.

LLVM uses the platform-specific exception handling ABI, which is the C++ EH ABI on Linux, while we use our home-grown exception handling system. To make the two work together, we only use one LLVM EH intrinsic, llvm.eh.selector, which forces LLVM to generate exception handling tables. We decode those tables in mono_unwind_decode_fde () to obtain the addresses of the try-catch clauses and save them to MonoJitInfo, just as with JIT-compiled code.

Finally clauses are handled differently than with JITted code. Instead of calling them from mono_handle_exception (), we save the exception handling state in TLS, then branch to them the same way we would branch to a catch handler. The code generated from ENDFINALLY calls mono_resume_unwind (), which resumes exception handling from the information saved in TLS.
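A toy C model of the finally-clause protocol: the state is stored in thread-local storage before control reaches the finally body, and the ENDFINALLY sequence resumes from that state. `EhState`, `run_finally` and `resume_unwind_sketch` are hypothetical stand-ins; unlike mono_resume_unwind (), which does not return, the sketch returns the saved exception so the flow can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical EH state saved in TLS before entering a finally clause. */
typedef struct {
    const char *exception;   /* exception being propagated */
    int         resume_count;
} EhState;

static _Thread_local EhState eh_state;

/* Instead of calling the clause from the unwinder, save the state and
 * "branch" to the clause like a catch handler. */
static void run_finally(const char *exc, void (*finally_clause)(void))
{
    eh_state.exception = exc;
    finally_clause();
}

/* What the ENDFINALLY code calls: pick the unwind back up from TLS. */
static const char *resume_unwind_sketch(void)
{
    assert(eh_state.exception != NULL);
    eh_state.resume_count++;
    return eh_state.exception; /* a real runtime would continue unwinding */
}

static int finally_ran;
static const char *resumed_with;

/* Demo finally clause: its body, followed by the ENDFINALLY sequence. */
static void demo_finally(void)
{
    finally_ran = 1;
    resumed_with = resume_unwind_sketch();
}
```

Keeping the state in TLS means the finally body needs no extra parameters, so it can be entered exactly like a catch handler.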

LLVM doesn’t support implicit exceptions thrown by the execution of instructions. An example of an implicit exception is the NullReferenceException raised when code accesses an invalid memory location, which in Mono and .NET is typically a null reference.

Implicit exceptions are implemented by adding a bunch of LLVM intrinsics to do loads/stores, and calling them using the LLVM ‘invoke’ instruction.
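The invoke-based scheme can be modeled in plain C: the hypothetical `load_i32_or_unwind` plays the role of a faulting-load intrinsic, and its caller has the two successors an LLVM `invoke` has, a normal destination and a landing pad. The NULL check is only a stand-in; the real mechanism relies on the hardware fault (e.g. SIGSEGV) being turned into an exception by the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a faulting-load intrinsic: either the load succeeds
 * (normal edge) or it faults (unwind edge). */
static bool load_i32_or_unwind(const int *addr, int *out)
{
    assert(out != NULL);
    if (addr == NULL)
        return false;  /* unwind edge: NullReferenceException */
    *out = *addr;
    return true;       /* normal edge */
}

typedef enum { RESULT_OK, RESULT_NULL_REFERENCE } Result;

/* Caller shaped like 'invoke ... to label %normal unwind label %lpad'. */
static Result read_field(const int *field, int *value)
{
    if (load_i32_or_unwind(field, value))
        return RESULT_OK;             /* normal destination */
    return RESULT_NULL_REFERENCE;     /* landing pad */
}
```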

Instead of generating DWARF/C++ EH ABI exception handling tables, we generate our own tables in a Mono-specific format, which the Mono runtime reads during execution. This has the following advantages:

  • the tables are compact and take up less space.
  • we can generate a lookup table similar to .eh_frame_hdr, which is normally generated by the linker, allowing us to support macOS/iOS, since the Apple linker doesn’t support .eh_frame_hdr.
  • the tables are pointed to by a normal global symbol, instead of residing in a separate segment whose address cannot be looked up under macOS.
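A minimal sketch of the kind of lookup an .eh_frame_hdr-style table enables: entries sorted by method start address, searched with a binary search to find the entry for a given PC. The `EhLookupEntry` layout and `eh_lookup` are hypothetical; Mono's actual compact format is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: start of a method's code and the offset of its
 * compact EH/unwind entry. Entries are sorted by code_start. */
typedef struct {
    uint32_t code_start;
    uint32_t eh_offset;
} EhLookupEntry;

/* Return the last entry with code_start <= pc, or NULL if pc lies before
 * the first method. A real table would also check the method's end. */
static const EhLookupEntry *eh_lookup(const EhLookupEntry *table, size_t n,
                                      uint32_t pc)
{
    size_t lo = 0, hi = n;            /* search range [lo, hi) */

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].code_start <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries with code_start <= pc. */
    return lo == 0 ? NULL : &table[lo - 1];
}
```

Since the table is an ordinary array reachable from a global symbol, the runtime can find it without asking the OS where a special segment was loaded.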

Generic Sharing

There are two problems here: passing/receiving the hidden rgctx argument passed to some shared methods, and obtaining its value/the value of ‘this’ during exception handling.

The former is implemented by adding a new Mono-specific calling convention which passes the ‘rgctx’ argument in the non-ABI register where Mono expects it, i.e. R10 on amd64. The latter is implemented by marking the variables where these values are stored with Mono-specific LLVM custom metadata, and modifying LLVM to emit the final stack location of these variables into the exception handling info, where the runtime can retrieve it.
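The retrieval side can be sketched as follows: given a recorded (base register, offset) pair, the runtime reads the variable out of the unwound frame. `VarLocation`, `FrameContext` and `read_frame_var` are hypothetical; the real encoding in the EH info and Mono's context structure differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record emitted into the EH info: where the hidden rgctx
 * (or 'this') variable lives in the frame. */
typedef struct {
    int     base_reg;  /* index of the frame base register */
    int32_t offset;    /* byte offset of the variable from that register */
} VarLocation;

/* Hypothetical unwound register state of the frame. */
typedef struct {
    uintptr_t regs[16];
} FrameContext;

/* Read the saved pointer-sized variable out of the frame. */
static void *read_frame_var(const FrameContext *ctx, const VarLocation *loc)
{
    assert(loc->base_reg >= 0 && loc->base_reg < 16);
    const unsigned char *base = (const unsigned char *)ctx->regs[loc->base_reg];
    void *value;
    memcpy(&value, base + loc->offset, sizeof value);
    return value;
}
```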

AOT Support

This is implemented by emitting the LLVM IR into an LLVM bitcode file and compiling it with the LLVM llc compiler, which produces a .s file. We then append our normal AOT data structures, plus the code for methods not supported by LLVM, to this file.

A runtime which is not configured with --enable-llvm=yes can be made to use LLVM-compiled AOT modules by using the --llvm command line argument: mono --llvm hello.exe

Porting the backend to new architectures

The following changes have to be made to port the LLVM backend to a new architecture:

  • Define MONO_ARCH_LLVM_SUPPORTED in mini-<ARCH>.h.
  • Implement mono_arch_get_llvm_call_info () in mini-<ARCH>.c. This function is a variant of the arch-specific get_call_info () function; it should return calling convention information for a signature.
  • Define MONO_CONTEXT_SET_LLVM_EXC_REG() in mini-<ARCH>.h to the register used to pass the exception object to LLVM compiled landing pads. This is usually defined by the platform ABI.
  • Implement the LLVM exception throwing trampolines in exceptions-<ARCH>.c. These trampolines differ from the normal ones because they receive the PC address of the throw site, instead of a displacement from the start of the method. See exceptions-amd64.c for an example.
  • Implement the resume_unwind () trampoline, which is similar to the throw trampolines, but instead of throwing an exception, it should call mono_resume_unwind () with the constructed MonoContext.
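As a rough illustration of the header-side items in this list, here is what the definitions might look like for a made-up architecture called “toy”. The register enum, the `ToyContext` structure (standing in for the real MonoContext) and the choice of R0 as the exception register are all assumptions; only the two macro names come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* --- sketch of mini-<ARCH>.h for a hypothetical architecture "toy" --- */

/* Advertise that this architecture supports the LLVM backend. */
#define MONO_ARCH_LLVM_SUPPORTED 1

/* Hypothetical register numbering and context layout. */
enum { TOY_R0, TOY_R1, TOY_SP, TOY_PC, TOY_NREGS };

typedef struct {
    uintptr_t gregs[TOY_NREGS];
} ToyContext;

/* Put the exception object into the register the (assumed) toy ABI uses
 * to pass it to LLVM-compiled landing pads. */
#define MONO_CONTEXT_SET_LLVM_EXC_REG(ctx, exc) \
    do { (ctx)->gregs[TOY_R0] = (uintptr_t)(exc); } while (0)
```

The do/while (0) wrapper lets the macro be used like a single statement, for example inside an unbraced if.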

LLVM problems

Here is a list of problems whose solution would probably require changes to LLVM itself. Some of these problems are solved in various ways by changes on the LLVM Mono branch.

  • the llvm.sqrt intrinsic doesn’t work with NaNs, even though the underlying C function/machine instruction probably works with them. Worse, an optimization pass transforms sqrt(NaN) to 0.0, changing program behaviour and masking the problem.
  • there is no fabs intrinsic; instead, llc seems to replace calls to any function named ‘fabs’ with the corresponding assembly, even if it is not the fabs from libm.
  • There is no way to tell LLVM that the result of a load is constant, e.g. in a loop like this:
  for (int i = 0; i < arr.Length; ++i)
     arr [i] = 0;

The arr.Length load cannot be moved outside the loop, since the store inside the loop can alias it. There is a llvm.invariant.start/end intrinsic, but that seems to be only useful for marking a memory area as invariant inside a basic block, so it cannot be used to mark a load globally invariant.

http://hlvm.llvm.org/bugs/show_bug.cgi?id=5441

  • LLVM has no support for implicit exceptions:

http://llvm.org/bugs/show_bug.cgi?id=1269

  • LLVM treats loads from a NULL address as undefined behaviour, while they are quite well defined on most unices (a SIGSEGV signal is sent). If an optimization pass determines that the source address of a load is NULL, it changes the load to undef/unreachable, changing program behaviour. The only workaround seems to be marking all loads as volatile, which likely hinders optimizations.
  • There seems to be no way to disable specific optimizations when running ‘opt’, e.g. to run -std-compile-opts except tailcallelim.
  • The x86 JIT seems to generate normal calls as
  mov reg, imm
  call *reg

This makes it hard/impossible to patch the call address after the called method has been compiled.

http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-December/027999.html

  • LLVM Bugs: [1]
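The loop-invariance problem from the list can be reproduced in C. In the sketch below the length is reached through a pointer of the same type as the elements, so unless the compiler can prove the pointers don't alias it has to reload the length after every store. The aliasing call shows why hoisting is not legal without extra knowledge; in managed code the length is immutable, but that fact is what could not be conveyed to LLVM. `zero_reload` and `zero_hoisted` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* 'length' and 'data' have the same element type and may alias, so
 * *length must be re-read after every store to data[i]. */
static void zero_reload(const int *length, int *data)
{
    assert(length != NULL);
    for (int i = 0; i < *length; ++i)
        data[i] = 0;
}

/* Hoisted form: equivalent only when *length cannot change inside the
 * loop, which is the invariance fact the backend had no way to express. */
static void zero_hoisted(const int *length, int *data)
{
    assert(length != NULL);
    const int len = *length;
    for (int i = 0; i < len; ++i)
        data[i] = 0;
}
```

When data[0] aliases the length, the first store sets the length to 0 and zero_reload stops after one iteration, while zero_hoisted clears three elements.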

Future Work

Array Bounds Check (ABC) elimination

Mono already contains an ABC elimination pass, which is fairly effective at eliminating simple bounds checks, e.g. the one in:

  for (int i = 0; i < arr.Length; ++i)
      sum += arr [i];

However, it has problems with “partially redundant” checks, i.e. checks which cannot be proven to be redundant, but which are unlikely to fail at runtime. With LLVM’s extensive analysis and program transformation passes, it might be possible to eliminate these from loops by changing them to loop-invariant checks and hoisting them out of loops, e.g. changing:

  for (int i = 0; i < len; ++i)
    sum += arr [i];

to:

  if (len <= arr.Length) {
      <loop without checks>
  } else {
      <loop with checks>
  }

LLVM has a LoopUnswitch pass which can do something like this for constant expressions; it would need to be extended to handle the ABC checks too. Unfortunately, this cannot be done currently because the arr.Length load is emitted as a volatile load by Mono’s LLVM backend, since it can fault if arr is null. This means that the load is not loop invariant, so it cannot be hoisted out of the loop.
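The unswitching transformation can be written out by hand in C to show what the optimization would produce. `in_bounds`, `sum_checked` and `sum_unswitched` are hypothetical helpers, and a `false` return stands in for the IndexOutOfRangeException the managed code would throw. The guard is `len <= arr_length`, since the loop only touches indexes 0 to len - 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bounds check standing in for the runtime's array bounds check. */
static bool in_bounds(int i, int length) { return i >= 0 && i < length; }

/* Original loop: one bounds check per iteration. */
static bool sum_checked(const int *arr, int arr_length, int len, long *sum)
{
    *sum = 0;
    for (int i = 0; i < len; ++i) {
        if (!in_bounds(i, arr_length))
            return false;            /* would throw */
        *sum += arr[i];
    }
    return true;
}

/* Unswitched: one loop-invariant test picks a check-free loop when every
 * index 0..len-1 is provably in bounds, else falls back to the checks. */
static bool sum_unswitched(const int *arr, int arr_length, int len, long *sum)
{
    assert(sum != 0);
    if (len <= arr_length) {
        *sum = 0;
        for (int i = 0; i < len; ++i)
            *sum += arr[i];          /* no bounds checks needed */
        return true;
    }
    return sum_checked(arr, arr_length, len, sum);
}
```

The check-free loop is entered only when the single guard holds, so behaviour is unchanged, including throwing at the same index when len exceeds the array length.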