Giving Mono Souper Powers

Alexander Kyte December 6, 2018 runtime

By virtue of using LLVM, Mono has access to a wide suite of tools and optimization backends. A lot of active research uses LLVM IR. One such research project, Souper, tries to brute-force a search for missed optimizations in our emitted code. The .NET community may have software projects that benefit from using Souper directly to generate code, rather than waiting for us to find ways to automate those optimizations ourselves. This algorithm can generate code that would be very challenging for a traditional compiler to find.

The Mono .NET VM is a rather nimble beast. Rather than requiring all users to live with the performance characteristics of a given policy, we often choose to create multiple backends and bindings that exploit what’s best of the native platform while presenting a common interface. Part of this is the choice between using an interpreter, a Just-In-Time compiler, or an Ahead-Of-Time compiler.

AOT compilation is attractive to some projects for the combination of optimized code with low start-up time. This is the classic advantage of native code over code from a JIT or interpreter. AOT code is often much worse than code from a JIT because of a need for indirection in code that references objects in run-time memory. It’s important for AOT code to exploit every possible optimization to make up for this disadvantage. For this, we increasingly rely on optimizations performed by LLVM.

LLVM’s optimization passes can analyze a program globally. It is able to see through layers of abstractions and identify repeated or needless operations in a program’s global flow. Likewise, it can examine the operations in a small segment of code and make them perfect with respect to one another. Sometimes though, we fail to optimize code. Classic compilers work by analyzing the control-flow and dataflow of a program and matching on specific patterns such as stores to variables that aren’t used later and constants that are stored to variables rather than being propagated everywhere they can be. If the pattern matches, the transformation can take place. Sometimes the code we feed into LLVM does not match the patterns of inefficiency that it looks for, and we don’t get an optimization.

What’s worse is that we don’t know that we hit this optimization blocker. We don’t know what we expect from code until it’s a problem and we’re really spending time optimizing it. Spotting trends in generated machine code across thousands of methods is incredibly labor intensive. Often only really bad code that runs many many times will catch attention. Fixing every single missed optimization and finding every single missed optimization becomes a chicken-and-egg problem.

The solution to some manifestations of this problem is the use of superoptimizers. The academic discipline of superoptimizers is very old. The idea is to treat the code that was written as more of a restriction, a specification. The superoptimizer generates a ton of native code and checks the ways in which it behaves differently than the written code. If it can generate a faster native code sequence than the compiler generated while keeping behavior exactly the same, it wins.

This “exactly the same” part can be incredibly expensive if not done correctly. The computational effort involved has historically kept superoptimization from being used very often. Since then, it has gotten a lot easier to run computationally intensive jobs. Computer hardware has become orders of magnitude more powerful. Theorems around equivalence checking and control-flow representations made more powerful claims and used algorithms with better running times. We are therefore seeing superoptimization research reemerge at this time.

One superoptimizer in particular, named Souper, has reached maturity while interoperating with the industry standard code generator (LLVM) and the industry standard SMT engine (Z3). It has kickstarted a renewed faith in researchers that superoptimization is a reasonable policy. It can take the LLVM IR that a compiler was going to feed into LLVM, and compute better IR. This can sometimes take a lot of time, and the code emitted is the result of a process that isn’t auditable. The pipeline is placing total faith in Souper for the correctness of generated code.

It’s mostly useful for compiler engineers to use to tell that optimizations were missed, and to identify how to fix that using conventional pattern matching over the program’s control-flow and dataflow graphs. That said, Souper offers the ability to drop in for clang and to generate the code that is run. Some projects are eager to make any trade-offs for performance that are acceptable. Other projects may want to get a feel for how fast they could run if they were to invest making sure Mono generates good code. If the compile time increase doesn’t discourage them, many projects may find some benefit in such an optimizing compiler.

I recommend that curious readers install Z3, get a checkout of

https://github.com/google/souper,

and complete the compilation process described in that documentation.

When AOTing code with Mono, they’re going to want to pass the commandline flags named there into the ---aot=llvmopts= argument.

As of the time of this writing, that is

llvmopts="-load /path/to/libsouperPass.so -souper -z3-path=/usr/bin/z3"

Mono will then allow Souper to step in during the middle of the LLVM compilation and try it’s best at brute-forcing some better code. If there’s anything short and fast that does the job better, it will be found.

It is frankly amazing that Mono can get such extensive optimizations simply by compiling to LLVM IR. Without changing a single line of Mono’s source, we changed our compilation pipeline in truly dramatic ways. This shows off the lack of expectations that Mono has about the layout of our generated code. This shows off the flexibility of LLVM as a code generation framework and to Mono as an embedded runtime. Embedders using Mono should consider using our LLVM backend with this and other third-party LLVM optimization passes. Feedback about the impact of our research on real-world programs will help us decide what we should be using by default.