Making calls to WebAssembly fast and implementing anyref

Posted on Wed 04 July 2018 in mozilla • 7 min read

Since this is the end of the first half-year, I think it is a good time to reflect and show some work I've been doing over the last few months, apart from the regular batch of random issues, security bugs, reviews and the fixing of 24 bugs found by our fuzzers.

Bug 1319203: Make JS to WebAssembly calls blazingly fast

If we want more WebAssembly (wasm) adoption, there shouldn't be a big costly barrier between the two universes. That is, calls from one world to the other should be fast. For a very long time, calls from JS to asm.js/WebAssembly have been quite slow in Firefox. In fact, we didn't optimize them at all. For ease and speed of implementation at the time, asm.js call activations (data structures recording information about the function being currently called in the VM) were very different from the JS ones. This difference indicated some significant structural differences, like the capability to reconstruct call stack information used by Error() stack frames, or just tracing the stack for garbage collection purposes. After putting a lot of hard work into refactoring and low-level changes over the last year, Spidermonkey was finally ripe for an optimization.

When we call from JS to asm.js/wasm, the call passes through C++, does a bunch of work and then calls into a piece of glue code directly written in assembly: the interpreter entry stub. This stub is quite small: it just copies out the C++ arguments into the right places the wasm function being called expects, sets up some small machine state, calls into the function, then does error checking and eventually returns to the C++ caller. The critical part is JIT compilation. JIT compilation means that the code is compiled to machine code by the just-in-time compiler, IonMonkey. When a JS function has been JIT-compiled and it calls into wasm, then the caller would have to go back to C++ first, before the control flow is redirected to WebAssembly.

Diagram showing interpreter entry stub

Starting with Firefox 60, the JIT compiler makes no distinctions between calling a JavaScript function or a WebAssembly function, meaning it uses the same call optimizations for both kinds of function. A new piece of glue code, the JIT entry stub, is generated for each exported function: it converts and unboxes the arguments read from the JIT-compiled JS caller into the right primitive types as expressed in the wasm function's signature, sets up some machine registers, calls into the wasm function being called and then converts the result into a format the JS caller will understand.

Diagram showing JIT entry stub

As you can see, the C++ step that was originally required to call wasm from JS has been completely eliminated!

This resulted in massive speedups over a variety of different situations: when a wasm function is directly / indirectly / polymorphically called, or used as a getter/setter, or called by Function.prototype.call/apply, when the call is missing required arguments, etc. Here's a brief summary of the results, but there might be a full-blown blog post about these optimizations coming on Mozilla Hacks at some point in the future. (calling 1 billion times into very simple functions, lower is better)

Charts showing evolution of performance

This work is not entirely done yet: we can still even better optimize in the case of a function call from JS when the called wasm function is definitely known to be a unique wasm target; see the tracking bug.

Bug 1422043: Lazy entry stub generation

The previous bug resolution came with an important memory issue: every exported function now generates a rather big chunk of code for the JIT entry, having an impact on the memory occupied by the code itself. This would be fine in most situations where the number of exported functions is generally low. But when the wasm module exports a Table (think of the equivalent of a C++ function table with signature checks), we have to assume that every single function, including those not explicitly exported, needs entry stubs. Indeed, each function can be eventually called through the Table, after calls to WebAssembly.Table.set. In fact, the existing code already suffered from this because of the interpreter entries, but it had been largely amplified by the much larger JIT entry stubs.

To fix this, we've decided to lazily generate all the entry stubs for functions exported through a table. That is, if a function is explicitly exported, its stubs will be generated at wasm compile time, but other functions won't have stubs yet. If a non-exported function is called through a Table, we'll generate the entry stubs the first time it is called. This involves some fun interactions with our tiered compilation mechanism, which can compile functions and create new entry stubs in the background while the running thread will generate lazy ones.

Not only this fixed the memory regression introduced by bug 1319203, but it actually made the situation even better than the baseline, because we didn't need to generate those interpreter entries for table-exported functions by default anymore:

Charts showing evolution of memory usage

Since it's not entirely readable from the chart: after the patches, the AngryBots and ZenGarden entry stubs memory usages went down to respectively 262 and 362 KB. This was also a relatively huge win in compilation times, but on such a low scale that it didn't make a huge difference on total compile time.

Bug 1447591: Remove wasm::BinaryToText

WebAssembly is a binary format, and there is an equivalent human-readable and debuggable text format: the WebAssembly Text format, or WAT format. While SpiderMonkey once directly produced WAT for display in C++, it's now easier for debugger.html to do so in JS. This also made the mapping between bytecode offsets and text offsets (source maps) more consistent with the display, and it could be useful in other places where this project is being used. Recently after confirming that the C++ implementation wasn't used anymore, I was able to remove it. It's not every day that you get a net loss of around 5,500 lines of code, which is always nice: less code means fewer bugs and less maintenance burden, especially when the code is dead.

Bug 1445272 / 1450261: Implement basic anyref support

A new proposal has been made to the WebAssembly specification committee a few months ago: to add reference types to the type system. Reference types are a new way to represent a reference to any host values. In a Web environment, this means being capable of playing with JavaScript values within WebAssembly. This is a huge difference with the existing type system, which only contains primitive types: integers represented on 32 or 64 bits, IEEE754 floating-point numbers represented on 32 or 64 bits. This is also a first step for implementing garbage collection (GC) integration within WebAssembly: since these reference values have been allocated on the GC heap in JavaScript, they need to be traced during wasm execution.

The basic implementation of this feature in the first bug allows one to use a new type, called anyref, as part of a function's signature or in local variables, be it in a function definition or an imported function. This allows using JS variables within wasm and pass them around to other JS functions. The second bug implemented the capability to read and write anyref values in wasm Globals [1]. Since Globals can be manipulated outside of the wasm Module thanks to their JS API, and garbage collections can happen at any time in JS, we needed to implement GC barriers to make sure that the stored value would not be marked as unused during tracing. There is good literature explaining why these barriers are needed and what they do, so I will not expand too much on the topic.

Here's an example of usage according to latest spec drafts (and therefore subject to change for now):

(module
    (func $alert (import "env" "alert") (param anyref))
    (global $global_ref (mut anyref) (ref.null anyref))
    (func (export "set_and_alert") (param $param anyref) (result anyref)
        ;; Put the previous value of $global_ref on the virtual value stack.
        get_global $global_ref
        ;; Get the argument anyref value and store it in $global_ref.
        get_local $param
        set_global $global_ref
        ;; Call the $alert method with the argument anyref value.
        get_local $param
        call $alert
        ;; The previous value of $global_ref is still on the stack and will be
        ;; returned.
    )
)

Example of wasm text format using anyref.

(async function() {
    let { instance } = await WebAssembly.instantiate(wasmBinary, {
        env: {
            alert(obj) {
                alert(`Hello, ${obj.name}!`);
            }
        }
    });
    console.log(instance.exports.set_and_alert({
        name: 'world',
        secretVal: 42
    }));
    // alerts "Hello, world!", logs null

    console.log(JSON.stringify(instance.exports.set_and_alert({
        name: 'there'
    })));
    // alerts "Hello, there!", logs { name: 'world!', secretVal: 42 }
})();

Example of JavaScript using the module defined above, passing JS values and reading them from WebAssembly.

This is a very preliminary prototype and it might change in the next few months. If you feel adventurous, you can try it on Firefox Nightly by setting the about:config pref javascript.options.wasm_gc to true; note that we haven't fully hooked this up to garbage collection yet, so your experimentation might occasionally throw out-of-memory exceptions. In any case, if you see something, say something.

Say you are a compiler developer, and you would like to port your language to WebAssembly, and your language uses a GC. At the moment, the only way you can do this is by compiling your garbage collector to WebAssembly, and it would be backed by the wasm Module's memory. This works, but it won't be very efficient. Plus, there's already a very efficient, solidly tested, constantly improving garbage collector in your browser that uses all the possible dirty low-level tricks known to mankind, which is the GC being used for JavaScript. What if we could give you access to the garbage collector directly? Then you'd just need to give a way to define structures, and then could use a set of opcodes to allocate them, read and write fields on them, etc. At the moment, the reference types proposal only allows you to move garbage-collected values around. There's also code in Firefox Nightly to experiment with defining your own data structures and using them, but it is very very early. If you're interested in following us implementing more parts, this tracking issue might be of interest.

[1] Think of a C++ "global" value, not a JavaScript "global".

Future work

There is still much more work to be done on the implementation of WebAssembly in Spidermonkey, to implement other new proposals, to make it faster, or to have even better generated code.

A big thank you for the proofreading to Waldo, steveklabnik and ashleygwilliams. Extra thanks go to Ashley who also drew the two diagrams showing how stubs evolved.