
Design goals: fast interpreter, polyfill ease, and preorder versus postorder encoding #239

Closed
jfbastien opened this issue Jun 30, 2015 · 38 comments

@jfbastien (Member)

We were recently discussing preorder versus postorder encoding. Different folks (@titzer, @kg, @ncbray, @sunfishcode, @lukewagner) have explored this in the prototype repos and elsewhere. We haven't reached an agreement yet on which is better (considering compression, decoding / interpretation time, implementation complexity).

Postorder may be better for an interpreter, preorder for CFG building. We need more experiments and data to figure out what the tradeoffs are.

Once we have this data, we need to answer a bigger question:

  • Is it a goal to create a fast interpreter?
  • Is the polyfill a very short-term stopgap, or will it stay alive for a while?
jfbastien added this to the MVP milestone on Jun 30, 2015
@kripken (Member) commented Jun 30, 2015

I think a fast interpreter is a worthy goal. A lot of code on the web ends up executed exactly once, and just interpreting the code can be faster than JITing it first. So I would expect wasm interpreters to be used in production.

@sunfishcode (Member)

I think a fast interpreter is a worthy secondary goal. WebAssembly as we know it today isn't going to be a great choice for code on the web which is expected to be executed exactly once. JavaScript handles that use case quite well already.

@MikeHolman (Member)

@kripken Agreed. We will definitely be shipping an interpreter, and we have found interpreter perf is pretty good, even acceptable for startup on asm.js (we had a baseline JIT, but it didn't really buy much over a baseline interpreter). If we can reduce startup time by not having to do a bytecode generation pass on the main thread, I think that is a noble cause.

@titzer commented Jun 30, 2015

At the minimum you have to do a pass to validate the types, so if that pass spits out an internal bytecode along the way, it's pretty close to free.

I think fast interpretation of the direct wire format is going to be a tough nut to crack and could drive us away from the primary goals of size and fast JIT compilation.
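
A minimal sketch of the "close to free" idea, assuming a postorder expression stream; the opcodes and types below are hypothetical stand-ins, not any actual wasm encoding:

```cpp
// Single pass: type-check each opcode against a type stack, and append the
// validated op to an internal bytecode buffer as we go.
#include <cstdint>
#include <stdexcept>
#include <vector>

enum Op : uint8_t { kI32Const, kF64Const, kI32Add, kF64Add };
enum Type : uint8_t { kI32, kF64 };

struct Validator {
  std::vector<Type> types;        // type stack maintained during validation
  std::vector<uint8_t> internal;  // internal bytecode, emitted along the way

  void expect(Type t) {
    if (types.empty() || types.back() != t) throw std::runtime_error("type error");
    types.pop_back();
  }

  void op(Op o) {
    switch (o) {
      case kI32Const: types.push_back(kI32); break;
      case kF64Const: types.push_back(kF64); break;
      case kI32Add:   expect(kI32); expect(kI32); types.push_back(kI32); break;
      case kF64Add:   expect(kF64); expect(kF64); types.push_back(kF64); break;
    }
    internal.push_back(o);  // emitting is a single append on top of the check
  }
};
```

A real pass would also copy immediates along with the opcodes, but the marginal cost over validation alone stays small.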


@kripken (Member) commented Jun 30, 2015

I agree that interpreting is secondary to baseline or full-optimizing JITing. However, it still feels quite important. It would be nice if wasm had an option to annotate a module with "-O0" or "-O2" etc., which PNaCl has now; websites could then mark code as "-O0" when they know it executes only once, in which case the browser could avoid a large amount of overhead. That would be far better than what JS engines can do now, so it would be a big improvement for the web.

@jfbastien (Member, Author)

@kripken I'd rather annotate on a per-function basis, instead of whole module.

@kripken (Member) commented Jun 30, 2015

Agreed, that would be even better.

@kg (Contributor) commented Jun 30, 2015

The polyfill consideration is pretty important and I've seen a mixture of views expressed here, so it's worthwhile to figure out whether there's any agreement on that part of the premise for any pre/post-order decision we make.

Do we believe that the polyfill will live around a year at most, as some people have expressed? I personally consider that implausible based on the history of internet technology adoption, but it is possible if every involved party pushes really hard. If we collectively think that sort of aggressive adoption schedule is possible and commit to push hard for it, that frees us to not spend too much time worrying about the polyfill when designing the file format. If we can't ensure a compressed time-scale, any format decisions made in favor of an interpreter could compromise polyfill performance for years.

@titzer commented Jun 30, 2015

From my perspective, it seems likely the polyfill will continue to exist to support older browsers, even 2-3 years from now.


@qwertie commented Jun 30, 2015

I view WebAssembly as something that, if designed well, is a standard (or a precursor to one) that will last over 100 years. With today's self-updating browsers, the life of the polyfill will be very brief in comparison.

@paleozogt

@kg Given that "Safari is the new IE", I suspect that the Polyfill will be around for a long time.

@MikeHolman (Member)

For the last couple of days at lunch, @LouisLaf and I have been going back and forth on the polyfill. One thing we've discussed is doing a wasm feature check and then having the server serve up either the polyfilled JS (gzipped) or the wasm. Obviously, this will not be quite as small as the gzipped binary, but from the numbers in the FAQ it's still pretty good.

If we do this, then we no longer need to consider the polyfill when discussing the binary encoding.

@kripken (Member) commented Jul 2, 2015

It reduces the need for quick translation of the binary encoding to JS, but we would still need a practical way to do that translation which emits efficient JS.

Relying on separate downloads would make hosting such applications a little more complicated for people, but it's debatable if that's a serious problem or not.

@MikeHolman (Member)

we would still need a practical way to do that translation which emits efficient JS.

@kripken Of course, but that is a matter of AST semantics, not the binary encoding. Right now, the binary encoding is being optimized for speed of the polyfill. If we decouple the polyfill, then the binary encoding can be optimized for other things.

@kripken (Member) commented Jul 2, 2015

Agreed, assuming we are ok with the hosting complexity, it would remove the need for focusing on quick translation to JS in the binary encoding.

@kg (Contributor) commented Jul 2, 2015

Server side conversion does not seem like a viable approach.

  • It requires smart, carefully configured servers instead of letting a wasm application be hosted anywhere.
  • It prevents an intelligent polyfill that generates code based on the target browser's feature support.
  • It prevents any strategy other than AOT compilation of an entire module.
  • It requires the server software/configuration to be updated in response to wasm revisions, instead of just pointing the webpage at a different polyfill URL.
  • It complicates scenarios where multiple versions of the polyfill may need to be used (for example, running a very old wasm application on one page and a very new one on another).
  • It complicates compatibility fallback in cases where you can't easily sniff support from the server (e.g. the native module loads, but the browser's native implementation is too old, so fallback to the polyfill occurs, &c.).

@MikeHolman (Member)

@kg It's true that there is additional overhead for hosting. But I don't really understand the rest of your argument.

It also prevents an intelligent polyfill that generates code based on the target browser's feature support.

How would this be used? What kind of feature testing would a client-side polyfill do that couldn't be relayed back to the server to have the appropriate code served up? Are we expecting the polyfill to be so polymorphic?

It prevents any strategy other than AOT compilation of an entire module.

How? I certainly am not pushing for a full AOT compilation strategy and I don't see a conflict.

It complicates compatibility fallback in cases where you can't easily sniff support from the server - native module loads but browser's native implementation is too old so fallback to polyfill occurs, &c.

We can still keep the strategy where ops which are not yet implemented have js polyfill fallbacks. We should not have a case where someone has a wasm implementation, but it is so fundamentally wrong that a full js polyfill is needed.

The only type of scenario I can think of is someone implementing SharedArrayBuffer in js but not implementing any threading in wasm. You have a wasm module that uses threading and you can't use it here, but it is fundamental to your app, so you must fall back completely to a polyfill. Is this even polyfillable in a single pass? And is that not information the client can figure out before deciding which content to download? And is this what we believe the polyfill will be used for? I hope we're not already conceding to non-conformant wasm implementations. Maybe I'm giving a strawman, but I really can't think of a good fallback case that this is blocking.

@kg (Contributor) commented Jul 3, 2015

How would this be used? What kind of feature testing would a client-side polyfill do that couldn't be relayed back to the server to have the appropriate code served up? Are we expecting the polyfill to be so polymorphic?

There are future language improvements on the table that a polyfill would ideally be able to take advantage of, like simd.js and int64 and weakrefs. We can't have the polyfill generate code for this offline at present. If we want to do offline code generation on the server for polyfill scenarios, this would mean wasm applications wouldn't be able to leverage these features until they're integrated directly into a native wasm VM. I think that's undesirable. There are other scenarios where serving the exact same code up to everyone could be problematic. If we try to do all of this in the native VM through bodges you will end up having to cross the FFI every time you interact with these experimental features, which is super not awesome.

It prevents any strategy other than AOT compilation of an entire module.

How? I certainly am not pushing for a full AOT compilation strategy and I don't see a conflict.

I'm not sure how else to describe 'translate the entire wasm module on the server and feature detect to decide whether to send it to the browser' other than 'ahead-of-time compilation'.

It complicates compatibility fallback in cases where you can't easily sniff support from the server - native module loads but browser's native implementation is too old so fallback to polyfill occurs, &c.

We can still keep the strategy where ops which are not yet implemented have js polyfill fallbacks. We should not have a case where someone has a wasm implementation, but it is so fundamentally wrong that a full js polyfill is needed.

This still implies crossing the FFI boundary repeatedly and has nasty implications for the shape of data on the stack and where values need to live. There are already scenarios where people have needed to generate 'almost asm' code instead of 'use asm' code for the short-term benefits. We're not talking 'fundamentally wrong' here, just 'things have changed a lot and we aren't doing explicit WebAssembly versioning'.

The only type of scenario I can think of is someone implements SharedArrayBuffer in js, but doesn't implement any threading in wasm. You have a wasm module that uses threading and you can't use it here, but it is fundamental to your app, so you must fall back completely to a polyfill. Is this even polyfillable in a single pass? And is that not information the client can figure out before deciding which content to download? And is this what we believe that polyfill will be used for? I hope we're not already conceding to non-conformant wasm implementations. Maybe I'm giving a strawman, but I really can't think of a good fallback case that this is blocking.

The feature detection proposals we've discussed are at least partially designed to enable you to early-detect whether or not a module relies on features your VM doesn't support. This would enable early fallback to a JS polyfill or interpreter. On the other hand, it's my impression that I'm the only person who cares about that scenario, so maybe it doesn't matter. See #90. The fallback philosophy that has been discussed so far has tiers (opcode can be implemented with a wasm fallback function; opcode can be implemented with a js fallback function; only way to implement is by running a polyfill that implements it.) The instruction table idea described near the bottom of that issue is important.

I'm not sure what we believe the polyfill will be used for. But I think it will probably live at least 3 years, and we want to ensure that we can always fall back to it in any scenario where we want to experiment with sweeping changes/improvements to WebAssembly. We also want to empower people to prototype changes against real-world applications without requiring them to directly modify the code of a native VM or use the interpreter.

I think having a high-performance, flexible polyfill that can load and run webassembly modules on the client is really important. Any solution that requires you to run something on the server and do sniffing to serve up files to clients positions us as akin to something like Dart or CoffeeScript instead of positioning us as a successor to technologies like asm.js and pnacl. A client-side polyfill (hopefully loaded off a CDN operated by a trustworthy browser vendor) is much easier to update and keep current than a sniffing mechanism built into HTTPDs. (Do we really think we can make sniffing work? I thought history had shown that it's always fragile.)

@lukewagner (Member)

(Sorry for the delay, on vacation; processing asynchronously.) After recent discussions with @MikeHolman and @LouisLaf, @sunfishcode and I were re-examining the pre- vs. post-order question. Their positive experience with using an interpreter as the warm-up baseline (instead of a dumb compiler) was pretty interesting and made interpreter perf more interesting than in previous considerations.

So first, I think a key point is the one @titzer made above that an engine is probably going to want to do some streaming validation and transformation to an internal suitable-for-direct-interpretation bytecode anyway, regardless of how we design our binary format (unless we massively specialize for an assumed interpreter implementation, something I hope we can all agree is a bad idea). That combined with streaming compilation I think negates any startup advantage of interp over dumb-compiler; I think the main argument for an interp is just simplicity/portability.

Next, I think the polyfill can get by well enough with postorder and thus we can drop polyfill from our consideration. In the worst case (assuming we want to keep things single-pass and avoid any intermediate analyses, which I think is critical for startup time), we can just pessimistically insert a fixed number of spaces that get back-patched to parens and coercions as necessary. (Preorder makes this unnecessary b/c you can usually judge whether parens/coercions are necessary before splatting out the children.) This may bloat the output size 2-3x though (hard to know w/o implementing). If this ended up hurting (maybe contributing to OOM or slow startup), I realized we could do better by integrating the polyfill with the specific layer and thus the specific layer format could be designed to be optimal for polyfilling. Since we're not standardizing the specific layer any time soon, these blemishes would be swept away in the long term.
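
To make the back-patching idea concrete, here is a rough sketch under stated assumptions: one reserved pad byte per node (rather than a tuned "fixed number of spaces") and only the asm.js-style `(a + b)|0` coercion case; all names are illustrative:

```cpp
// Single-pass postorder-to-JS emission with pessimistic padding.
#include <string>
#include <vector>

struct JsEmitter {
  std::string out;             // JS text produced in one pass
  std::vector<size_t> starts;  // start offsets of pending subexpressions

  // Each leaf reserves one pad byte that may later become '('.
  void emit_leaf(const std::string& text) {
    starts.push_back(out.size());
    out += ' ';
    out += text;
  }

  // Postorder parent arrives: only now do we know the children need wrapping.
  void wrap_i32_add() {
    size_t rhs = starts.back(); starts.pop_back();
    size_t lhs = starts.back(); starts.pop_back();
    out[lhs] = '(';        // back-patch the reserved pad
    out.insert(rhs, "+");  // operator lands between the two children
    out += ")|0";
    starts.push_back(lhs); // the combined node is now one pending expression
  }
};
```

For example, `emit_leaf("a"); emit_leaf("b"); wrap_i32_add();` yields `(a+ b)|0`; the leftover pad byte is exactly the kind of output bloat described above.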

So what I think it comes down to is: which format is easiest to compile (to internal interp IR, dumb compiler, smart compiler) where "easiest" is both performance and simplicity.

Initially, I thought preorder was the clear winner. That was mostly b/c I imagined a recursive descent parser (such elegance, much maths). However, it does seem like stack-based algos get grosser with preorder. (I'm thinking about pure expressions here; for control-flow ops, I assume we wouldn't require a pure postorder since not even an interp wants that.) OTOH, postorder practically mandates a stack-based algo (which could be a good thing if it squeezes out browser variance!).
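
For pure expressions, the contrast looks roughly like this (hypothetical opcodes; the sketch evaluates rather than builds IR, to stay short):

```cpp
// Postorder decodes iteratively with an explicit operand stack; preorder
// falls out naturally as recursive descent on the host stack.
#include <cstdint>
#include <vector>

enum Op : uint8_t { kConst /* + u8 immediate */, kAdd, kMul };

int32_t decode_postorder(const uint8_t* p, const uint8_t* end) {
  std::vector<int32_t> stack;
  while (p < end) {
    switch (*p++) {
      case kConst: stack.push_back(*p++); break;
      case kAdd: { int32_t b = stack.back(); stack.pop_back();
                   stack.back() += b; break; }
      case kMul: { int32_t b = stack.back(); stack.pop_back();
                   stack.back() *= b; break; }
    }
  }
  return stack.back();
}

int32_t decode_preorder(const uint8_t*& p) {
  switch (*p++) {
    case kConst: return *p++;
    case kAdd: { int32_t a = decode_preorder(p); return a + decode_preorder(p); }
    case kMul: { int32_t a = decode_preorder(p); return a * decode_preorder(p); }
  }
  return 0;
}
```

Writing the preorder version with an explicit stack instead of recursion is where it gets grosser: you have to track how many operands each pending operator is still waiting for.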

There was also the argument that pre-order allows type-segregated opcode index spaces. However, as I was convincing myself in #58, type-segregation is probably not a size win at the raw layer. There is also the advantage that type-segregation requires less work to validate (ops are correctly-typed by construction, don't have to test), but it remains to be seen how much this wins in practice.

So currently, I'm actually leaning slightly in favor of post-order, but I'd be most interested to hear from others implementing experimental wasm decoders/compilers (esp. @titzer, having already implemented preorder in v8-native, I think?).

@titzer commented Jul 4, 2015

The more I think about this, the only advantages that I see for a postorder tree layout are:

a.) if it were only in a "basic block", then one could interpret the basic block directly. Currently, we have blocks, but they can have nested blocks, etc. I'm not sure what distinction we'd make here.
b.) if one wants to build an explicit AST tree structure, then the decoder is nicer; simply build a new node when popping the stack.

The v8-native-prototype decodes a pre-order layout of the AST directly to TurboFan IR (with SSA renaming) in a single pass. Preorder has these advantages:

a.) The decoder knows when control flow is starting for ifs and loops:
  • it splits the SSA renaming environment for the branches of the if
  • it introduces phis for the variables at the header of the loops
  • it allocates structures to merge the control flow and SSA renaming from continues and breaks
b.) The decoder knows the expected type of every expression when it starts decoding it.
c.) Blocks, switches, and other variable-length tree nodes have a known arity, since the decoder sees the block or switch bytecode first.
d.) A "sub AST" can easily be identified; e.g. if we want to decode only a single if or loop, we can point to the first bytecode and then keep decoding until the decoder "pops" its internal stack to be empty. (A sketch of this follows at the end of this comment.)
e.) The above point allows a decoder to determine the assigned variables in a loop at the loop header, reducing the number of phis introduced at loop headers. (TF does this for JS source with an AST walk.)
f.) The advantage in d) above could allow modules to overlap the storage of function bodies or subcontrol in interesting ways, along the lines that Katelyn was exploring.

For compilers, building the graph in a single pass in this way also allows eager optimization, if we want to do that on the graph; we can do constant folding, range analysis, and strength reduction directly on each graph node as it is built, because no fixup pass is necessary. I can't see how to do this on a post-order layout without building an ephemeral AST first and then visiting that.

I could even conceive of a single-pass bytecode or even machine code generator that works directly on the bytecode, with the same built-in stack. This is basically how fullcodegen works in V8; it walks the JS AST and directly generates code from it using recursive descent.
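
A minimal sketch of point d), assuming every opcode's arity and immediate size are known up front (the opcodes here are hypothetical):

```cpp
// Skip exactly one sub-AST in a preorder stream by decoding until the
// implicit stack "pops" empty.
#include <cstdint>

enum Op : uint8_t { kConst /* + 1 immediate byte */, kGetLocal /* + 1 */, kAdd, kIfElse };

int arity(uint8_t op) {  // how many child expressions each opcode expects
  switch (op) {
    case kAdd:    return 2;
    case kIfElse: return 3;  // condition, then-arm, else-arm
    default:      return 0;  // leaves
  }
}

const uint8_t* skip_sub_ast(const uint8_t* p) {
  int pending = 1;  // one expression still owed
  while (pending > 0) {
    uint8_t op = *p++;
    if (op == kConst || op == kGetLocal) p++;  // step over the immediate
    pending += arity(op) - 1;  // node consumed; its children are now owed
  }
  return p;  // first byte after the sub-AST
}
```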


@lukewagner (Member)

@titzer Those are all great points. It seems like all but one concern control flow. I was thinking the same thing, which is why I added this caveat above:

(I'm thinking about pure expressions here; for control-flow ops, I assume we wouldn't require a pure postorder since not even interp wants that.)

The one non-control-flow advantage you mentioned was (b). This is mostly what I've been thinking about and what I was considering in my above comment: are there any advantages to preorder for pure expressions?

@LouisLaf commented Jul 6, 2015

I would like to reiterate the importance of an interpreter for a runtime like wasm:

  1. Ease of prototyping, debugging, and baseline execution.
  2. Fast startup. Chakra uses a fast interpreter for JavaScript and asm.js startup, and we've had a lot of success with it. We will have an interpreter for WebAssembly.
  3. Run-once code. An interpreter is ideal for code that runs only once.
  4. Secure scenarios where JIT'ing is not permitted. We have instances of this with Xbox, for example.
  5. Low-memory scenarios like IoT. The ability to interpret the code directly is particularly important here.

A post-order representation for expressions works much better for interpretation than pre-order, but a pure post-order AST representation really does not work great. You really need a better way to represent flow, since you don't want the interpreter to have to process code which isn't being executed. This is possible, and it includes all of the advantages @titzer pointed at in #223.

Types can be inferred from leaves (adding more precise typing for locals would help here), so you can have types available right away. Flow can be done in a pre-order way, and the addition of target offsets for branches would make it efficient to interpret.

So you could have something like: if (x > y) { x = y } else { y = x }
1: get_local x
2: get_local y
3: gt
4: if (offset 8)
5: get_local y
6: set_local x
7: else (offset 11)
8: get_local x
9: set_local y
10: endif (may not be needed)
11:...

Something like this would be easy for an interpreter or a JIT. The CFG is well represented. With precise types on x and y, all the other types are easy to infer. The polyfill would be quite easy to write but would need a simple stack.
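
A rough sketch of an interpreter loop over this shape of encoding: postorder expressions on a value stack, preorder control flow with explicit branch offsets. The opcode set and the one-byte operands are illustrative only, not a proposal:

```cpp
#include <cstdint>
#include <vector>

enum Op : uint8_t { kGetLocal, kSetLocal, kGt, kIf, kElse, kEndIf };

void run(const std::vector<uint8_t>& code, std::vector<int32_t>& locals) {
  std::vector<int32_t> stack;
  size_t pc = 0;
  while (pc < code.size()) {
    switch (code[pc++]) {
      case kGetLocal: stack.push_back(locals[code[pc++]]); break;
      case kSetLocal: locals[code[pc++]] = stack.back(); stack.pop_back(); break;
      case kGt: { int32_t b = stack.back(); stack.pop_back();
                  stack.back() = stack.back() > b; break; }
      case kIf: {                    // operand: offset of the false target
        size_t target = code[pc++];
        int32_t cond = stack.back(); stack.pop_back();
        if (!cond) pc = target;      // skip the true arm without decoding it
        break;
      }
      case kElse: pc = code[pc]; break;  // end of true arm: jump past else
      case kEndIf: break;                // may not be needed, as noted above
    }
  }
}
```

Encoding the example above as `{kGetLocal,0, kGetLocal,1, kGt, kIf,13, kGetLocal,1, kSetLocal,0, kElse,18, kGetLocal,0, kSetLocal,1, kEndIf}` makes run() execute only the taken arm, which is exactly the property an interpreter wants.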

@titzer commented Jul 7, 2015

I agree with the goals of the interpreter for fast startup and, e.g., run-once code. For the latter case, it might still be possible to make an "AST-walking" interpreter that essentially keeps an explicit stack to deal with preorder code. That wouldn't require generating a second bytecode representation, which would be silly for startup code, though it wouldn't be as fast as bytecodes already laid out in postorder.


@lukewagner (Member)

@titzer What do you think about using pre-order for control flow, post-order for expressions?

@titzer commented Jul 7, 2015

I still think interpreting the bytecode directly is going to be tough. E.g. upon more reflection I realized that even in preorder, you need to know the start of the branches of an if w.r.t. the if bytecode. If we have [if][true-expression][false-expression], the engine needs to find the start of the [false-expression] cheaply (i.e. in constant time) without parsing the true-expression. That would probably mean adding a size or offset to the if, which kind of defeats the AST representation, IMO.


@titzer commented Jul 7, 2015

Sorry, I mean, [if][cond-expression][true-expression][false-expression].


@kg (Contributor) commented Jul 7, 2015

Yeah, inline representation of child nodes of variable length kinda wrecks any approach for efficiently interpreting the bytecode right out of storage. I think you end up having to decode the AST no matter what, and hope that the decoding pass is cheap.

@MikeHolman (Member)

@titzer Yes, you would have to encode offsets, which @LouisLaf alluded to.

@lukewagner (Member)

@titzer The reason I'm favoring postorder for expressions isn't direct interpretation, but because it seems slightly more natural for iterative (i.e., non-recursive) decoding of expressions (not control flow).

@MikeHolman Is that a proposal to include offsets in the raw binary format? I think this would have worse size and complicate/slow general validation for this one case. Assuming 'no', though, and you want offsets, then you'd need to translate to an internal bytecode with offsets (so it's not direct interpretation). I could also imagine an interpreter that JITed with a very very low hotness count (like 2), taking advantage of the fact that most code is dead or run-once and in this case, you might not even need offsets.

@rossberg (Member) commented Jul 8, 2015

FWIW, I agree: for a mostly tree-like data structure, post-order serialisation is more natural, and likely also slightly more efficient to decode -- at least that was what we measured a long time ago in a project I was involved in. Though that was for general serialisation; if it's an AST and you also want to be able to create a CFG from that directly then the equation might change somewhat.

@luke, your suggestion seems dependent on that other question, whether statements should be expressions. If they are, you probably cannot (or don't want to) represent control differently. Would that be a big issue, though? It's easy to extend postorder deserialisation to handle DAGs and general graphs.

@titzer commented Jul 8, 2015

I don't think postorder expression encoding makes any difference (positive or negative) if one chooses a recursive descent parsing strategy. If one chooses an LR parser strategy, there are already reduce actions in the middle and end of decoding ifs, so it's not a big deal to add reduce actions for preorder expressions. In fact, that's exactly what the v8-native-decoder does.


@creationix

If we have [if][true-expression][false-expression], the engine needs to find the start of the [false-expression] cheaply (i.e. constant time) without parsing the true-expression.

I recently implemented an AST-based interpreter for microcontrollers. It's not based on wasm bytecodes, but I did it using a binary AST. I tried several styles, including LuaJIT-inspired fixed-width bytecodes, Forth-style stack operations, and a pre-order AST. The pre-order was by far the fastest in my implementations, even without jump offsets in my if/elif/else nodes. I simply have a skip function that quickly skips over exactly one expression worth of bytecode. On my laptop, my interpreter is able to sum the first 1,000,000,000 integers in about 20 seconds.

Yes, embedding jump offsets after IF/ELIF and before ELIF/ELSE would probably make it faster, but it greatly complicates the bytecode. For my uses (kids learning on microcontrollers), scanning is plenty fast. Just a data point to consider.
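
A sketch of what such a skip function could look like for a preorder encoding with fixed arities; this is a guess at the shape, with hypothetical opcodes, not uscript's actual code (the literal format and its continuation bit follow the token description in a later comment and are an assumption):

```cpp
// Skip exactly one expression worth of bytecode, recursively.
#include <cstdint>

enum Op : uint8_t { kAdd = 0x80, kIf /* cond, then, else */ };

const uint8_t* skip(const uint8_t* p) {
  uint8_t b = *p++;
  if (!(b & 0x80)) {          // number literal token (high bit clear)
    if (b & 0x40)             // 'm' flag: continuation bytes follow
      while (*p++ & 0x80) {}  // consume until a byte's high bit is clear
    return p;
  }
  switch (b) {                // opcode: skip its fixed set of children
    case kAdd: return skip(skip(p));
    case kIf:  return skip(skip(skip(p)));
  }
  return p;                   // zero-arity opcode
}
```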

@creationix

Also fwiw, in my language, everything including control flow is an expression.

@lukewagner (Member)

@rossberg-chromium Even in the statements-are-expressions case, I think it'd be ok to nest pre-/post-order. It would be a problem for a generic traversal algo (that doesn't know anything about the specific node types), but for a custom wasm decoder (that switches on the decoded opcode before moving on to the next), I think I see how it'd work (though of course I'd want to implement it to be sure).

@kripken (Member) commented Jul 8, 2015

@creationix: that's interesting data, thanks. Do you have insight as to why preorder was faster in your interpreter? Perhaps it was heavily tied to the fact that you have fixed-size bytecodes?

@bnjbvr (Member) commented Jul 10, 2015

I like the idea of an interpreter for the reasons @LouisLaf brought up. In particular,

  1. Secure scenarios where JIT'ing is not permitted. We have instances of this with Xbox for example.

would be of interest for the Tor browser (which is based on Firefox). Using only a wasm interpreter in addition to a JS interpreter sounds like a possibly acceptable tradeoff for the Tor browser (rather than blocking all JS, as they do today).

@creationix

@kripken I'm not entirely sure why it's so much faster. Perhaps I just programmed better that way.

With both, I have 7-bit opcodes with 1 bit to signify it's an opcode.

1xxxxxxx - opcode, literal entry into enum table starting at 128.
0mxxxxxx mxxxxxxx* - variable length unsigned integer. Numbers under 64 represent themselves.
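
Read back, those two token forms could be decoded roughly like this; treating the 'm' bit as a most-significant-bits-first continuation flag is my assumption about the format:

```cpp
#include <cstdint>

// Decode one variable-length unsigned integer token; advances p past it.
uint64_t read_uint(const uint8_t*& p) {
  uint8_t b = *p++;            // 0mxxxxxx
  uint64_t value = b & 0x3F;   // numbers under 64 are just the low 6 bits
  bool more = (b & 0x40) != 0; // 'm': continuation bytes follow
  while (more) {
    b = *p++;                  // mxxxxxxx
    value = (value << 7) | (b & 0x7F);
    more = (b & 0x80) != 0;
  }
  return value;
}
```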

I don't use fixed-size codes for the preorder or post-order versions; that was only for the LuaJIT-style bytecodes (4 bytes each).

The preorder version can be seen visually in the IDE for kids I'm working on at http://uscript.creationix.com/ (though I graphically show math ops as infix to make it easier for kids)

In preorder, there is no length header or even an end marker. Every op has a fixed set of operands. For variable length bodies, I have a DO x expression* expression. I use the native C stack as my interpreter stack. It's essentially a recursive descent interpreter. There are no function parameters; instead there is a fixed set of global variables.

For postorder, it's more like Forth, which I expected to be much faster. I still have 7-bit ops and variable-length integers, but each op simply consumes items on the stack and/or pushes new items onto the stack. I have to pass a length parameter to eval to know how many tokens to consume. I hadn't decided if I wanted a length header or end markers. After benchmarking, the pre-order was an order of magnitude faster.

edit: added a screenshot of the IDE prototype since it's likely to change in the near future.
[screenshot of the uscript IDE omitted]

@jfbastien (Member, Author)

Conclusion: a fast interpreter and the polyfill aren't primary goals.
