Stacktop #360

kg · 2015-09-18T21:05:39Z

Now that global vars are gone we need a simple way to track things like the current top of the stack. I think we should do this with a special opcodes or host imports, i.e.

(import $get_stacktop "runtime" "get_stacktop")
(import $set_stacktop "runtime" "set_stacktop")

We probably will want to have a similar register for the current thread ID once we have threads (this will make it easier for applications to do TLS, etc).

Solving this in user-space is a bad idea, I think.

The text was updated successfully, but these errors were encountered:

AndrewScheidecker · 2015-09-18T21:22:18Z

From looking at Emscripten code converted to WASM, there's an awful lot of code like this:

(set_local $local15 (load_global $_STACKTOP))
(store_global $_STACKTOP (i32.add (load_global $_STACKTOP) (i32.const 32)))
...
(store_global $_STACKTOP (get_local $local15))
(return (get_local $local2))

I think an alloca operation that is implicitly popped on return is all you need:

(set_local $local15 (alloca (i32.const 32)))
...
(return (get_local $local2))

kg · 2015-09-18T21:27:28Z

Alloca would be perfect. That's all I want.

sunfishcode · 2015-09-18T21:28:43Z

The current thinking from #154 and related conversations is that when we add threads, we'll add thread-local unaliased globals, which are expected to be well suited for these tasks.

In the brief time before we add threads, storing STACKTOP in linear memory is not very problematic since there's only one thread. And, this saves us from having to add new features to the language. Without threads yet, it's difficult to make sure that features like this that we add now will interact well with threads when we do add them.

kg · 2015-09-18T21:29:27Z

User-space STACKTOP is dangerous because it's a fundamental ABI primitive. Whatever becomes convention in the MVP era, we'll be stuck with forever.

sunfishcode · 2015-09-18T21:31:22Z

We're not planning to have a stable ABI until we add dynamc linking.

sunfishcode · 2015-09-18T21:33:07Z

... stable library ABI, that is. wasm itself will be a stable format, but an MVP-era wasm (static) library may not be able to be linked with a dynamic-linking-era wasm libary.

sunfishcode · 2015-09-18T21:38:04Z

Links to discussions of ABIs: for C and C++ and dynamic linking in general.

kg · 2015-09-18T22:37:12Z

Are we really okay with every function body containing duplicated logic to set/restore a custom STACKTOP variable (at entrance, and every exit point)? Do we think C-style stack is that uncommon? If we specced alloca and a simple convention for where the stack top is stored, we'd be stripping a lot of duplicated code out of function ASTs and making it easier to evolve existing code as we move forward.

I don't think there's any argument to be made that STACKTOP's definition interacts with threads poorly, either.

jfbastien · 2015-09-18T22:45:04Z

I recommend reading the following:

The research has done very similar things to what we're discussing: shadow stack (wasm's locals), on-heap stack for address-taken things (wasm's stack top). They have numbers on how much of each exists in real-world code, and numbers on what that costs at runtime.

ghost · 2015-09-18T22:49:16Z

@sunfishcode Would it cause any problems to add TLS to the MVP for this use? MVP is single-threaded so the polyfill could implement these as asm.js globals? This might save reworking code generators in the interim.

jfbastien · 2015-09-18T22:54:38Z

(@kg and I discussed this in person)

I don't think it's worth having alloca at this point in time: it's purely a tentative size win on .wasm file size, and a small simplification for code generators. It doesn't help generated code (e.g. x86 generation is the same), and it's any faster. It has spec problem (can I alloca in a loop? Or just at function start?).

It may be a nice size win, but I think we want data before we spec it in.

sunfishcode · 2015-09-18T23:26:32Z

@JSStats That was originally my preference as well, however there's concern that if wasm has thread-local variables before it has threads, they may not be designed right, and it could cause problems when threads are added. It seems like the risk of trouble ought to be low, but on the other hand, managing STACKTOP is believed to be only a minor inconvenience for code generators.

AndrewScheidecker · 2015-09-18T23:55:00Z

I concur that just putting STACKTOP in the instance memory is good enough for the MVP.

(can I alloca in a loop? Or just at function start?)

Yes, I was thinking something the same as LLVM's or C's alloca. If the function uses alloca, the runtime must implicitly save the user SP at entry, and restore it before exiting the function. The alloca operation itself would simply bump the user SP by the desired amount and yield the previous value.

There are two components to the alloca proposal:

Does the user code control its stack pointer, or is it some intrinsic state of the runtime?
If it's an intrinsic state of the runtime, is alloca the best way to provide access to it?

The argument for letting the runtime control the stack pointer is that it wouldn't require any modifications to extend to support threading, dynamic linking, exceptions, or a cross-language ABI. The argument against is that it should eventually be possible to reproduce the functionality with a thread-local, imported global variable.

I don't have a clear argument for alloca. It's more constrained than giving the user arbitrary control of the stack pointer, which could be bad. However, it's more constrained in a way that possibly makes it easier to implement stack unwinding and related features (exceptions? debugging? GC?).

lukewagner · 2015-09-22T12:57:56Z

The reason that we're seeing the load of globals into locals in the Emscripten-generated code is because global loads in general (when not eliminated by GVN) require real loads from memory; storing the global in a local makes it easy for even a simple compiler to store stacktop in a register for the entire duration of the function call. Note that having explicit globals doesn't fix this since globals can be mutated by calls just as well as memory. Thus replacing the get_globals by load_memorys (and having the compiler try to maximally hoist these) seems totally natural and dumb-compiler friendly.

Agreed that where we want to go is unaliased TLS variables that can literally be stored in the engine's TLS array and, for dynamic linking, this would be solved by an explicit named sharing. In the meantime, where all we support is static-linking/single-threading, I agree with above comments that a constant heap address if Good Enough.

lukewagner closed this as completed Oct 22, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stacktop #360

Stacktop #360

kg commented Sep 18, 2015

AndrewScheidecker commented Sep 18, 2015

kg commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

kg commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

kg commented Sep 18, 2015

jfbastien commented Sep 18, 2015

ghost commented Sep 18, 2015

jfbastien commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

AndrewScheidecker commented Sep 18, 2015

lukewagner commented Sep 22, 2015

Stacktop #360

Stacktop #360

Comments

kg commented Sep 18, 2015

AndrewScheidecker commented Sep 18, 2015

kg commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

kg commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

kg commented Sep 18, 2015

jfbastien commented Sep 18, 2015

ghost commented Sep 18, 2015

jfbastien commented Sep 18, 2015

sunfishcode commented Sep 18, 2015

AndrewScheidecker commented Sep 18, 2015

lukewagner commented Sep 22, 2015