Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stacktop #360

Closed
kg opened this issue Sep 18, 2015 · 14 comments
Closed

Stacktop #360

kg opened this issue Sep 18, 2015 · 14 comments

Comments

@kg
Copy link
Contributor

kg commented Sep 18, 2015

Now that global vars are gone we need a simple way to track things like the current top of the stack. I think we should do this with a special opcodes or host imports, i.e.

(import $get_stacktop "runtime" "get_stacktop")
(import $set_stacktop "runtime" "set_stacktop")

We probably will want to have a similar register for the current thread ID once we have threads (this will make it easier for applications to do TLS, etc).

Solving this in user-space is a bad idea, I think.

@AndrewScheidecker
Copy link

From looking at Emscripten code converted to WASM, there's an awful lot of code like this:

(set_local $local15 (load_global $_STACKTOP))
(store_global $_STACKTOP (i32.add (load_global $_STACKTOP) (i32.const 32)))
...
(store_global $_STACKTOP (get_local $local15))
(return (get_local $local2))

I think an alloca operation that is implicitly popped on return is all you need:

(set_local $local15 (alloca (i32.const 32)))
...
(return (get_local $local2))

@kg
Copy link
Contributor Author

kg commented Sep 18, 2015

Alloca would be perfect. That's all I want.

@sunfishcode
Copy link
Member

The current thinking from #154 and related conversations is that when we add threads, we'll add thread-local unaliased globals, which are expected to be well suited for these tasks.

In the brief time before we add threads, storing STACKTOP in linear memory is not very problematic since there's only one thread. And, this saves us from having to add new features to the language. Without threads yet, it's difficult to make sure that features like this that we add now will interact well with threads when we do add them.

@kg
Copy link
Contributor Author

kg commented Sep 18, 2015

User-space STACKTOP is dangerous because it's a fundamental ABI primitive. Whatever becomes convention in the MVP era, we'll be stuck with forever.

@sunfishcode
Copy link
Member

We're not planning to have a stable ABI until we add dynamc linking.

@sunfishcode
Copy link
Member

... stable library ABI, that is. wasm itself will be a stable format, but an MVP-era wasm (static) library may not be able to be linked with a dynamic-linking-era wasm libary.

@sunfishcode
Copy link
Member

Links to discussions of ABIs: for C and C++ and dynamic linking in general.

@kg
Copy link
Contributor Author

kg commented Sep 18, 2015

Are we really okay with every function body containing duplicated logic to set/restore a custom STACKTOP variable (at entrance, and every exit point)? Do we think C-style stack is that uncommon? If we specced alloca and a simple convention for where the stack top is stored, we'd be stripping a lot of duplicated code out of function ASTs and making it easier to evolve existing code as we move forward.

I don't think there's any argument to be made that STACKTOP's definition interacts with threads poorly, either.

@jfbastien
Copy link
Member

I recommend reading the following:

The research has done very similar things to what we're discussing: shadow stack (wasm's locals), on-heap stack for address-taken things (wasm's stack top). They have numbers on how much of each exists in real-world code, and numbers on what that costs at runtime.

@ghost
Copy link

ghost commented Sep 18, 2015

@sunfishcode Would it cause any problems to add TLS to the MVP for this use? MVP is single-threaded so the polyfill could implement these as asm.js globals? This might save reworking code generators in the interim.

@jfbastien
Copy link
Member

(@kg and I discussed this in person)

I don't think it's worth having alloca at this point in time: it's purely a tentative size win on .wasm file size, and a small simplification for code generators. It doesn't help generated code (e.g. x86 generation is the same), and it's any faster. It has spec problem (can I alloca in a loop? Or just at function start?).

It may be a nice size win, but I think we want data before we spec it in.

@sunfishcode
Copy link
Member

@JSStats That was originally my preference as well, however there's concern that if wasm has thread-local variables before it has threads, they may not be designed right, and it could cause problems when threads are added. It seems like the risk of trouble ought to be low, but on the other hand, managing STACKTOP is believed to be only a minor inconvenience for code generators.

@AndrewScheidecker
Copy link

I concur that just putting STACKTOP in the instance memory is good enough for the MVP.

(can I alloca in a loop? Or just at function start?)

Yes, I was thinking something the same as LLVM's or C's alloca. If the function uses alloca, the runtime must implicitly save the user SP at entry, and restore it before exiting the function. The alloca operation itself would simply bump the user SP by the desired amount and yield the previous value.

There are two components to the alloca proposal:

  1. Does the user code control its stack pointer, or is it some intrinsic state of the runtime?
  2. If it's an intrinsic state of the runtime, is alloca the best way to provide access to it?

The argument for letting the runtime control the stack pointer is that it wouldn't require any modifications to extend to support threading, dynamic linking, exceptions, or a cross-language ABI. The argument against is that it should eventually be possible to reproduce the functionality with a thread-local, imported global variable.

I don't have a clear argument for alloca. It's more constrained than giving the user arbitrary control of the stack pointer, which could be bad. However, it's more constrained in a way that possibly makes it easier to implement stack unwinding and related features (exceptions? debugging? GC?).

@lukewagner
Copy link
Member

The reason that we're seeing the load of globals into locals in the Emscripten-generated code is because global loads in general (when not eliminated by GVN) require real loads from memory; storing the global in a local makes it easy for even a simple compiler to store stacktop in a register for the entire duration of the function call. Note that having explicit globals doesn't fix this since globals can be mutated by calls just as well as memory. Thus replacing the get_globals by load_memorys (and having the compiler try to maximally hoist these) seems totally natural and dumb-compiler friendly.

Agreed that where we want to go is unaliased TLS variables that can literally be stored in the engine's TLS array and, for dynamic linking, this would be solved by an explicit named sharing. In the meantime, where all we support is static-linking/single-threading, I agree with above comments that a constant heap address if Good Enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants