-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a toolchain-independent ABI document, and propose _initialize
#203
base: main
Are you sure you want to change the base?
Add a toolchain-independent ABI document, and propose _initialize
#203
Conversation
The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. Related activities ------------------ In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. Considered alternatives ----------------------- It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. Process ------- We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
I am in favor of this proposal. As a separate process question, is there a list of producers that need to be updated? It may help to communicate early with those toolchains. It might also help if this proposal encourages toolchains to link back to this PR in their issue/PR to make it easier to determine when support is added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with all of this.
Perhaps we can come up with a better title than "Toolchain-independent ABI"... by definition aren't all the things in this repo (e.g. BasicCABI and Linker etc) toolchain independent. Maybe BasicModuleABI?
Great issue description! That background content is almost worth more than the actual document. |
@ricochet I think many languages have a potential use for this. But in terms of things that need to be updated, I think the bigger coordination concern here is with embedders. In some cases we'll need engines to call @sbc100 BasicModuleABI works for me. Renamed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting question about the process. Most of the stuff in tool-conventions is here specifically to avoid the need for full CG-level process. I guess this API reflects a larger level of cross-toolchain interoperation than even what we have here already.
Co-authored-by: Derek Schuff <dschuff@chromium.org>
Add an agenda item for discussing the proposed `_initialize` convention in tool-conventions. - This would add a new tool-conventions document named BasicModuleABI.md which would define the `_initialize` convention. - See WebAssembly/tool-conventions#203 for details. I'm following the process described at the bottom of [this comment]. I'm open to alternative approaches to solving the problem. [this comment]: WebAssembly/tool-conventions#203 (comment)
I've incorporated various bits of feedback, and the discussion here now seems to have settled. Following the process I described above, I've now filed an agenda item to discuss this with the CG: WebAssembly/meetings#1253 |
Add an agenda item for discussing the proposed `_initialize` convention in tool-conventions. - This would add a new tool-conventions document named BasicModuleABI.md which would define the `_initialize` convention. - See WebAssembly/tool-conventions#203 for details. I'm following the process described at the bottom of [this comment]. I'm open to alternative approaches to solving the problem. [this comment]: WebAssembly/tool-conventions#203 (comment)
What if the toolchain injected a check for initialization status into exports? AFAICT that would make it impossible to misuse a module by not calling its initialization, as that would be done lazily. The code would be roughly:
I count about 6-7 bytes to wrap each exported function. The dynamic cost would be 3 instructions: If we wanted to reduce either of those costs, another option would be to add an explicit dependency to the export mechanism. Roughly, to call this export, this other one-shot function must be called. And the engine could do that lazily, or it could trap if that function hasn't called, etc. |
I think this generally makes sense, and I don't have any better ideas, but I want to stress what I see as the downside here: If this is adopted by some VMs and not others, then wasm files will become less portable. Specifically, someone might notice that the same wasm file runs in one way in, say, wasmtime and wasm3 (which call Now, maybe that's fine and expected. But I wonder if we can do a little better than leave it as a surprising thing for people to run into. Some vague thoughts:
I'm not sure either of those is a good idea! Just some thoughts. Put another way, I think it would be good if we had a clear answer for someone that builds a new wasm VM tomorrow and asks themselves, "should I call |
Maybe we could add a non-normative note to the Core and/or JS specs mentioning this toolchain convention and pointing out that just core spec initialization may not be enough to make the module usable as intended. |
If a module exports a function named `_initialize` with no arguments and no | ||
return values, and does not export a function named `_start`, the toolchain | ||
that produced my assume that on any instance of the module, this `_initialize` | ||
function is called before any other exports are accessed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the relationship with similar functions like __wasm_call_ctors?
eg. which one to call if multiple of them are exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, and indeed, it may be too late to try to design anything, to avoid breaking existing setups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _initialize
and _start
functions are in charge of calling __wasm_call_ctors
.
I see __wasm_call_ctors
as an llvm-internal function which is not designed to be called directly by users but instead by higher level toolchain functions like _start
and _initialize
(or other crt1-type functions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. but there are existing conventions to call exported __wasm_call_ctors
.
eg. https://github.com/emscripten-core/emscripten/blob/bec42dac7873903d09d713963e34020c22a8bd2d/src/library_dylink.js#L846
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that is because emscripten does it own thing here and basically implements _start
/ _initialize
in JavaScript. I wouldn't worry about this case since emscripten only ever loads modules that is builds itself.. we may convert to using a more standards-based approach in the future, but for now emscripten-built wasm modules are fairly specific to emscripten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, the standard-based way to go is to export either _start
or _initialize
. Exporting and directly calling main
and/or __wasm_call_ctors
(as emscripten does) is non-standard.
It would also be non-standard/undefined, for example, to call __wasm_call_ctors
directly from user C/C++ code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, the standard thing suggested here is to ignore __wasm_call_ctors
__post_instantiate
etc even if they are exported, and only cares _initialize
? (besides that, _start
in case of wasi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make sense to me yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. it makes sense to me too.
Per WebAssembly/tool-conventions#203 and recent LLVM and wasi-libc changes, libraries may export an `_initialize` function instead of `__wasm_call_ctors`, and we should call whichever one we find. Note that we expect to find at most one of the two, and will return an error if both are found. Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Per WebAssembly/tool-conventions#203 and recent LLVM and wasi-libc changes, libraries may export an `_initialize` function instead of `__wasm_call_ctors`, and we should call whichever one we find. Note that we expect to find at most one of the two, and will return an error if both are found. Signed-off-by: Joel Dice <joel.dice@fermyon.com>
The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should.
WASI has defined a convention using an exported function named
_initialize
, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way.To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the
_initialize
function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed.Related activities
In the component model, there is a proposal to add a second initialization phase. If that's done, then component-model toolchains could arrange for this
_initialize
function to be called automatically by this second initialization mechanism.Considered alternatives
It is tempting to use the Wasm start function for C++ constructors; this has been extensively discussed, and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports.
It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step.
Wasm-ld has a
__wasm_call_ctors
function, and in theory we could use that name instead of_initialize
, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use_initialize
as the exported function, which may call__wasm_call_ctors
in its body.Process
We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process:
I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR.
After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility.
If the CG is in favor of it, then I'd propose we merge this PR.