What are the limitations of wasm-ctor-eval? #6964

Open
TyOverby opened this issue Sep 22, 2024 · 9 comments

Comments

@TyOverby
Contributor

I'm trying to understand what kind of code I should expect wasm-ctor-eval to evaluate fully, and which instructions it won't be able to evaluate. The readme uses "calls to imported functions" as its example of an unevaluatable instruction, but are there other constructs that it can't evaluate, or values that it can't serialize back into the wasm file?

If this information is already documented somewhere, then I apologize - I couldn't find it... If it just hasn't been written down though, I'd be happy to contribute docs after I learn enough to do so!

@kripken
Member

kripken commented Sep 22, 2024

Off the top of my head, one missing feature is tracking table updates (table.set), and there are probably a few other specific TODOs we haven't gotten around to. But in general it should be able to eval anything that can be seen at compile time, even complex things like recursive GC values.

If you run into a limitation, please file an issue.

@TyOverby
Contributor Author

TyOverby commented Sep 22, 2024

Thanks!

I started playing around with wasm-ctor-eval, and the first thing I hit was a call to an imported function that performs pointer equality testing. I suspect that if I get far enough, I'll also be missing the math imports.

Wasm_of_ocaml has a pretty long module initialization phase at startup, and it would be very useful if wasm-ctor-eval could be used to eliminate much of it. Do you think that there's a future where more imported functions can be simulated inside of wasm-ctor-eval, like what was done for environment variables, stdin, and command line parameters?

@kripken
Member

kripken commented Sep 23, 2024

@TyOverby Interesting question! Maybe we can find a good way to do that. I'd lean towards something modular, maybe using dynamic linking of native plugins, or loading wasm modules and using the internal wasm interpreter we already have here. (For use cases like yours where the imports are JS, something like QuickJS could be used, compiled natively or into wasm.)

That should work well for math functions, but for pointer equality testing it would require deeper integration, which might be complex. For that, it seems like a less-modular approach of adding code in binaryen itself could make sense. I'm not necessarily opposed to that, if we can find a modular way to do it.

But, can't you do ref.eq inside the wasm, for pointer equality? (I'm not familiar with your compiler, sorry.)

@TyOverby
Contributor Author

TyOverby commented Sep 25, 2024

> But, can't you do ref.eq inside the wasm, for pointer equality? (I'm not familiar with your compiler, sorry.)

I'm trying to remember why that doesn't work for us; maybe it has something to do with GC functions? Are they comparable?

@kripken
Member

kripken commented Sep 25, 2024

Ah, right, functions are the exception. You can compare struct and array references, but not function references.

I guess you do need function reference equality? If so, it might be more efficient to box function references in tiny structs, where there is a 1:1 mapping between the wrapper structs and the functions. Equality checks are then just equality checks on the wrapper structs, and calling the functions costs just an extra struct load. The overhead of going through JS would be massive in comparison to that (especially since it will create JS wrappers around the wasm functions).

@vouillon
Contributor

vouillon commented Oct 1, 2024

We have to deal with a very large code base, written by many people. In this code base, the OCaml equality may be used to compare JavaScript objects, which worked fine when compiling to JavaScript. Since we box JavaScript values, the physical equality ref.eq will typically return false when comparing them, even when JavaScript strict equality would return true. If we used ref.eq alone, the code could thus contain bugs that are hard to track down, since they result in neither failures at compile time nor traps at runtime. So we actually compare values using ref.eq by default, but when we have two boxed JavaScript objects, we call a JavaScript function (x,y)=>x===y.
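To make the dispatch described above concrete, here is a minimal sketch in plain JavaScript. It only models the idea: `JsBox`, `jsStrictEquals`, and `physicalEquals` are hypothetical names invented for illustration, and JavaScript's `===` on the box objects stands in for `ref.eq` on wasm references. It is not the actual Wasm_of_ocaml runtime code.

```javascript
// Hypothetical stand-in for a wasm-side box around a JavaScript value.
class JsBox {
  constructor(value) {
    this.value = value;
  }
}

// Stand-in for the imported JavaScript helper (x,y)=>x===y.
const jsStrictEquals = (x, y) => x === y;

// Two type checks first; only when both operands are boxes do we call
// out to JavaScript. Otherwise we use reference identity, which plays
// the role of ref.eq in this model.
function physicalEquals(a, b) {
  if (a instanceof JsBox && b instanceof JsBox) {
    return jsStrictEquals(a.value, b.value);
  }
  return a === b;
}

// The same JavaScript object boxed twice gives two distinct boxes.
const sharedObject = {};
const firstBox = new JsBox(sharedObject);
const secondBox = new JsBox(sharedObject);
```

Here `firstBox === secondBox` is false because the boxes are distinct, while `physicalEquals(firstBox, secondBox)` is true because the underlying JavaScript object is the same one.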

I'm a bit surprised that this function is the first imported item encountered, though. Where would the JavaScript objects come from?

@kripken
Member

kripken commented Oct 1, 2024

Can you not ensure a 1:1 mapping of boxes to JS objects? One way is to keep a reference on the JS object to the wasm box, so that you never create another box for it (that is, a "make box" function would check if there is already a box for that JS object, and use it if so).

Once you have a 1:1 mapping then I don't see how this would be a problem:

> the physical equality ref.eq will typically return false when comparing them even when the JavaScript strict equality would return true.

With 1:1 mapping, ref.eq would return true if and only if the two objects are the same.

@vouillon
Contributor

I'm not sure how we could implement this without leaking memory, since JavaScript weak maps do not work with primitive values such as strings and numbers.
Also, I don't know what the performance impact of using such a map would be. Boxing JavaScript values is quite cheap. And in the common case, to implement the equality operator, we just add two type checks, which are fast.
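The weak map limitation can be seen in a small sketch. The `boxCache` and `makeBox` names are hypothetical, and a plain object stands in for the wasm box. The cache gives a 1:1 mapping for objects, but `WeakMap.prototype.set` throws a `TypeError` when the key is a string or a number.

```javascript
// Hypothetical cache mapping each JavaScript object to its single box.
const boxCache = new WeakMap();

function makeBox(value) {
  let box = boxCache.get(value);
  if (box === undefined) {
    box = { value }; // stand-in for allocating a wasm-side box
    // Throws a TypeError when `value` is a string or a number.
    boxCache.set(value, box);
  }
  return box;
}

// Objects get a stable box, so identity comparison on boxes works.
const someObject = {};
const objectBoxIsStable = makeBox(someObject) === makeBox(someObject);

// Strings cannot be weak map keys, so this approach fails for them.
let primitiveError = null;
try {
  makeBox("hello");
} catch (error) {
  primitiveError = error;
}
```

After this runs, `objectBoxIsStable` is true and `primitiveError` holds the `TypeError`. Covering primitives would need an ordinary `Map`, which keeps its entries alive, and that is the leak concern raised above.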

@kripken
Member

kripken commented Oct 14, 2024

> I'm not sure how we could implement this without leaking memory, since JavaScript weak maps do not work with primitive objects such as strings and numbers.

To make sure we are on the same page, here is what I am imagining in more detail:

;; Type for a function wrapper.
(type $wrapper (struct (field (ref func))))

;; A wrapper for a function $foo.
(global $foo-wrapper (ref $wrapper) (struct.new $wrapper (ref.func $foo)))

;; Every place the compiler would normally emit `(ref.func $foo)` it instead emits this:
(global.get $foo-wrapper)

;; Every place the compiler would normally emit `(ref func)` it emits `(ref $wrapper)`

;; Comparison is then simple: we compare the 1:1 wrappers.
(func $compare-funcs (param $x (ref $wrapper)) (param $y (ref $wrapper)) (result i32)
  (ref.eq (local.get $x) (local.get $y))
)

;; Calling is slightly slower: `(call_ref ..)` is replaced by
(call_ref (struct.get $wrapper 0 ..))

;; Helper for JS, wrap an arbitrary function
(func $wrap-js-func (export "wrap_js_func") (param $js (ref func)) (result (ref $wrapper))
  (struct.new $wrapper (local.get $js))
)

And for JS,

function makeWrapper(func) {
  if (!func.wrapper || !func.wrapper.deref()) {
    // No existing wrapper: make a new one. By stashing it on the object, we will
    // always use the same wrapper for this JS object, allowing ref.eq in wasm to
    // work properly, as there is a 1:1 mapping of functions to wrappers. We use
    // a WeakRef so that we do not keep the wasm object alive unnecessarily
    // (though this means we may end up freeing it and creating it again later).
    func.wrapper = new WeakRef(wasm.exports.wrap_js_func(func));
  }
  return func.wrapper.deref();
}

I don't think this can leak?
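
For anyone who wants to try the JS side without a compiled module, here is a self-contained sketch of the same idea. The `wasm` object below is a hypothetical stub: its `wrap_js_func` returns a plain object where the real export would return a wasm struct. Everything else follows the snippet above, with the wrapper held in a local so it is dereferenced only once.

```javascript
// Hypothetical stub for the instantiated module; the real export would
// return a wasm struct wrapping the function reference.
const wasm = {
  exports: {
    wrap_js_func: (func) => ({ func }),
  },
};

function makeWrapper(func) {
  // Reuse the stashed wrapper if it exists and has not been collected.
  let wrapper = func.wrapper && func.wrapper.deref();
  if (!wrapper) {
    wrapper = wasm.exports.wrap_js_func(func);
    // Hold the wrapper weakly so the function does not keep it alive.
    func.wrapper = new WeakRef(wrapper);
  }
  return wrapper;
}

const add = (a, b) => a + b;
const sub = (a, b) => a - b;
const sameWrapper = makeWrapper(add) === makeWrapper(add);
const differentWrappers = makeWrapper(add) !== makeWrapper(sub);
```

Both `sameWrapper` and `differentWrappers` are true here, which is the 1:1 property that lets ref.eq on the wrappers stand in for function equality. Note that this stashes a property on the function, so it covers functions and other objects but not primitives.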


3 participants