[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685

kripken · 2023-04-21T20:37:59Z

A cycle of data is something we can't just naively emit as wasm globals. If
at runtime we end up, for example, with an object A that refers to itself,
then we can't just emit

(global $A
  (struct.new $A
    (global.get $A)))

The struct.get is of this very global, and such a self-reference is invalid. So
we need to break such cycles as we emit them. The simple idea used here
is to find paths in the cycle that are nullable and mutable, and replace the
initial value with a null that is fixed up later in the start function:

(global $A
  (struct.new $A
    (ref.null $A)))

(func $start
  (struct.set
    (global.get $A)
    (global.get $A)))
)

This was left as a TODO, but it turns out it is common in e.g. j2wasm output.

Co-authored-by: Thomas Lively <tlively@google.com>

tlively · 2023-05-02T21:29:58Z

Sorry for the review delay here. This is still on my list of things to get to. The code LGTM fwiw, and I just have to look over the large test file.

tlively

Thanks for your patience with the back-and-forth here. TODO comment looks good, and going with the simplest possible algorithm seems reasonable for now.

Another thing we might want to consider at some point is that all these globals will root objects that might otherwise be garbage collected. It would be possible in principle to avoid creating new globals at all, but it would be a code size and performance trade off because the start function would have to traverse the original globals to find the patch points.

tlively · 2023-05-05T15:55:29Z

test/lit/ctor-eval/partial-global.wat

+  (nop)
+  (loop
+   (global.set $global
+    (i32.const 999) ;; this must not be applied to the global!


It seems like it would be valid to apply 999 to the global, wouldn't it?

Yes, I guess in this simple module it isn't observable. But in general that would be dangerous. I'm adding this comment now:

;; (It is true that in this simple module it would be ok to set 999 to the ;; global, but if the global were exported for example then that would not ;; be the case, nor would it be the case if the code did $global = $global + 1 ;; or such. That is, since the global.set is not evalled away, its effects ;; must not be applied; we do both atomically or neither, so that the ;; global.set's execution only happens once.)

tlively · 2023-05-05T16:01:06Z

test/lit/ctor-eval/gc-cycle.wast

+;; CHECK-NEXT:  )
+;; CHECK-NEXT: )
+
+;; CHECK:      (func $0_3 (type $none_=>_none)


I believe you wrote before that naming the function $test won't make wasm-ctor-eval preserve its name. Why does function test get renamed to 0_3? Shouldn't wasm-ctor-eval be able to preserve the function names?

It can at least keep the base name, I guess, even if it adds a prefix... I added base names now.

It adds a prefix because in general an export might have another use. So we copy the function, modify that, and make the export use that one (so that other uses still use the original, unmodified function). As a result the export points to a function with a different name than before.

kripken · 2023-05-05T17:42:01Z

Another thing we might want to consider at some point is that all these globals will root objects that might otherwise be garbage collected.

Hmm, we can add globals for objects as we go, that are later GC'd in later execution. But when we get to that later execution we'd be ok, I think. It's true the globals would remain, but globaldce will remove them, as they have no uses. The one thing I was worried about as I read your comment was the start function that we add to, but we clear it each time, so I think that works out too?

tlively · 2023-05-05T19:32:45Z

I don't understand how all references to the new globals will be optimized out. Surely we can't optimize out the uses we add to the start function?

kripken · 2023-05-05T19:43:23Z

We clear those out each time we apply the state. Imagine we have this:

function foo() {
  var x = { y: 10, z: null };
  x.z = x;
  // time A
  log(x.y);
  // time B
  return 10;
}

After we eval up to "time A" the program will look like this:

var global$x = { y: 10, z: null };

function start() {
  global$x.z = global$x;
}

function foo() {
  var x = global$x;
  log(x.y);
  // time B
  return 10;
}

When we get to "time B" we have this:

var global$x = { y: 10, z: null };

function start() {
}

function foo() {
  return 10;
}

Note that we cleared out the start function. We clear it out in each step and then re-fill it based on the state currently alive, but at this time x (or the allocation written to x inside foo) no longer lives.

A subtle point is that the global is not enough to keep things alive, since ctor-eval evals globals and then evals functions (the same as wasm modules do when they are initialized), so when it works on foo it has already stopped working on globals - so it does not scan global$x for live state. That is, "defining globals" are globals that are added later and cannot, by themselves, keep things alive.

tlively · 2023-05-05T21:50:51Z

I still must be misunderstanding fundamental. Where did the call to log go in the above example?

kripken · 2023-05-05T21:54:59Z

Imagine it's evalled out, that is, it's a none-typed expression and after we eval it it might alter some global state, but that is global state that we can serialize, so we remove that expression, apply that global state, and continue to eval onwards. (Sorry, "log" is maybe a bad name here since logging usually goes outside the module; this could have been "use".)

tlively · 2023-05-05T22:01:19Z

But it's the global state that gets serialized that I'm worried about forming new roots. Are you saying that when wasm-ctor-eval finishes, all the global state that has been serialized has been removed? In that case, why do we serialize any of the global state at all?

kripken · 2023-05-05T22:19:22Z

We serialize the state that is still needed. The program needs to execute as if we didn't run wasm-ctor-eval. So if some reference to an object exists, we have to keep that around. But if not, then we don't.

Imagine we have this:

function foo() {
  var x = { a: 10, b: null };
  var y = { a: 20, b: null };
  x.b = x;
  y.b = y;
  // .. use x and y in some computations, but do not capture refs to them ..
  return y;
}

When we eval up to the comment (including), we'd end up with this:

var global$y = { a: 10, b: null };

start {
  global$y.b = global$y;
}

function foo() {
  return global$y;
}

We ended up serializing one object, y, but not the other, x. Nothing remaining keeps x alive, so it's gone. (It is true that global$x existed temporarily during the eval stages. But once no uses remain, we no longer emit code in start for it, and the global is removed by globaldce.)

At least, that's how it should work. I'm not 100% sure current limitations allow it to succeed perfectly, there are some hacks there. But if not it should be fixable.

tlively · 2023-05-05T22:47:32Z

Aha, so in that case y would be rooted in a global for the life of the instance, whereas in the original program it might have had a much shorter lifetime.

kripken · 2023-05-05T22:53:31Z

Oh right, that's true. ctor-eval can remove y later during evalling if that works out, but if not then it is true that the final wasm defines y in a global, which the VM probably keeps alive forever. So this might be an issue in VMs but not in ctor-eval.

If it turns out to be a serious issue in VMs then we could work on an optimization pass to "localize" globals...

tlively · 2023-05-06T01:08:47Z

Ok yes, this was the problem I was trying to describe. I think it would be very difficult to try to localize globals, but it would be possible to change the back-patching logic for cycles to at least avoid creating additional globals used just for back-patching.

kripken · 2023-05-06T01:14:14Z

We don't create any new globals just for back-patching, though - we define all GC allocations in globals anyhow. That's the simple thing to do so that they can be used anywhere they are needed. Figuring out which GC allocations don't need to be in globals seems as hard as localizing globals, at first thought?

tlively · 2023-05-06T01:20:44Z

No, it should be easier because wasm-ctor-eval knows what the original globals are, and it can construct their values as trees of allocation operations in constant expressions combined with tree-traversing back patching logic in the start function for cycles. That doesn't require any kind of global dataflow analysis that a general localization optimization would require.

kripken · 2023-05-08T22:46:55Z

Ah, now I see what you mean @tlively . Yes, it might make sense to try to avoid adding globals here because of that.

A cycle of data is something we can't just naively emit as wasm globals. If at runtime we end up, for example, with an object A that refers to itself, then we can't just emit (global $A (struct.new $A (global.get $A))) The struct.get is of this very global, and such a self-reference is invalid. So we need to break such cycles as we emit them. The simple idea used here is to find paths in the cycle that are nullable and mutable, and replace the initial value with a null that is fixed up later in the start function: (global $A (struct.new $A (ref.null $A))) (func $start (struct.set (global.get $A) (global.get $A))) ) This is not optimal in terms of breaking cycles, but it is fast (linear time) and simple, and does well in practice on j2wasm (where cycles in fact occur).

kripken added 30 commits April 18, 2023 16:15

test

867d357

work

1f0885b

work

d5d4231

builds

ab09cdf

nice

16ce3a3

work

0eb6fae

format

44d60c8

work

be86f1b

work

ecccf26

work

e7a59bf

test

fb6021e

comment

66e0c0e

forewach

2ad9604

totally broken

d483576

work around gcc bug

9aa06b2

test

fc69e8c

test

2b2c3f9

failz

e1c96b5

chak

86f817d

test

95b1ad9

a new dawn

8e54a32

struggle

467ff28

buildz

94a5fdd

hmm

1758483

work

df82f93

fix

16b6d47

fix

db23b4f

fix

865d6c0

fix

a4f2817

fix

36f4b58

kripken and others added 8 commits April 28, 2023 11:33

format

9ca4d67

comments

5171fac

add a note on doing better

c09871d

rename

57127f0

comment

05cb4b2

Update src/tools/wasm-ctor-eval.cpp

d484ba6

Co-authored-by: Thomas Lively <tlively@google.com>

Merge remote-tracking branch 'origin/eval.gc.cycle' into eval.gc.cycle

e004db2

name keepalive

a0be923

tlively approved these changes May 5, 2023

View reviewed changes

kripken added 3 commits May 5, 2023 10:42

Merge remote-tracking branch 'origin/main' into eval.gc.cycle

dfff6a7

clarify

651cd35

use base names

3bec18d

kripken merged commit 6086df0 into main May 5, 2023

kripken deleted the eval.gc.cycle branch May 5, 2023 19:46

kripken mentioned this pull request Sep 22, 2024

What are the limitations of wasm-ctor-eval? #6964

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685

[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685

kripken commented Apr 21, 2023

tlively commented May 2, 2023

tlively left a comment

tlively May 5, 2023

kripken May 5, 2023

tlively May 5, 2023

kripken May 5, 2023

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 6, 2023

kripken commented May 6, 2023

tlively commented May 6, 2023

kripken commented May 8, 2023

[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685

[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685

Conversation

kripken commented Apr 21, 2023

tlively commented May 2, 2023

tlively left a comment

Choose a reason for hiding this comment

tlively May 5, 2023

Choose a reason for hiding this comment

kripken May 5, 2023

Choose a reason for hiding this comment

tlively May 5, 2023

Choose a reason for hiding this comment

kripken May 5, 2023

Choose a reason for hiding this comment

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 5, 2023

kripken commented May 5, 2023

tlively commented May 6, 2023

kripken commented May 6, 2023

tlively commented May 6, 2023

kripken commented May 8, 2023