Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685

Merged
merged 97 commits into from
May 5, 2023
Merged

[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685

merged 97 commits into from
May 5, 2023

Conversation

kripken
Copy link
Member

@kripken kripken commented Apr 21, 2023

A cycle of data is something we can't just naively emit as wasm globals. If
at runtime we end up, for example, with an object A that refers to itself,
then we can't just emit

(global $A
  (struct.new $A
    (global.get $A)))

The struct.get is of this very global, and such a self-reference is invalid. So
we need to break such cycles as we emit them. The simple idea used here
is to find paths in the cycle that are nullable and mutable, and replace the
initial value with a null that is fixed up later in the start function:

(global $A
  (struct.new $A
    (ref.null $A)))

(func $start
  (struct.set
    (global.get $A)
    (global.get $A)))
)

This was left as a TODO, but it turns out it is common in e.g. j2wasm output.

@tlively
Copy link
Member

tlively commented May 2, 2023

Sorry for the review delay here. This is still on my list of things to get to. The code LGTM fwiw, and I just have to look over the large test file.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience with the back-and-forth here. TODO comment looks good, and going with the simplest possible algorithm seems reasonable for now.

Another thing we might want to consider at some point is that all these globals will root objects that might otherwise be garbage collected. It would be possible in principle to avoid creating new globals at all, but it would be a code size and performance trade off because the start function would have to traverse the original globals to find the patch points.

(nop)
(loop
(global.set $global
(i32.const 999) ;; this must not be applied to the global!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it would be valid to apply 999 to the global, wouldn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess in this simple module it isn't observable. But in general that would be dangerous. I'm adding this comment now:

  ;; (It is true that in this simple module it would be ok to set 999 to the
  ;; global, but if the global were exported for example then that would not
  ;; be the case, nor would it be the case if the code did $global = $global + 1
  ;; or such. That is, since the global.set is not evalled away, its effects
  ;; must not be applied; we do both atomically or neither, so that the
  ;; global.set's execution only happens once.)

;; CHECK-NEXT: )
;; CHECK-NEXT: )

;; CHECK: (func $0_3 (type $none_=>_none)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you wrote before that naming the function $test won't make wasm-ctor-eval preserve its name. Why does function test get renamed to 0_3? Shouldn't wasm-ctor-eval be able to preserve the function names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can at least keep the base name, I guess, even if it adds a prefix... I added base names now.

It adds a prefix because in general an export might have another use. So we copy the function, modify that, and make the export use that one (so that other uses still use the original, unmodified function). As a result the export points to a function with a different name than before.

@kripken
Copy link
Member Author

kripken commented May 5, 2023

Another thing we might want to consider at some point is that all these globals will root objects that might otherwise be garbage collected.

Hmm, we can add globals for objects as we go, that are later GC'd in later execution. But when we get to that later execution we'd be ok, I think. It's true the globals would remain, but globaldce will remove them, as they have no uses. The one thing I was worried about as I read your comment was the start function that we add to, but we clear it each time, so I think that works out too?

@tlively
Copy link
Member

tlively commented May 5, 2023

I don't understand how all references to the new globals will be optimized out. Surely we can't optimize out the uses we add to the start function?

@kripken
Copy link
Member Author

kripken commented May 5, 2023

We clear those out each time we apply the state. Imagine we have this:

function foo() {
  var x = { y: 10, z: null };
  x.z = x;
  // time A
  log(x.y);
  // time B
  return 10;
}

After we eval up to "time A" the program will look like this:

var global$x = { y: 10, z: null };

function start() {
  global$x.z = global$x;
}

function foo() {
  var x = global$x;
  log(x.y);
  // time B
  return 10;
}

When we get to "time B" we have this:

var global$x = { y: 10, z: null };

function start() {
}

function foo() {
  return 10;
}

Note that we cleared out the start function. We clear it out in each step and then re-fill it based on the state currently alive, but at this time x (or the allocation written to x inside foo) no longer lives.

A subtle point is that the global is not enough to keep things alive, since ctor-eval evals globals and then evals functions (the same as wasm modules do when they are initialized), so when it works on foo it has already stopped working on globals - so it does not scan global$x for live state. That is, "defining globals" are globals that are added later and cannot, by themselves, keep things alive.

@kripken kripken merged commit 6086df0 into main May 5, 2023
@kripken kripken deleted the eval.gc.cycle branch May 5, 2023 19:46
@tlively
Copy link
Member

tlively commented May 5, 2023

I still must be misunderstanding fundamental. Where did the call to log go in the above example?

@kripken
Copy link
Member Author

kripken commented May 5, 2023

Imagine it's evalled out, that is, it's a none-typed expression and after we eval it it might alter some global state, but that is global state that we can serialize, so we remove that expression, apply that global state, and continue to eval onwards. (Sorry, "log" is maybe a bad name here since logging usually goes outside the module; this could have been "use".)

@tlively
Copy link
Member

tlively commented May 5, 2023

But it's the global state that gets serialized that I'm worried about forming new roots. Are you saying that when wasm-ctor-eval finishes, all the global state that has been serialized has been removed? In that case, why do we serialize any of the global state at all?

@kripken
Copy link
Member Author

kripken commented May 5, 2023

We serialize the state that is still needed. The program needs to execute as if we didn't run wasm-ctor-eval. So if some reference to an object exists, we have to keep that around. But if not, then we don't.

Imagine we have this:

function foo() {
  var x = { a: 10, b: null };
  var y = { a: 20, b: null };
  x.b = x;
  y.b = y;
  // .. use x and y in some computations, but do not capture refs to them ..
  return y;
}

When we eval up to the comment (including), we'd end up with this:

var global$y = { a: 10, b: null };

start {
  global$y.b = global$y;
}

function foo() {
  return global$y;
}

We ended up serializing one object, y, but not the other, x. Nothing remaining keeps x alive, so it's gone. (It is true that global$x existed temporarily during the eval stages. But once no uses remain, we no longer emit code in start for it, and the global is removed by globaldce.)

At least, that's how it should work. I'm not 100% sure current limitations allow it to succeed perfectly, there are some hacks there. But if not it should be fixable.

@tlively
Copy link
Member

tlively commented May 5, 2023

Aha, so in that case y would be rooted in a global for the life of the instance, whereas in the original program it might have had a much shorter lifetime.

@kripken
Copy link
Member Author

kripken commented May 5, 2023

Oh right, that's true. ctor-eval can remove y later during evalling if that works out, but if not then it is true that the final wasm defines y in a global, which the VM probably keeps alive forever. So this might be an issue in VMs but not in ctor-eval.

If it turns out to be a serious issue in VMs then we could work on an optimization pass to "localize" globals...

@tlively
Copy link
Member

tlively commented May 6, 2023

Ok yes, this was the problem I was trying to describe. I think it would be very difficult to try to localize globals, but it would be possible to change the back-patching logic for cycles to at least avoid creating additional globals used just for back-patching.

@kripken
Copy link
Member Author

kripken commented May 6, 2023

We don't create any new globals just for back-patching, though - we define all GC allocations in globals anyhow. That's the simple thing to do so that they can be used anywhere they are needed. Figuring out which GC allocations don't need to be in globals seems as hard as localizing globals, at first thought?

@tlively
Copy link
Member

tlively commented May 6, 2023

No, it should be easier because wasm-ctor-eval knows what the original globals are, and it can construct their values as trees of allocation operations in constant expressions combined with tree-traversing back patching logic in the start function for cycles. That doesn't require any kind of global dataflow analysis that a general localization optimization would require.

@kripken
Copy link
Member Author

kripken commented May 8, 2023

Ah, now I see what you mean @tlively . Yes, it might make sense to try to avoid adding globals here because of that.

radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
A cycle of data is something we can't just naively emit as wasm globals. If
at runtime we end up, for example, with an object A that refers to itself,
then we can't just emit

(global $A
  (struct.new $A
    (global.get $A)))

The struct.get is of this very global, and such a self-reference is invalid. So
we need to break such cycles as we emit them. The simple idea used here
is to find paths in the cycle that are nullable and mutable, and replace the
initial value with a null that is fixed up later in the start function:

(global $A
  (struct.new $A
    (ref.null $A)))

(func $start
  (struct.set
    (global.get $A)
    (global.get $A)))
)

This is not optimal in terms of breaking cycles, but it is fast (linear time)
and simple, and does well in practice on j2wasm (where cycles in fact
occur).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants