Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove binary_imm64 and int_compare_imm instructions #4721

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jameysharp
Copy link
Contributor

I have no idea yet whether this is a good change, and it definitely
should not be merged as-is. The main issues are that there are a bunch
of tests I haven't fixed up, and some optimizations I've removed. Also I
have no performance measurements yet to see what effect this has on
either compile time or generated code quality.

I just wanted to understand what the *_imm instructions were used for,
and trying to grep for them didn't work out well. So I removed them to
let the compiler tell me where they're used instead.

To avoid having to make as many changes, I introduced a InstImmBuilder
trait. It provides methods like iadd_imm which emit the pair of
instructions that legalization would have decomposed that instruction
into eventually anyway.

I have several questions that this may help with answering:

  • Is there significant savings, in IR code size and/or compile time,
    from inlining one immediate operand into some instructions at the
    frontend? Or is the widespread frontend use of these instructions just
    for convenience?

  • Can simple_preopt cleanly identify arithmetic identities involving
    constants without rewriting to use these combined forms?

  • Is there a compile-time cost to simple_preopt rewriting into combined
    forms, only for legalization to rewrite them away again soon after?

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 16, 2022

cc #3250

One of my arguments against this is that it makes the clif ir harder to read and harder to generate.

@jameysharp
Copy link
Contributor Author

Hah, I guess I accidentally implemented what @cfallin proposed just by being confused. Thanks for digging that link up! It's helpful context.

I noticed a couple things that I think are existing bugs while exploring this:

  • The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

  • I see simple_preopt was intended to rewrite "(iadd (iconst x) (iadd (iconst y) z))" to fold the constants together, and it uses the *_imm opcodes as an intermediate step. But it looks like that only actually happens if the constant is the first operand, not the second, because only one case has a recursive call to simplify.

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure, instead of rewriting to this intermediate form to avoid having to look three layers down in the tree of operators? I wonder if it would reduce compile time by not modifying instructions that it can't actually improve. Is that something that should happen after the e-graph work lands?

@afonso360
Copy link
Contributor

Wow, there is a lot more stuff attached to _imm than i expected!

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.

For example:

v0 = iconst.i16 0xABCD
v1 = ...
v2 = ...
v3 = iadd.i16 v2, v0   ; v0 = 0xABCD

It would also help in cases other than *_imm. But probably does not cover all cases.

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 16, 2022

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure

I believe that is exactly what will happen with the egraph PR if it gets merged. https://github.com/bytecodealliance/wasmtime/pull/4249/files#diff-a596f1a407284b0b3591c17053fe0eff7f0229b70a2fcd9282fc2aedf9cd38a4

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 16, 2022

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.

I like that! It would resolve my readability concerns. It doesn't resolve my clif ir generation concern though.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:wasm labels Aug 16, 2022
@afonso360
Copy link
Contributor

afonso360 commented Aug 16, 2022

It doesn't resolve my clif ir generation concern though.

Reading the PR it looks like we keep the _imm builder functions, but they now emit the op + iconst, so there should no difference to users of cranelift. It that what you were referring to or did I miss something?

Although I do think we should note in the docs that those helpers are not actual instructions by themselves.

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 16, 2022

Reading the PR it looks like we keep the _imm builder functions, but they now emit the op + iconst, so there should no difference to users of cranelift.

Right, missed that.

I have no idea yet whether this is a good change, and it definitely
should not be merged as-is. The main issues are that there are a bunch
of tests I haven't fixed up, and some optimizations I've removed. Also I
have no performance measurements yet to see what effect this has on
either compile time or generated code quality.

I just wanted to understand what the `*_imm` instructions were used for,
and trying to grep for them didn't work out well. So I removed them to
let the compiler tell me where they're used instead.

To avoid having to make as many changes, I introduced a `InstImmBuilder`
trait. It provides methods like `iadd_imm` which emit the pair of
instructions that legalization would have decomposed that instruction
into eventually anyway.

I have several questions that this may help with answering:

- Is there significant savings, in IR code size and/or compile time,
  from inlining one immediate operand into some instructions at the
  frontend? Or is the widespread frontend use of these instructions just
  for convenience?

- Can simple_preopt cleanly identify arithmetic identities involving
  constants without rewriting to use these combined forms?

- Is there a compile-time cost to simple_preopt rewriting into combined
  forms, only for legalization to rewrite them away again soon after?
@cfallin
Copy link
Member

cfallin commented Aug 16, 2022

@jameysharp thanks for exploring here. As others have pointed out, there's a lot of prior thought on this, and overall I think we do want to remove these variants in due time.

The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

Currently the legalizer rewrites them from _imm form to op-with-iconst before the lowering backends see the CLIF; this is why we don't need to handle them in each backend.

I see simple_preopt was intended to rewrite "(iadd (iconst x) (iadd (iconst y) z))" to fold the constants together, and it uses the *_imm opcodes as an intermediate step. But it looks like that only actually happens if the constant is the first operand, not the second, because only one case has a recursive call to simplify.

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure, instead of rewriting to this intermediate form to avoid having to look three layers down in the tree of operators? I wonder if it would reduce compile time by not modifying instructions that it can't actually improve. Is that something that should happen after the e-graph work lands?

Indeed, that's one of the main effects of my mid-end optimizer work; it will replace simple_preopt just as you are suggesting.

Now, regarding iadd vs iadd_imm and friends:

I think that breaking the ops into smaller pieces, and keeping iadd separate from its imm, makes the most sense from an optimization-rules perspective: we otherwise need rules to match both. (For example, multiply-add needs to also match multiply-add-imm.) Likewise it's useful to keep iconst separate because we want to do special things with it sometimes, like rematerialization and/or use of constant pools.

Part of the reason for the original _imm forms, aside from the code-size advantage that @sunfishcode mentions in #3250, was that old Cranelift rewrote CLIF until each op had a direct machine-instruction equivalent; so iadd_imm was useful to represent the reg-immediate form of the x86 add instruction. We no longer have that constraint, though.

So I think the right way forward is in spirit similar to what you've done here, but (i) do it after the mid-end optimizer lands (pending RFC approval!) because there are some immediate followup thoughts in how to make legalization ride on top of it; and (ii) generate the _imm helpers on the InstBuilder in a programmatic way, rather than manually writing them.

@jameysharp
Copy link
Contributor Author

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.
...

v3 = iadd.i16 v2, v0   ; v0 = 0xABCD

This turned out to be really easy to do. I've opened #4725 for that.

The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

Currently the legalizer rewrites them from _imm form to op-with-iconst before the lowering backends see the CLIF; this is why we don't need to handle them in each backend.

Right. So these rules in x64's lower.isle can never match, yeah?

So I think the right way forward is in spirit similar to what you've done here, but (i) do it after the mid-end optimizer lands (pending RFC approval!) because there are some immediate followup thoughts in how to make legalization ride on top of it; and (ii) generate the _imm helpers on the InstBuilder in a programmatic way, rather than manually writing them.

(i) makes sense. (ii) isn't immediately (!) obvious to me: is it that you want to make this pattern available as a convenience on more instructions, or you're worried about the correctness of the hand-written wrappers, or...?

@cfallin
Copy link
Member

cfallin commented Aug 16, 2022

(i) makes sense. (ii) isn't immediately (!) obvious to me: is it that you want to make this pattern available as a convenience on more instructions, or you're worried about the correctness of the hand-written wrappers, or...?

Basically, generating the code is less error-prone and allows us to extend the pattern to arbitrary instructions as we choose fit; it's a lower-entropy encoding of the system design. We already generate .iadd() so generating .iadd_imm() given a special flag isn't a huge stretch. In contrast, mixing generated code and handwritten code (this PR) feels asymmetric and is difficult to maintain and extend in the future.

@cfallin
Copy link
Member

cfallin commented Aug 16, 2022

Right. So these rules in x64's lower.isle can never match, yeah?

Apparently so! We can go ahead and delete those; I suspect fuzz coverage, if we were to get it in terms of ISLE source, would show this as well.

@jameysharp
Copy link
Contributor Author

Right. So these rules in x64's lower.isle can never match, yeah?

Apparently so! We can go ahead and delete those; I suspect fuzz coverage, if we were to get it in terms of ISLE source, would show this as well.

Great, I've opened #4726 for deleting those.

Basically, generating the code is less error-prone and allows us to extend the pattern to arbitrary instructions as we choose fit; it's a lower-entropy encoding of the system design. We already generate .iadd() so generating .iadd_imm() given a special flag isn't a huge stretch. In contrast, mixing generated code and handwritten code (this PR) feels asymmetric and is difficult to maintain and extend in the future.

Okay, I can see that. It's not obvious to me what code that should generate, though, because the current InstBuilder traits assume each method will build exactly one instruction, consuming the builder. These _imm variants, being a kind of macro for multiple instructions, are fundamentally different.

But we can sort that out when it's time to re-visit this after your e-graph work lands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:wasm cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants