Remove binary_imm64 and int_compare_imm instructions #4721

jameysharp · 2022-08-16T17:12:11Z

I have no idea yet whether this is a good change, and it definitely
should not be merged as-is. The main issues are that there are a bunch
of tests I haven't fixed up, and some optimizations I've removed. Also I
have no performance measurements yet to see what effect this has on
either compile time or generated code quality.

I just wanted to understand what the *_imm instructions were used for,
and trying to grep for them didn't work out well. So I removed them to
let the compiler tell me where they're used instead.

To avoid having to make as many changes, I introduced a InstImmBuilder
trait. It provides methods like iadd_imm which emit the pair of
instructions that legalization would have decomposed that instruction
into eventually anyway.

I have several questions that this may help with answering:

Is there significant savings, in IR code size and/or compile time,
from inlining one immediate operand into some instructions at the
frontend? Or is the widespread frontend use of these instructions just
for convenience?
Can simple_preopt cleanly identify arithmetic identities involving
constants without rewriting to use these combined forms?
Is there a compile-time cost to simple_preopt rewriting into combined
forms, only for legalization to rewrite them away again soon after?

bjorn3 · 2022-08-16T17:15:27Z

cc #3250

One of my arguments against this is that it makes the clif ir harder to read and harder to generate.

jameysharp · 2022-08-16T17:22:20Z

Hah, I guess I accidentally implemented what @cfallin proposed just by being confused. Thanks for digging that link up! It's helpful context.

I noticed a couple things that I think are existing bugs while exploring this:

The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?
I see simple_preopt was intended to rewrite "(iadd (iconst x) (iadd (iconst y) z))" to fold the constants together, and it uses the *_imm opcodes as an intermediate step. But it looks like that only actually happens if the constant is the first operand, not the second, because only one case has a recursive call to simplify.

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure, instead of rewriting to this intermediate form to avoid having to look three layers down in the tree of operators? I wonder if it would reduce compile time by not modifying instructions that it can't actually improve. Is that something that should happen after the e-graph work lands?

afonso360 · 2022-08-16T17:27:34Z

Wow, there is a lot more stuff attached to _imm than i expected!

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.

For example:

v0 = iconst.i16 0xABCD
v1 = ...
v2 = ...
v3 = iadd.i16 v2, v0   ; v0 = 0xABCD

It would also help in cases other than *_imm. But probably does not cover all cases.

bjorn3 · 2022-08-16T17:28:19Z

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure

I believe that is exactly what will happen with the egraph PR if it gets merged. https://github.com/bytecodealliance/wasmtime/pull/4249/files#diff-a596f1a407284b0b3591c17053fe0eff7f0229b70a2fcd9282fc2aedf9cd38a4

bjorn3 · 2022-08-16T17:29:13Z

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.

I like that! It would resolve my readability concerns. It doesn't resolve my clif ir generation concern though.

afonso360 · 2022-08-16T17:55:17Z

It doesn't resolve my clif ir generation concern though.

Reading the PR it looks like we keep the _imm builder functions, but they now emit the op + iconst, so there should no difference to users of cranelift. It that what you were referring to or did I miss something?

Although I do think we should note in the docs that those helpers are not actual instructions by themselves.

bjorn3 · 2022-08-16T17:59:32Z

Reading the PR it looks like we keep the _imm builder functions, but they now emit the op + iconst, so there should no difference to users of cranelift.

Right, missed that.

I have no idea yet whether this is a good change, and it definitely should not be merged as-is. The main issues are that there are a bunch of tests I haven't fixed up, and some optimizations I've removed. Also I have no performance measurements yet to see what effect this has on either compile time or generated code quality. I just wanted to understand what the `*_imm` instructions were used for, and trying to grep for them didn't work out well. So I removed them to let the compiler tell me where they're used instead. To avoid having to make as many changes, I introduced a `InstImmBuilder` trait. It provides methods like `iadd_imm` which emit the pair of instructions that legalization would have decomposed that instruction into eventually anyway. I have several questions that this may help with answering: - Is there significant savings, in IR code size and/or compile time, from inlining one immediate operand into some instructions at the frontend? Or is the widespread frontend use of these instructions just for convenience? - Can simple_preopt cleanly identify arithmetic identities involving constants without rewriting to use these combined forms? - Is there a compile-time cost to simple_preopt rewriting into combined forms, only for legalization to rewrite them away again soon after?

cfallin · 2022-08-16T19:52:19Z

@jameysharp thanks for exploring here. As others have pointed out, there's a lot of prior thought on this, and overall I think we do want to remove these variants in due time.

The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

Currently the legalizer rewrites them from _imm form to op-with-iconst before the lowering backends see the CLIF; this is why we don't need to handle them in each backend.

I see simple_preopt was intended to rewrite "(iadd (iconst x) (iadd (iconst y) z))" to fold the constants together, and it uses the *_imm opcodes as an intermediate step. But it looks like that only actually happens if the constant is the first operand, not the second, because only one case has a recursive call to simplify.

Independent of whether we keep the *_imm instructions, I'm also wondering: Can simple_preopt be replaced with ISLE and match on the full tree structure, instead of rewriting to this intermediate form to avoid having to look three layers down in the tree of operators? I wonder if it would reduce compile time by not modifying instructions that it can't actually improve. Is that something that should happen after the e-graph work lands?

Indeed, that's one of the main effects of my mid-end optimizer work; it will replace simple_preopt just as you are suggesting.

Now, regarding iadd vs iadd_imm and friends:

I think that breaking the ops into smaller pieces, and keeping iadd separate from its imm, makes the most sense from an optimization-rules perspective: we otherwise need rules to match both. (For example, multiply-add needs to also match multiply-add-imm.) Likewise it's useful to keep iconst separate because we want to do special things with it sometimes, like rematerialization and/or use of constant pools.

Part of the reason for the original _imm forms, aside from the code-size advantage that @sunfishcode mentions in #3250, was that old Cranelift rewrote CLIF until each op had a direct machine-instruction equivalent; so iadd_imm was useful to represent the reg-immediate form of the x86 add instruction. We no longer have that constraint, though.

So I think the right way forward is in spirit similar to what you've done here, but (i) do it after the mid-end optimizer lands (pending RFC approval!) because there are some immediate followup thoughts in how to make legalization ride on top of it; and (ii) generate the _imm helpers on the InstBuilder in a programmatic way, rather than manually writing them.

jameysharp · 2022-08-16T20:48:15Z

Is it possible to print a comment with the const value if one of the sides of the iadd/isub/etc.. is a const? That way we could recover some of the readability.
...
v3 = iadd.i16 v2, v0   ; v0 = 0xABCD

This turned out to be really easy to do. I've opened #4725 for that.

The backends all panic if they see these opcodes during lowering... Except that x64 has rules in lower.isle to match iadd_imm. Can those rules ever fire?

Currently the legalizer rewrites them from _imm form to op-with-iconst before the lowering backends see the CLIF; this is why we don't need to handle them in each backend.

Right. So these rules in x64's lower.isle can never match, yeah?

So I think the right way forward is in spirit similar to what you've done here, but (i) do it after the mid-end optimizer lands (pending RFC approval!) because there are some immediate followup thoughts in how to make legalization ride on top of it; and (ii) generate the _imm helpers on the InstBuilder in a programmatic way, rather than manually writing them.

(i) makes sense. (ii) isn't immediately (!) obvious to me: is it that you want to make this pattern available as a convenience on more instructions, or you're worried about the correctness of the hand-written wrappers, or...?

cfallin · 2022-08-16T20:53:21Z

(i) makes sense. (ii) isn't immediately (!) obvious to me: is it that you want to make this pattern available as a convenience on more instructions, or you're worried about the correctness of the hand-written wrappers, or...?

Basically, generating the code is less error-prone and allows us to extend the pattern to arbitrary instructions as we choose fit; it's a lower-entropy encoding of the system design. We already generate .iadd() so generating .iadd_imm() given a special flag isn't a huge stretch. In contrast, mixing generated code and handwritten code (this PR) feels asymmetric and is difficult to maintain and extend in the future.

cfallin · 2022-08-16T20:54:13Z

Right. So these rules in x64's lower.isle can never match, yeah?

Apparently so! We can go ahead and delete those; I suspect fuzz coverage, if we were to get it in terms of ISLE source, would show this as well.

jameysharp · 2022-08-16T21:05:50Z

Right. So these rules in x64's lower.isle can never match, yeah?

Apparently so! We can go ahead and delete those; I suspect fuzz coverage, if we were to get it in terms of ISLE source, would show this as well.

Great, I've opened #4726 for deleting those.

Basically, generating the code is less error-prone and allows us to extend the pattern to arbitrary instructions as we choose fit; it's a lower-entropy encoding of the system design. We already generate .iadd() so generating .iadd_imm() given a special flag isn't a huge stretch. In contrast, mixing generated code and handwritten code (this PR) feels asymmetric and is difficult to maintain and extend in the future.

Okay, I can see that. It's not obvious to me what code that should generate, though, because the current InstBuilder traits assume each method will build exactly one instruction, consuming the builder. These _imm variants, being a kind of macro for multiple instructions, are fundamentally different.

But we can sort that out when it's time to re-visit this after your e-graph work lands.

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift:wasm labels Aug 16, 2022

jameysharp force-pushed the remove-imm-insts branch from 966c571 to abe4268 Compare August 16, 2022 20:13

jameysharp mentioned this pull request Aug 16, 2022

Print constants in a comment in CLIF output #4725

Merged

jameysharp mentioned this pull request Aug 16, 2022

Remove unreachable x64 lowerings for iadd_imm #4726

Merged

afonso360 mentioned this pull request Sep 9, 2022

cranelift: Tracking issue for Interpreter support #4889

Open

afonso360 mentioned this pull request Oct 22, 2022

cranelift: Remove iconst.i128 #5075

Merged

afonso360 mentioned this pull request Mar 7, 2023

Cranelift: Segfault with unaligned stack load sunk into extractlane on x64 #5922

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove binary_imm64 and int_compare_imm instructions #4721

Remove binary_imm64 and int_compare_imm instructions #4721

jameysharp commented Aug 16, 2022

bjorn3 commented Aug 16, 2022

jameysharp commented Aug 16, 2022

afonso360 commented Aug 16, 2022

bjorn3 commented Aug 16, 2022

bjorn3 commented Aug 16, 2022

afonso360 commented Aug 16, 2022 •

edited

Loading

bjorn3 commented Aug 16, 2022

cfallin commented Aug 16, 2022

jameysharp commented Aug 16, 2022

cfallin commented Aug 16, 2022

cfallin commented Aug 16, 2022

jameysharp commented Aug 16, 2022

Remove binary_imm64 and int_compare_imm instructions #4721

Are you sure you want to change the base?

Remove binary_imm64 and int_compare_imm instructions #4721

Conversation

jameysharp commented Aug 16, 2022

bjorn3 commented Aug 16, 2022

jameysharp commented Aug 16, 2022

afonso360 commented Aug 16, 2022

bjorn3 commented Aug 16, 2022

bjorn3 commented Aug 16, 2022

afonso360 commented Aug 16, 2022 • edited Loading

bjorn3 commented Aug 16, 2022

cfallin commented Aug 16, 2022

jameysharp commented Aug 16, 2022

cfallin commented Aug 16, 2022

cfallin commented Aug 16, 2022

jameysharp commented Aug 16, 2022

afonso360 commented Aug 16, 2022 •

edited

Loading