[InstCombine] Improve select simplification based on known bits #97289

nikic · 2024-07-01T13:01:23Z

#95923 added support to simplify select arms based on known bits implied by the select condition. That PR was limited to the case where the arm folds to a constant.

This PR extends support to the case where the select arm simplifies in some other way, for example by removing a bitwise operation. This is done by calling SimplifyDemandedBits() with the adjusted SimplifyQuery.

Unfortunately, this case requires that the condition isn't undef.

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=11484cb817bcc2a6e2ef9572be982a1a5a4964ec&to=67da0f88b52c613a277f93cb8d0183939786a9c7&stat=instructions%3Au

Fixes #71533.

llvmbot · 2024-07-01T13:01:56Z

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

#95923 added support to simplify select arms based on known bits implied by the select condition. That PR was limited to the case where the arm folds to a constant.

This PR extends support to the case where the select arm simplifies in some other way, for example by removing a bitwise operation. This is done by calling SimplifyDemandedBits() with the adjusted SimplifyQuery.

Unfortunately, this case requires that the condition isn't undef.

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=11484cb817bcc2a6e2ef9572be982a1a5a4964ec&to=67da0f88b52c613a277f93cb8d0183939786a9c7&stat=instructions%3Au

Fixes #71533.

Full diff: https://github.com/llvm/llvm-project/pull/97289.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+28-14)
(modified) llvm/test/Transforms/InstCombine/select.ll (+4-7)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 979495dae853e..fbbe2251b97e6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -4052,22 +4052,36 @@ Instruction *InstCombinerImpl::visitSelectInst(SelectInst &SI) {
     });
     SimplifyQuery Q = SQ.getWithInstruction(&SI).getWithCondContext(CC);
     if (!CC.AffectedValues.empty()) {
-      if (!isa<Constant>(TrueVal) &&
-          hasAffectedValue(TrueVal, CC.AffectedValues, /*Depth=*/0)) {
-        KnownBits Known = llvm::computeKnownBits(TrueVal, /*Depth=*/0, Q);
-        if (Known.isConstant())
-          return replaceOperand(SI, 1,
-                                ConstantInt::get(SelType, Known.getConstant()));
-      }
+      std::optional<bool> NotUndef;
+      auto SimplifyOp = [&](unsigned OpNum) -> Instruction * {
+        Value *V = SI.getOperand(OpNum);
+        if (isa<Constant>(V) ||
+            !hasAffectedValue(V, CC.AffectedValues, /*Depth=*/0))
+          return nullptr;
+
+        if (!NotUndef)
+          NotUndef = isGuaranteedNotToBeUndef(CondVal);
 
+        if (*NotUndef) {
+          unsigned BitWidth = SelType->getScalarSizeInBits();
+          KnownBits Known(BitWidth);
+          if (SimplifyDemandedBits(&SI, OpNum, APInt::getAllOnes(BitWidth),
+                                   Known, /*Depth=*/0, Q))
+            return &SI;
+        } else {
+          KnownBits Known = llvm::computeKnownBits(V, /*Depth=*/0, Q);
+          if (Known.isConstant())
+            return replaceOperand(
+                SI, OpNum, ConstantInt::get(SelType, Known.getConstant()));
+        }
+        return nullptr;
+      };
+
+      if (Instruction *Res = SimplifyOp(1))
+        return Res;
       CC.Invert = true;
-      if (!isa<Constant>(FalseVal) &&
-          hasAffectedValue(FalseVal, CC.AffectedValues, /*Depth=*/0)) {
-        KnownBits Known = llvm::computeKnownBits(FalseVal, /*Depth=*/0, Q);
-        if (Known.isConstant())
-          return replaceOperand(SI, 2,
-                                ConstantInt::get(SelType, Known.getConstant()));
-      }
+      if (Instruction *Res = SimplifyOp(2))
+        return Res;
     }
   }
 
diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll
index 63a4f74cbaaea..29805e07d0ac9 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -2989,9 +2989,8 @@ define i8 @select_replacement_loop3(i32 noundef %x) {
 
 define i16 @select_replacement_loop4(i16 noundef %p_12) {
 ; CHECK-LABEL: @select_replacement_loop4(
-; CHECK-NEXT:    [[AND1:%.*]] = and i16 [[P_12:%.*]], 1
-; CHECK-NEXT:    [[CMP21:%.*]] = icmp ult i16 [[P_12]], 2
-; CHECK-NEXT:    [[AND3:%.*]] = select i1 [[CMP21]], i16 [[AND1]], i16 0
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ult i16 [[P_12:%.*]], 2
+; CHECK-NEXT:    [[AND3:%.*]] = select i1 [[CMP1]], i16 [[P_12]], i16 0
 ; CHECK-NEXT:    ret i16 [[AND3]]
 ;
   %cmp1 = icmp ult i16 %p_12, 2
@@ -4671,8 +4670,7 @@ define i8 @select_knownbits_simplify(i8 noundef %x)  {
 ; CHECK-LABEL: @select_knownbits_simplify(
 ; CHECK-NEXT:    [[X_LO:%.*]] = and i8 [[X:%.*]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i8 [[X_LO]], 0
-; CHECK-NEXT:    [[AND:%.*]] = and i8 [[X]], -2
-; CHECK-NEXT:    [[RES:%.*]] = select i1 [[CMP]], i8 [[AND]], i8 0
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[CMP]], i8 [[X]], i8 0
 ; CHECK-NEXT:    ret i8 [[RES]]
 ;
   %x.lo = and i8 %x, 1
@@ -4686,8 +4684,7 @@ define i8 @select_knownbits_simplify_nested(i8 noundef %x)  {
 ; CHECK-LABEL: @select_knownbits_simplify_nested(
 ; CHECK-NEXT:    [[X_LO:%.*]] = and i8 [[X:%.*]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i8 [[X_LO]], 0
-; CHECK-NEXT:    [[AND:%.*]] = and i8 [[X]], -2
-; CHECK-NEXT:    [[MUL:%.*]] = mul i8 [[AND]], [[AND]]
+; CHECK-NEXT:    [[MUL:%.*]] = mul i8 [[X]], [[X]]
 ; CHECK-NEXT:    [[RES:%.*]] = select i1 [[CMP]], i8 [[MUL]], i8 0
 ; CHECK-NEXT:    ret i8 [[RES]]
 ;

PR Link: llvm/llvm-project#97289

When calling SimplifyDemandedBits (as opposed to SimplifyDemandedInstructionBits), and there are multiple uses, always use SimplifyMultipleUseDemandedBits and drop the special case for root values. This fixes the ephemeral value detection, as seen by the restored assumes in tests. It may result in more or less simplification, depending on whether we get more out of having demanded bits or the ability to perform non-multi-use transforms. The change in the phi-known-bits.ll test is because the icmp operand now gets simplified based on demanded bits, which then prevents a different known bits simplification later. This also makes the code safe against future changes like #97289, which add more context that would have to be discarded for the multi-use case.

When calling SimplifyDemandedBits (as opposed to SimplifyDemandedInstructionBits), and there are multiple uses, always use SimplifyMultipleUseDemandedBits and drop the special case for root values. This fixes the ephemeral value detection, as seen by the restored assumes in tests. It may result in more or less simplification, depending on whether we get more out of having demanded bits or the ability to perform non-multi-use transforms. The change in the phi-known-bits.ll test is because the icmp operand now gets simplified based on demanded bits, which then prevents a different known bits simplification later. This also makes the code safe against future changes like llvm#97289, which add more context that would have to be discarded for the multi-use case.

dtcxzyw · 2024-07-07T17:11:20Z

This patch breaks a ton of and/or of icmps fold in some rust applications. See dtcxzyw/llvm-opt-benchmark#818.
Can we mitigate the problem by skipping logical and/or patterns?

goldsteinn · 2024-07-08T05:07:44Z

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

+          return nullptr;
+
+        if (!NotUndef)
+          NotUndef = isGuaranteedNotToBeUndef(CondVal);


Hmm? Aren't we going to end up skipping the noundef check on false arm if true arm makes it this far?

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

dtcxzyw · 2024-07-16T20:34:47Z

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

nikic · 2024-07-16T20:39:17Z

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

I don't really get how the optimization in this PR would help the motivating issues for #66740?

dtcxzyw · 2024-07-16T20:50:41Z

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

I don't really get how the optimization in this PR would help the motivating issues for #66740?

Sorry, I misread the fold in #66740 as B == odd ? (B & 1) : 0 -> B == odd ? 1 : 0.

cyyself · 2024-07-16T20:58:29Z

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

Yeah. And note that I have a minimal reproduction here: https://github.com/cyyself/verilator-slow-compile-cases/tree/master/XiangShan-136f64975e-0a9b31bb30

You may need some headers in Verilator to get this being compiled. For most distro, you can directly install a verilator.

dtcxzyw · 2024-07-16T21:07:05Z

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

Yeah. And note that I have a minimal reproduction here: https://github.com/cyyself/verilator-slow-compile-cases/tree/master/XiangShan-136f64975e-0a9b31bb30

You may need some headers in Verilator to get this being compiled. For most distro, you can directly install a verilator.

Can you file a separate ticket for this?

Extend decomposeBitTestICmp() to handle cases where the resulting comparison is of the form `icmp (X & Mask) pred Cmp` with non-zero `Cmp`. Add a flag to allow code to opt-in to this behavior and use it in the "log op of icmp" fold infrastructure. This addresses regressions from llvm#97289. Proofs: https://alive2.llvm.org/ce/z/hUhdbU

Extend decomposeBitTestICmp() to handle cases where the resulting comparison is of the form `icmp (X & Mask) pred C` with non-zero `C`. Add a flag to allow code to opt-in to this behavior and use it in the "log op of icmp" fold infrastructure. This addresses regressions from #97289. Proofs: https://alive2.llvm.org/ce/z/hUhdbU

nikic · 2024-10-04T13:21:55Z

Unfortunately #110836 did not fix all regressions here. For example this one still exists: https://alive2.llvm.org/ce/z/vQxX3c

nikic · 2024-10-04T14:03:08Z

Unoptimized IR for one of the problematic cases: https://llvm.godbolt.org/z/evx3zvMEc

Originally, the select true value was folded to a xor with sign mask. When we thread the icmp eq over the select it becomes a simple icmp. Instead demanded bits simplification now replaces the xor with an and and then shrinks it.

Let me try disabling that particular fold and see if it is improves things...

Extend decomposeBitTestICmp() to handle cases where the resulting comparison is of the form `icmp (X & Mask) pred C` with non-zero `C`. Add a flag to allow code to opt-in to this behavior and use it in the "log op of icmp" fold infrastructure. This addresses regressions from llvm#97289. Proofs: https://alive2.llvm.org/ce/z/hUhdbU

nikic requested review from dtcxzyw and goldsteinn July 1, 2024 13:01

llvmbot added the llvm:transforms label Jul 1, 2024

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 1, 2024

pre-commit: test PR97289

b987b52

PR Link: llvm/llvm-project#97289

dtcxzyw mentioned this pull request Jul 1, 2024

pre-commit: test PR97289 dtcxzyw/llvm-opt-benchmark#818

Open

goldsteinn reviewed Jul 8, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp Show resolved Hide resolved

cyyself referenced this pull request in cyyself/verilator-slow-compile-cases Jul 17, 2024

XiangShan-136f64975e-0a9b31bb30: update failed LLVM commit

f218797

nikic mentioned this pull request Oct 2, 2024

[InstCombine] Decompose more icmps into masks #110836

Merged

Use demanded bits simplification

f11c7ce

nikic force-pushed the select-known-bits-4 branch from 7aaf72d to f11c7ce Compare October 4, 2024 08:20

nikic mentioned this pull request Oct 4, 2024

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

dtcxzyw mentioned this pull request Oct 4, 2024

pre-commit: PR97289 dtcxzyw/llvm-opt-benchmark#1435

Open

Try to disable fold

8c00ff4

dtcxzyw mentioned this pull request Oct 4, 2024

pre-commit: PR97289 dtcxzyw/llvm-opt-benchmark#1437

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[InstCombine] Improve select simplification based on known bits #97289

[InstCombine] Improve select simplification based on known bits #97289

nikic commented Jul 1, 2024

llvmbot commented Jul 1, 2024

dtcxzyw commented Jul 7, 2024

goldsteinn Jul 8, 2024

dtcxzyw commented Jul 16, 2024

nikic commented Jul 16, 2024

dtcxzyw commented Jul 16, 2024

cyyself commented Jul 16, 2024

dtcxzyw commented Jul 16, 2024

nikic commented Oct 4, 2024

nikic commented Oct 4, 2024

[InstCombine] Improve select simplification based on known bits #97289

Are you sure you want to change the base?

[InstCombine] Improve select simplification based on known bits #97289

Conversation

nikic commented Jul 1, 2024

llvmbot commented Jul 1, 2024

dtcxzyw commented Jul 7, 2024

goldsteinn Jul 8, 2024

Choose a reason for hiding this comment

dtcxzyw commented Jul 16, 2024

nikic commented Jul 16, 2024

dtcxzyw commented Jul 16, 2024

cyyself commented Jul 16, 2024

dtcxzyw commented Jul 16, 2024

nikic commented Oct 4, 2024

nikic commented Oct 4, 2024