Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Improve select simplification based on known bits #97289

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Jul 1, 2024

#95923 added support to simplify select arms based on known bits implied by the select condition. That PR was limited to the case where the arm folds to a constant.

This PR extends support to the case where the select arm simplifies in some other way, for example by removing a bitwise operation. This is done by calling SimplifyDemandedBits() with the adjusted SimplifyQuery.

Unfortunately, this case requires that the condition isn't undef.

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=11484cb817bcc2a6e2ef9572be982a1a5a4964ec&to=67da0f88b52c613a277f93cb8d0183939786a9c7&stat=instructions%3Au

Fixes #71533.

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 1, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

#95923 added support to simplify select arms based on known bits implied by the select condition. That PR was limited to the case where the arm folds to a constant.

This PR extends support to the case where the select arm simplifies in some other way, for example by removing a bitwise operation. This is done by calling SimplifyDemandedBits() with the adjusted SimplifyQuery.

Unfortunately, this case requires that the condition isn't undef.

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=11484cb817bcc2a6e2ef9572be982a1a5a4964ec&to=67da0f88b52c613a277f93cb8d0183939786a9c7&stat=instructions%3Au

Fixes #71533.


Full diff: https://github.com/llvm/llvm-project/pull/97289.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+28-14)
  • (modified) llvm/test/Transforms/InstCombine/select.ll (+4-7)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 979495dae853e..fbbe2251b97e6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -4052,22 +4052,36 @@ Instruction *InstCombinerImpl::visitSelectInst(SelectInst &SI) {
     });
     SimplifyQuery Q = SQ.getWithInstruction(&SI).getWithCondContext(CC);
     if (!CC.AffectedValues.empty()) {
-      if (!isa<Constant>(TrueVal) &&
-          hasAffectedValue(TrueVal, CC.AffectedValues, /*Depth=*/0)) {
-        KnownBits Known = llvm::computeKnownBits(TrueVal, /*Depth=*/0, Q);
-        if (Known.isConstant())
-          return replaceOperand(SI, 1,
-                                ConstantInt::get(SelType, Known.getConstant()));
-      }
+      std::optional<bool> NotUndef;
+      auto SimplifyOp = [&](unsigned OpNum) -> Instruction * {
+        Value *V = SI.getOperand(OpNum);
+        if (isa<Constant>(V) ||
+            !hasAffectedValue(V, CC.AffectedValues, /*Depth=*/0))
+          return nullptr;
+
+        if (!NotUndef)
+          NotUndef = isGuaranteedNotToBeUndef(CondVal);
 
+        if (*NotUndef) {
+          unsigned BitWidth = SelType->getScalarSizeInBits();
+          KnownBits Known(BitWidth);
+          if (SimplifyDemandedBits(&SI, OpNum, APInt::getAllOnes(BitWidth),
+                                   Known, /*Depth=*/0, Q))
+            return &SI;
+        } else {
+          KnownBits Known = llvm::computeKnownBits(V, /*Depth=*/0, Q);
+          if (Known.isConstant())
+            return replaceOperand(
+                SI, OpNum, ConstantInt::get(SelType, Known.getConstant()));
+        }
+        return nullptr;
+      };
+
+      if (Instruction *Res = SimplifyOp(1))
+        return Res;
       CC.Invert = true;
-      if (!isa<Constant>(FalseVal) &&
-          hasAffectedValue(FalseVal, CC.AffectedValues, /*Depth=*/0)) {
-        KnownBits Known = llvm::computeKnownBits(FalseVal, /*Depth=*/0, Q);
-        if (Known.isConstant())
-          return replaceOperand(SI, 2,
-                                ConstantInt::get(SelType, Known.getConstant()));
-      }
+      if (Instruction *Res = SimplifyOp(2))
+        return Res;
     }
   }
 
diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll
index 63a4f74cbaaea..29805e07d0ac9 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -2989,9 +2989,8 @@ define i8 @select_replacement_loop3(i32 noundef %x) {
 
 define i16 @select_replacement_loop4(i16 noundef %p_12) {
 ; CHECK-LABEL: @select_replacement_loop4(
-; CHECK-NEXT:    [[AND1:%.*]] = and i16 [[P_12:%.*]], 1
-; CHECK-NEXT:    [[CMP21:%.*]] = icmp ult i16 [[P_12]], 2
-; CHECK-NEXT:    [[AND3:%.*]] = select i1 [[CMP21]], i16 [[AND1]], i16 0
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ult i16 [[P_12:%.*]], 2
+; CHECK-NEXT:    [[AND3:%.*]] = select i1 [[CMP1]], i16 [[P_12]], i16 0
 ; CHECK-NEXT:    ret i16 [[AND3]]
 ;
   %cmp1 = icmp ult i16 %p_12, 2
@@ -4671,8 +4670,7 @@ define i8 @select_knownbits_simplify(i8 noundef %x)  {
 ; CHECK-LABEL: @select_knownbits_simplify(
 ; CHECK-NEXT:    [[X_LO:%.*]] = and i8 [[X:%.*]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i8 [[X_LO]], 0
-; CHECK-NEXT:    [[AND:%.*]] = and i8 [[X]], -2
-; CHECK-NEXT:    [[RES:%.*]] = select i1 [[CMP]], i8 [[AND]], i8 0
+; CHECK-NEXT:    [[RES:%.*]] = select i1 [[CMP]], i8 [[X]], i8 0
 ; CHECK-NEXT:    ret i8 [[RES]]
 ;
   %x.lo = and i8 %x, 1
@@ -4686,8 +4684,7 @@ define i8 @select_knownbits_simplify_nested(i8 noundef %x)  {
 ; CHECK-LABEL: @select_knownbits_simplify_nested(
 ; CHECK-NEXT:    [[X_LO:%.*]] = and i8 [[X:%.*]], 1
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i8 [[X_LO]], 0
-; CHECK-NEXT:    [[AND:%.*]] = and i8 [[X]], -2
-; CHECK-NEXT:    [[MUL:%.*]] = mul i8 [[AND]], [[AND]]
+; CHECK-NEXT:    [[MUL:%.*]] = mul i8 [[X]], [[X]]
 ; CHECK-NEXT:    [[RES:%.*]] = select i1 [[CMP]], i8 [[MUL]], i8 0
 ; CHECK-NEXT:    ret i8 [[RES]]
 ;

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 1, 2024
nikic added a commit that referenced this pull request Jul 2, 2024
When calling SimplifyDemandedBits (as opposed to
SimplifyDemandedInstructionBits), and there are multiple uses,
always use SimplifyMultipleUseDemandedBits and drop the special
case for root values.

This fixes the ephemeral value detection, as seen by the restored
assumes in tests. It may result in more or less simplification,
depending on whether we get more out of having demanded bits or
the ability to perform non-multi-use transforms. The change in
the phi-known-bits.ll test is because the icmp operand now gets
simplified based on demanded bits, which then prevents a different
known bits simplification later.

This also makes the code safe against future changes like
#97289, which add more
context that would have to be discarded for the multi-use case.
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
When calling SimplifyDemandedBits (as opposed to
SimplifyDemandedInstructionBits), and there are multiple uses,
always use SimplifyMultipleUseDemandedBits and drop the special
case for root values.

This fixes the ephemeral value detection, as seen by the restored
assumes in tests. It may result in more or less simplification,
depending on whether we get more out of having demanded bits or
the ability to perform non-multi-use transforms. The change in
the phi-known-bits.ll test is because the icmp operand now gets
simplified based on demanded bits, which then prevents a different
known bits simplification later.

This also makes the code safe against future changes like
llvm#97289, which add more
context that would have to be discarded for the multi-use case.
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
When calling SimplifyDemandedBits (as opposed to
SimplifyDemandedInstructionBits), and there are multiple uses,
always use SimplifyMultipleUseDemandedBits and drop the special
case for root values.

This fixes the ephemeral value detection, as seen by the restored
assumes in tests. It may result in more or less simplification,
depending on whether we get more out of having demanded bits or
the ability to perform non-multi-use transforms. The change in
the phi-known-bits.ll test is because the icmp operand now gets
simplified based on demanded bits, which then prevents a different
known bits simplification later.

This also makes the code safe against future changes like
llvm#97289, which add more
context that would have to be discarded for the multi-use case.
@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 7, 2024

This patch breaks a ton of and/or of icmps fold in some rust applications. See dtcxzyw/llvm-opt-benchmark#818.
Can we mitigate the problem by skipping logical and/or patterns?

return nullptr;

if (!NotUndef)
NotUndef = isGuaranteedNotToBeUndef(CondVal);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm? Aren't we going to end up skipping the noundef check on false arm if true arm makes it this far?

@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 16, 2024

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

@nikic
Copy link
Contributor Author

nikic commented Jul 16, 2024

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

I don't really get how the optimization in this PR would help the motivating issues for #66740?

@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 16, 2024

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

I don't really get how the optimization in this PR would help the motivating issues for #66740?

Sorry, I misread the fold in #66740 as B == odd ? (B & 1) : 0 -> B == odd ? 1 : 0.

@cyyself
Copy link
Contributor

cyyself commented Jul 16, 2024

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

Yeah. And note that I have a minimal reproduction here: https://github.com/cyyself/verilator-slow-compile-cases/tree/master/XiangShan-136f64975e-0a9b31bb30

You may need some headers in Verilator to get this being compiled. For most distro, you can directly install a verilator.

@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 16, 2024

Can we revert #66740 after landing this patch? @cyyself found that #66740 caused a significant compile-time regression in SLPVectorizer when building RTL simulator for https://github.com/OpenXiangShan/XiangShan.

Yeah. And note that I have a minimal reproduction here: https://github.com/cyyself/verilator-slow-compile-cases/tree/master/XiangShan-136f64975e-0a9b31bb30

You may need some headers in Verilator to get this being compiled. For most distro, you can directly install a verilator.

Can you file a separate ticket for this?

cyyself referenced this pull request in cyyself/verilator-slow-compile-cases Jul 17, 2024
nikic added a commit to nikic/llvm-project that referenced this pull request Oct 2, 2024
Extend decomposeBitTestICmp() to handle cases where the resulting
comparison is of the form `icmp (X & Mask) pred Cmp` with non-zero
`Cmp`. Add a flag to allow code to opt-in to this behavior and use
it in the "log op of icmp" fold infrastructure.

This addresses regressions from llvm#97289.

Proofs: https://alive2.llvm.org/ce/z/hUhdbU
nikic added a commit to nikic/llvm-project that referenced this pull request Oct 4, 2024
Extend decomposeBitTestICmp() to handle cases where the resulting
comparison is of the form `icmp (X & Mask) pred Cmp` with non-zero
`Cmp`. Add a flag to allow code to opt-in to this behavior and use
it in the "log op of icmp" fold infrastructure.

This addresses regressions from llvm#97289.

Proofs: https://alive2.llvm.org/ce/z/hUhdbU
nikic added a commit that referenced this pull request Oct 4, 2024
Extend decomposeBitTestICmp() to handle cases where the resulting
comparison is of the form `icmp (X & Mask) pred C` with non-zero
`C`. Add a flag to allow code to opt-in to this behavior and use it in
the "log op of icmp" fold infrastructure.

This addresses regressions from #97289.

Proofs: https://alive2.llvm.org/ce/z/hUhdbU
@nikic
Copy link
Contributor Author

nikic commented Oct 4, 2024

Unfortunately #110836 did not fix all regressions here. For example this one still exists: https://alive2.llvm.org/ce/z/vQxX3c

@nikic
Copy link
Contributor Author

nikic commented Oct 4, 2024

Unoptimized IR for one of the problematic cases: https://llvm.godbolt.org/z/evx3zvMEc

Originally, the select true value was folded to a xor with sign mask. When we thread the icmp eq over the select it becomes a simple icmp. Instead demanded bits simplification now replaces the xor with an and and then shrinks it.

Let me try disabling that particular fold and see if it is improves things...

xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
Extend decomposeBitTestICmp() to handle cases where the resulting
comparison is of the form `icmp (X & Mask) pred C` with non-zero
`C`. Add a flag to allow code to opt-in to this behavior and use it in
the "log op of icmp" fold infrastructure.

This addresses regressions from llvm#97289.

Proofs: https://alive2.llvm.org/ce/z/hUhdbU
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[InstCombine] Missed optimization for select a%2==0, (a/2*2)*(a/2*2), 0
5 participants