Add support for folding core SIMD operations that produce TYP_MASK on newer hardware #104875
Conversation
/azp run runtime-coreclr jitstress-isas-x86

Azure Pipelines successfully started running 1 pipeline(s).
```
@@ -7369,6 +7345,67 @@ struct GenTreeVecCon : public GenTree
#endif
};

// GenTreeMskCon -- mask constant (GT_CNS_MSK)
//
struct GenTreeMskCon : public GenTree
```
Should this be under `FEATURE_MASKED_HW_INTRINSICS` too?
This is matching how `GenTreeVecCon` exists, which is likewise not under `FEATURE_SIMD`.

AFAIR this is because it makes checks around `GT_CNS_MSK` and `GT_CNS_VEC` much simpler, since there is a guaranteed ID for them; they just won't be encountered. If we did put it under an ifdef, then all the places that handle `GT_CNS_VEC`, including places like `OperIsConst`, would need to start being ifdef'd as well.

It might be worth doing this long term, but I'd rather investigate such cleanup in a separate PR.
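For illustration, a minimal sketch of why an always-defined opcode ID keeps the constant checks simple (the enum ordering here is hypothetical, not the actual gtlist.h):

```cpp
// Hypothetical, simplified opcode enum; the real ordering lives in gtlist.h.
enum genTreeOps
{
    GT_CNS_INT,
    GT_CNS_LNG,
    GT_CNS_DBL,
    GT_CNS_STR,
    GT_CNS_VEC,
    GT_CNS_MSK, // always defined, even when masked HW intrinsics are off
    GT_ADD,
    // ...
};

// With a guaranteed ID, a check like OperIsConst stays a simple,
// ifdef-free range check; GT_CNS_MSK just never appears at run time
// when the feature is off.
static bool OperIsConst(genTreeOps oper)
{
    return (oper >= GT_CNS_INT) && (oper <= GT_CNS_MSK);
}

// If GT_CNS_MSK only existed under the feature define, this check (and
// every similar one) would need the same ifdef:
//
// #if defined(FEATURE_MASKED_HW_INTRINSICS)
//     return (oper >= GT_CNS_INT) && (oper <= GT_CNS_MSK);
// #else
//     return (oper >= GT_CNS_INT) && (oper <= GT_CNS_VEC);
// #endif
```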
```
@@ -30407,6 +30598,8 @@ GenTree* Compiler::gtFoldExprHWIntrinsic(GenTreeHWIntrinsic* tree)
                uint32_t result = BitOperations::LeadingZeroCount(static_cast<uint64_t>(value));

                cnsNode->AsIntConCommon()->SetIconValue(static_cast<int32_t>(result));
                cnsNode->gtType = retType;
```
Why is this change needed?
Without it, the type of the node is incorrect. Arm64 expects a `TYP_INT` result but takes a `TYP_LONG` input.
So was it broken until now? Wondering why no test caught this earlier.
It's not been showing up as broken because of how 32-bit results work on most platforms. That is, the 32-bit and 64-bit registers are the same, and setting the lower 32 bits zeros the upper 32 bits. So even with the node mistyped, the codegen and consumption in most scenarios was correct.

This got caught because of the new assert I added that validates we didn't accidentally change the return type as part of adding the `TYP_MASK` folding (since that folding can take a SIMD input and need a MASK result).
So it was a bug in that the IR was technically wrong, but because of how implicit conversions and the general operation actually worked, things generally just worked as expected.
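To make the mismatch concrete, here is a minimal standalone sketch (illustrative types and names, not the JIT's actual code) of the shape of this fold: the input constant is 64-bit but the intrinsic's return type is 32-bit, so the folded node must be retyped, not just revalued:

```cpp
#include <cassert>
#include <cstdint>

enum var_types { TYP_INT, TYP_LONG }; // stand-ins for the JIT's type enum

struct ConstNode
{
    var_types type;
    int64_t   value;
};

// Fold LeadingZeroCount(cns): consumes a TYP_LONG input, produces TYP_INT.
void FoldLeadingZeroCount(ConstNode& cns, var_types retType)
{
    assert(cns.type == TYP_LONG);
    assert(retType == TYP_INT);

    // Count leading zero bits of the 64-bit input value.
    uint32_t result = 0;
    for (uint64_t bit = 1ull << 63;
         (bit != 0) && ((static_cast<uint64_t>(cns.value) & bit) == 0);
         bit >>= 1)
    {
        result++;
    }

    cns.value = static_cast<int32_t>(result);
    cns.type  = retType; // without this line the node stays TYP_LONG, which
                         // only "works" because 32-bit register writes zero
                         // the upper half on most platforms
}
```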
```
            GenTreeMskCon* mskCon = tree->AsMskCon();
            genSetRegToConst(mskCon->GetRegNum(), targetType, &mskCon->gtSimdMaskVal);
#else
            unreached();
```
At multiple places, for `GT_CNS_MSK` we have:

```cpp
#if defined(FEATURE_MASKED_HW_INTRINSICS)
    // logic
#else
    unreached();
#endif
```

Wondering if there is a single place where we can add this check (for example, during the creation of such a node) and then assume that we will never have `GT_CNS_MSK` when `FEATURE_MASKED_HW_INTRINSICS` is not enabled? I think this goes with my earlier comment about just having the definition of `GT_CNS_MSK` under `#ifdef FEATURE_MASKED_HW_INTRINSICS`.
Yes, that may be better overall. I'll investigate it in a follow-up PR as indicated above.
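A rough sketch of what that consolidation could look like (hypothetical helper name and shape; this illustrates the suggestion, not code from this PR):

```cpp
// Centralize the feature check at node creation. If this invariant holds,
// downstream consumers of GT_CNS_MSK could drop their "#else unreached();"
// arms and simply assume the node implies the feature.
GenTreeMskCon* Compiler::gtNewMskConNode(var_types type)
{
#if defined(FEATURE_MASKED_HW_INTRINSICS)
    GenTreeMskCon* mskCon = new (this, GT_CNS_MSK) GenTreeMskCon(type);
    return mskCon;
#else
    unreached(); // the single place that guards GT_CNS_MSK creation
#endif
}
```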
```
            case 2:
            {
                bitMask = 0x5555555555555555;
```
Will any of this have to change when we add "variable length vector" support?
When `TYP_SIMD` is added, we'll likely need to consider how we plan on supporting the up-to-2048-bit vectors and up-to-256-bit masks.

My guess is that we'll probably just say that we don't support over 512-bit SVE for simplicity, at least until physical hardware with 1024 or 2048 bits exists. When such hardware does exist, then we'd need to expand this to handle the additional bits.
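For context, my reading of where constants like the `0x5555...` above come from (an illustrative sketch, not the PR's code): an SVE-style predicate carries one bit per byte lane, so for N-byte elements only every Nth bit is significant, and a 64-bit mask covers at most a 512-bit vector:

```cpp
#include <cassert>
#include <cstdint>

// Build the "significant bits" mask for a given element size: one bit set
// every elemSizeBytes positions. Illustrative only.
uint64_t SignificantBitMask(unsigned elemSizeBytes)
{
    assert(elemSizeBytes != 0);

    uint64_t bitMask = 0;
    for (unsigned bit = 0; bit < 64; bit += elemSizeBytes)
    {
        bitMask |= (1ull << bit);
    }
    return bitMask;
}

// elemSizeBytes == 1 -> 0xFFFFFFFFFFFFFFFF
// elemSizeBytes == 2 -> 0x5555555555555555 (the "case 2" constant above)
// elemSizeBytes == 4 -> 0x1111111111111111
// elemSizeBytes == 8 -> 0x0101010101010101
//
// Vectors wider than 512 bits would need more than the 64 mask bits
// available here, which is why 1024/2048-bit SVE would force this logic
// to be expanded.
```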
Looks good overall. Added some questions.

jitstress failures are pre-existing; will take a look at them in a followup PR.
```
#if defined(TARGET_XARCH)
        tryHandle = op->OperIsHWIntrinsic();
#elif defined(TARGET_ARM64)
        if (op->OperIsHWIntrinsic() && op->OperIsHWIntrinsic(NI_Sve_CreateTrueMaskAll))
```
Should this condition be checking `op2->OperIsHWIntrinsic()` instead of `op->OperIsHWIntrinsic()`? Right now, it is possible for us to try to use `op2` as a `GT_HWINTRINSIC` node when it isn't one. The SVE tests are currently failing for me with the assertion `OperIs(GT_HWINTRINSIC)`, because of the line `GenTreeHWIntrinsic* cvtOp = op->AsHWIntrinsic();` below.
`OperIsHWIntrinsic` is always safe; it simply checks if the node type is `GT_HWINTRINSIC`.

It looks like there might be a missing nested check that `op2` is itself a HW intrinsic after we've confirmed the root op is `CreateTrueMaskAll`.

Really we should be normalizing `CreateTrueMaskAll` as `CNS_MSK` instead and simply not having it as part of `CvtVectorToMask` (it's an implementation detail of codegen that has no real impact on HIR), but that was a more in-depth change and so wasn't done in this PR.
Fixed in #104998.
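For reference, a hedged sketch of the shape of that fix (a fragment only; the actual change landed in #104998):

```cpp
// Confirm op is CreateTrueMaskAll, then separately confirm op2 is itself
// a HW intrinsic before reinterpreting it as one.
if (op->OperIsHWIntrinsic(NI_Sve_CreateTrueMaskAll) && op2->OperIsHWIntrinsic())
{
    GenTreeHWIntrinsic* cvtOp = op2->AsHWIntrinsic();
    // ...
}
```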
As per the title, this gives us the ability to fold nodes that produce `TYP_MASK`, giving it parity with operations that exist for NEON or AVX2.

To achieve this we introduce a new `GenTreeMskCon` type that can track mask constants. It was undesirable to reuse `GenTreeVecCon`, as there is quite a difference in what needs to be considered and how bits are interpreted for a `TYP_MASK` vs a `TYP_SIMD`.

As part of this, it also allows us to push lower-cost nodes to the right of SIMD comparisons, which can improve codegen in some cases.
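As a standalone illustration of what the fold buys (a conceptual model, not JIT code): a comparison of two constant vectors can now be evaluated at compile time into a constant mask, one bit per element, rather than emitting the vector compare at run time:

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
    // Two "constant vector" operands of a SIMD greater-than comparison.
    const int32_t left[8]  = {1, 2, 3, 4, 5, 6, 7, 8};
    const int32_t right[8] = {8, 7, 6, 5, 4, 3, 2, 1};

    // The TYP_MASK-style result: bit i is set when left[i] > right[i].
    uint8_t mask = 0;
    for (int i = 0; i < 8; i++)
    {
        if (left[i] > right[i])
        {
            mask |= static_cast<uint8_t>(1u << i);
        }
    }

    printf("folded mask: 0x%02X\n", mask); // prints 0xF0
}
```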
There is a TP improvement for Linux x64/Arm64 (Clang) and Windows x86 (MSVC), and a TP regression for Windows and Linux x64 (MSVC). There is no change to TP for Windows/Linux Arm64. The regression is up to `+0.14%` (Windows x64 MSVC) and `+0.34%` (Linux x64 MSVC), but it is primarily "pay for play" in that the methods it impacts are all using SIMD mask nodes and so hit the new processing logic. Methods which aren't using SIMD mask nodes (such as on pre-AVX512 machines or Arm64 machines) don't incur any penalty here.