Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for folding core SIMD operations that produce TYP_MASK on newer hardware #104875

Merged
merged 11 commits into from
Jul 15, 2024

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jul 14, 2024

As per the title this gives us the ability to fold nodes that produce TYP_MASK giving it parity with operations that exist for NEON or AVX2.

To achieve this we introduce a new GenTreeMskCon type that can track mask constants. It was undesirable to reuse GenTreeVecCon as there is quite a difference of what needs to be considered and how bits are interpretered for a TYP_MASK vs a TYP_SIMD.

As part of this, it also allows us to push lower cost nodes to the right of SIMD comparisons which can improve codegen in some cases.

There is a TP improvement for Linux x64/Arm64 (Clang) and Windows x86 (MSVC) and a TP regression for Windows and Linux x64 (MSVC). There is no change to TP for Windows/Linux Arm64. This regression is up to +0.14 (Windows x64 MSVC) and +0.34% (Linux x64 MSVC) but is primarily "pay for play" in that the methods it impacts are all using SIMD mask nodes and so hit the new processing logic. Methods which aren't using SIMD mask nodes (such as pre-AVX512 machines or Arm64 machines) aren't incurring any penalty here.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 14, 2024
@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86

@tannergooding tannergooding marked this pull request as ready for review July 15, 2024 19:27
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -7369,6 +7345,67 @@ struct GenTreeVecCon : public GenTree
#endif
};

// GenTreeMskCon -- mask constant (GT_CNS_MSK)
//
struct GenTreeMskCon : public GenTree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be under FEATURE_MASKED_HW_INTRINSICS too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is matching how GenTreeVecCon exists, which is likewise not under FEATURE_SIMD.

AFAIR this is because it makes checks around GT_CNS_MSK and GT_CNS_VEC much simpler since there is a guaranteed ID for them, they just won't be encountered. If we did put it under an ifdef, then all the places that handle GT_CNS_VEC including places like OperIsConst would need to start being ifdef'd as well.

It might be worth doing this long term, but I'd rather investigate such cleanup in a separate PR.

@@ -30407,6 +30598,8 @@ GenTree* Compiler::gtFoldExprHWIntrinsic(GenTreeHWIntrinsic* tree)
uint32_t result = BitOperations::LeadingZeroCount(static_cast<uint64_t>(value));

cnsNode->AsIntConCommon()->SetIconValue(static_cast<int32_t>(result));
cnsNode->gtType = retType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it the type of the node is incorrect. Arm64 expects a TYP_INT result but takes a TYP_LONG input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So was it broken until now? wondering why any test didn't catch this earlier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not been showing up as broken because of how 32-bit results work on most platforms. That is, the 32-bit and 64-bit registers are the same and setting the lower 32-bits zeros the upper 32-bits. So even with the node mistyped, the codegen and consumption in most scenarios was correct.

This got caught because of the new assert I added that validated we didn't accidentally change the return type as part of adding the TYP_MASK folding (since that could take in SIMD and need a MASK result)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it was a bug in that the IR was technically wrong, but because of how implicit conversions and the general operation actually worked things generally just worked as expected.

GenTreeMskCon* mskCon = tree->AsMskCon();
genSetRegToConst(mskCon->GetRegNum(), targetType, &mskCon->gtSimdMaskVal);
#else
unreached();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at multiple places, for GT_CNS_MSK we have

#if defined(FEATURE_MASKED_HW_INTRINSICS)
   // logic
#else
   unreached();
#endif

wondering if there is a single place where we can add this check (for example, during the creation of such node) and then assume that we will never have GT_CNS_MSK when FEATURE_MASKED_HW_INTRINSICS is not enabled? I think this goes with my earlier comment if we can just have the definition of GT_CNS_MSK under #ifdef FEATURE_MASKED_HW_INTRINSICS

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that may be better overall. I'll investigate it in a follow up PR as indicated above.


case 2:
{
bitMask = 0x5555555555555555;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does any of this will have to change when we add "variable length vector" support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When TYP_SIMD is added we'll likely need to consider how we plan on supporting the up to 2048-bit vectors and up to 256-bit masks.

My guess is that we'll probably just say that we don't support over 512-bit SVE for simplicity, at least until physical hardware with 1024 or 2048 bits exists. When such hardware does exist, then we'd need to expand this to handle the additional bits.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall. Added some questions.

@tannergooding
Copy link
Member Author

jitstress failures are pre-existing, will take a look at them in a followup PR

@tannergooding tannergooding merged commit 31733b9 into dotnet:main Jul 15, 2024
119 of 124 checks passed
@tannergooding tannergooding deleted the simd-reverseops branch July 15, 2024 23:17
#if defined(TARGET_XARCH)
tryHandle = op->OperIsHWIntrinsic();
#elif defined(TARGET_ARM64)
if (op->OperIsHWIntrinsic() && op->OperIsHWIntrinsic(NI_Sve_CreateTrueMaskAll))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this condition be checking op2->OperIsHWIntrinsic() instead of op->OperIsHWIntrinsic? Right now, it is possible for us to try to use op2 as a GT_HWINTRINSIC node when it isn't one. The SVE tests are currently failing for me with the assertion OperIs(GT_HWINTRINSIC), because of the line GenTreeHWIntrinsic* cvtOp = op->AsHWIntrinsic(); below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperIsHWIntrinsic is always safe, it simply checks if the node type is GT_HWINTRINSIC

It looks like there might be a missing nested check that op2 is itself a hwintrinsic after we’ve confirmed the root op is createtruemaskall

Really we should be normalizing CreateTrueMaskAll as CNS_MSK instead and simply not having it as part of CvtVectorToMask (it’s an implementation detail of codegen that has no real impact on HIR), but that was a more in depth change and so wasn’t done in this pr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #104998

@github-actions github-actions bot locked and limited conversation to collaborators Aug 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants