Add Arm64 encodings for IF_SVE_AA_3A to IF_SVE_HL_3A #95127

a74nh · 2023-11-22T15:44:20Z

Add the first set of 17 SVE groups.

Contributes towards #94549

Based on top of #95105
Code starts from 55731b4

Change-Id: I9d32d52c8a54e4f001a8af5ff556e747087c9149

ghost · 2023-11-22T15:44:32Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Add all the groups IF_SVE_**_3A.

Based on top of #95105
Code starts from

Author:	a74nh
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `community-contribution`
Milestone:	-

a74nh · 2023-11-22T15:46:26Z

IF_SVE_AA_3A, IF_SVE_AB_3A, IF_SVE_AC_3A are done and fully working.
All other groups are WIP.

Output in jitdump:

IN0c89: 003234      fnmsub  d7, d15, d23, d31
IN0c8a: 003238      and     z0.b, p1/m, z0.b, z2.b
IN0c8b: 00323C      bic     z3.h, p4/m, z3.h, z5.h
IN0c8c: 003240      eor     z14.s, p5/m, z14.s, z16.s
IN0c8d: 003244      orr     z29.d, p7/m, z29.d, z31.d
IN0c8e: 003248      add     z5.b, p6/m, z5.b, z7.b
IN0c8f: 00324C      sub     z15.h, p7/m, z15.h, z29.h
IN0c90: 003250      subr    z2.s, p0/m, z2.s, z13.s
IN0c91: 003254      sdiv    z3.s, p2/m, z3.s, z9.s
IN0c92: 003258      sdivr   z31.d, p3/m, z31.d, z29.d
IN0c93: 00325C      udiv    z1.s, p0/m, z1.s, z0.s
IN0c94: 003260      udivr   z13.d, p7/m, z13.d, z15.d

Output in gdb:

(gdb) disassemble 0xffffa9864360+0x3234,0xffffa9864360+0x3264
Dump of assembler code from 0xffffa9867594 to 0xffffa98675c4:
   0x0000ffffa9867594:	fnmsub	d7, d15, d23, d31
   0x0000ffffa9867598:	and	z0.b, p1/m, z0.b, z2.b
   0x0000ffffa986759c:	bic	z3.h, p4/m, z3.h, z5.h
   0x0000ffffa98675a0:	eor	z14.s, p5/m, z14.s, z16.s
   0x0000ffffa98675a4:	orr	z29.d, p7/m, z29.d, z31.d
   0x0000ffffa98675a8:	add	z5.b, p6/m, z5.b, z7.b
   0x0000ffffa98675ac:	sub	z15.h, p7/m, z15.h, z29.h
   0x0000ffffa98675b0:	subr	z2.s, p0/m, z2.s, z13.s
   0x0000ffffa98675b4:	sdiv	z3.s, p2/m, z3.s, z9.s
   0x0000ffffa98675b8:	sdivr	z31.d, p3/m, z31.d, z29.d
   0x0000ffffa98675bc:	udiv	z1.s, p0/m, z1.s, z0.s
   0x0000ffffa98675c0:	udivr	z13.d, p7/m, z13.d, z15.d

a74nh · 2023-11-22T15:48:21Z

@kunalspathak @BruceForstall

kunalspathak · 2023-11-22T15:51:16Z

cc: @TIHan @amanasifkhalid

src/coreclr/jit/emitarm64.cpp

src/coreclr/jit/emitarm64.h

src/coreclr/jit/emitarm64.cpp

Picked up the change in dotnet#95127 (comment)

kunalspathak · 2023-11-22T19:18:42Z

src/coreclr/jit/emit.h

@@ -505,6 +505,9 @@ class emitter
        OPSZ32     = 5,
        OPSZ64     = 6,
        OPSZ_COUNT = 7,
+#elif defined(TARGET_ARM64)
+        OPSZ_SCALABLE = 5,


I don't think we are using it anywhere. can we revert this change?

This is used indirectly. When setting idOpSize, the emitattr is converted into opSize:

void idOpSize(emitAttr opsz) { _idOpSize = emitEncodeSize(opsz); } /* static */ inline emitter::opSize emitter::emitEncodeSize(emitAttr size) { assert((size != EA_UNKNOWN) && ((size & EA_SIZE_MASK) == size)); return static_cast<emitter::opSize>(genLog2(size)); } /* static */ inline emitAttr emitter::emitDecodeSize(emitter::opSize ensz) { assert(static_cast<unsigned>(ensz) < OPSZ_COUNT); return emitSizeDecode[ensz]; }

Without the changes, a value of EA_SCALABLE becomes OPSZ_COUNT, and then asserts inside emitDecodeSize()

kunalspathak · 2023-11-22T19:20:03Z

src/coreclr/jit/emitarm64.cpp

+        case IF_SVE_HJ_3A:   // ........xx...... ...gggmmmmmddddd -- SVE floating-point serial reduction (predicated)
+        case IF_SVE_HL_3A:   // ........xx...... ...gggmmmmmddddd -- SVE floating-point arithmetic (predicated)
+            emitDispSveReg(id->idReg1(), id->idInsOpt(), true); // ddddd
+            emitDispPredicateReg(id->idReg2(), true, true); // ggg


I am actually wondering if the merge == true is correct for all the formats it shares?

Here is one example I found, but there are just handful of them, so perhaps for them, we can split the case.

case IF_SVE_AR_4A: // ........xx.mmmmm ...gggnnnnnddddd -- SVE integer multiply-accumulate writing addend (predicated) case IF_SVE_GI_4A: // ........xx.mmmmm ...gggnnnnnddddd -- SVE2 histogram generation (vector) case IF_SVE_HU_4A: // ........xx.mmmmm ...gggnnnnnddddd -- SVE floating-point multiply-accumulate writing addend emitDispSveReg(id->idReg10(), id->idInsOpt(), true); // ddddd emitDispSveReg(id->idReg20(), id->idInsOpt(), true); // nnnnn emitDispPredicateReg(id->idReg30(), id->idInsOpt(), true); // ggg emitDispSveReg(id->idReg40(), id->idInsOpt(), true); // mmmmm

Yes, we should take it group by group. Hardcoding with different cases should be ok for most things. But, will need a little more thinking when we get to the instructions that can take /M or /Z

src/coreclr/jit/targetarm64.h

src/coreclr/jit/emitarm64.cpp

kunalspathak · 2023-11-22T20:27:54Z

IMO, I like this change to use a separate name, so debugging is easier (and make it const, too)

Sure. I have put it in #95105 because that will get merge sooner, unblocking others.

* Add insEncodeReg* methods * Adjust the `ins` for sve instruction Picked up the change in #95127 (comment) * Update the comments * Use sve_ins_offset

kunalspathak

Very good progress.

cstool output (using #94896 changes):

It seems the tool needs a fixup for formatting the output aligned. @TIHan .

Also, since you did the encoding comparison using both gdb and cstool. I am wondering which option was easier to you?

kunalspathak · 2023-11-24T22:25:54Z

src/coreclr/jit/codegenarm64.cpp

+    theEmitter->emitIns_R_R_R(INS_sve_subr, EA_SCALABLE, REG_V2, REG_P0, REG_V13,
+                              INS_OPTS_SCALABLE_S); // SUBR    <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
+
+    theEmitter->emitIns_R_R_R(INS_sve_sdiv, EA_SCALABLE, REG_V3, REG_P2, REG_V9,


The tool generated just 1 line per instruction, but we should add some more permutation for registers, emitAttr and instOpts to make sure emitIns_* methods works as expected.

For each small group I've made sure to try to go across as many permutations as possible. (ie: INS_sve_and, INS_sve_bic, INS_sve_eor, INS_sve_orr are all IF_SVE_AA_3A and go through the same code so will have the same boundaries). The number of instructions here meant I felt there was decent coverage. I also made sure to hit value boundaries where possible (0, 31 etc).

The only instruction I felt I needed to add more permutations on were the EA_4BYTE versions of clasta/clastb.

What I found limiting is we can't test for error cases. Eg: I can't check that I can't generate a B version of INS_sve_sdiv.

Eg: I can't check that I can't generate a B version of INS_sve_sdiv.

Yes, I don't think we have code that verifies that something is not possible to get generated.

kunalspathak · 2023-11-24T22:26:36Z

src/coreclr/jit/emitarm64.cpp

@@ -8048,6 +8114,216 @@ void emitter::emitIns_R_R_R(
            fmt = IF_DV_3A;
            break;

+        case INS_sve_and:


perhaps I should also let the tool generate this code?

That would be useful, even if it's just some isVectorRegister() asserts

BruceForstall · 2023-11-26T00:58:34Z

@a74nh I see one set of discrepancies in your example above:

JitDump:

290 fadda h21, p6/m, h21, z14.h
294 fadda s22, p5/m, s22, z13.s
298 fadda d23, p4/m, d23, z12.d

cstool:

290 fadda h21, p6, h21, z14.h
294 fadda s22, p5, s22, z13.s
298 fadda d23, p4, d23, z12.d

(so, the JIT is generating /m on output when it shouldn't). Looks like this is a case where there is no /m or /z option in the ISA.

src/coreclr/jit/emitarm64.h

BruceForstall · 2023-11-26T01:06:02Z

src/coreclr/jit/codegenarm64.cpp

+// #define ALL_ARM64_EMITTER_UNIT_TESTS_GENERAL
+// #define ALL_ARM64_EMITTER_UNIT_TESTS_ADVSIMD
+// #define ALL_ARM64_EMITTER_UNIT_TESTS_SVE


Thanks for adding that categories; hopefully that helps.

I almost put this in another PR.

Also, the INS_brk test fails to compile because it should be INS_brk_windows and INS_brk_linux.

It be good if we could use the cstool output to automatically have this tested and checked.

a74nh · 2023-11-27T09:05:12Z

@a74nh I see one set of discrepancies in your example above:

JitDump:
290 fadda h21, p6/m, h21, z14.h
294 fadda s22, p5/m, s22, z13.s
298 fadda d23, p4/m, d23, z12.d
cstool:
290 fadda h21, p6, h21, z14.h
294 fadda s22, p5, s22, z13.s
298 fadda d23, p4, d23, z12.d
(so, the JIT is generating /m on output when it shouldn't). Looks like this is a case where there is no /m or /z option in the ISA.

Done.

This is purely a printing issue, the encoding of the instruction is correct. For every instruction I've done so far, there are no bits to indicate the type (/m etc) of the predicate, it's just fixed for each instruction.

a74nh · 2023-11-27T09:40:18Z

This PR has 82 instructions out of the 889 generated by the tool. That's 9.2%. It took me a week to write the code.

kunalspathak · 2023-11-27T14:11:33Z

This PR has 82 instructions out of the 889 generated by the tool. That's 9.2%. It took me a week to write the code.

That's almost 1 hour per instruction. I was hoping it would be faster given the tool generated most of the code, but I guess most of the time taken was also writing some new methods that were needed and stitching everything together. Hoping next batch would be faster and if not, let me know if things need to be added in the tool to reduce the burden. Here are few things that I have:

Generate emitIns* code
Move the comments containing group name from individual unit test to the top (your changes made in 96271ae codegenarm64.cpp )
Use the information in Arm64: Implement SVE encodings #94549 (comment) to generate the latency and throughput information.
Pass PredicateType to the display methods.

src/coreclr/jit/emitarm64.h

kunalspathak

Changes looks great to me. Added some minor comments.

src/coreclr/jit/emitarm64.cpp

kunalspathak · 2023-11-27T15:18:42Z

src/coreclr/jit/emitarm64.cpp

+        case IF_SVE_GR_3A: // ........xx...... ...gggmmmmmddddd -- SVE2 floating-point pairwise operations
+        case IF_SVE_HJ_3A: // ........xx...... ...gggmmmmmddddd -- SVE floating-point serial reduction (predicated)
+        case IF_SVE_HL_3A: // ........xx...... ...gggmmmmmddddd -- SVE floating-point arithmetic (predicated)
+            result.insThroughput = PERFSCORE_THROUGHPUT_1C;


wondering if this should be fixed to the real values of latency and throughput. can you point to a document that has these details?

I didn't find anything on internet except this kind of questions.

This is a tricky one, as the latency and throughput will change depending on the microarchitecture. For example, neon instructions won't be the same on Altra and N2.

For now, maybe we should just pick one (N2 sounds a reasonable choice for SVE).

There is definitely a future task to add other microarchitecture latencies and throughputs into coreclr.

For now, maybe we should just pick one (N2 sounds a reasonable choice for SVE).

If there is a parsable data, I can probably generate it via the tool with right numbers too.

Sadly, only the pdf.

I've manually done it for this PR. It wasn't fun to do. Thankfully, most of your groups are the same as the groups in the guide. But not all.

Not sure how easy it will be to parse as you'll need to turn instructions back into the groups.

Sadly, only the pdf.

Can you share the link for pdf that has SVE instructions?

N2 optimisation guide:
https://developer.arm.com/documentation/PJDOC-466751330-18256/0003

SVE is from 3.26.

Extracted that information and pasted in #94549 (comment). I will see if I can make the tool parse that and generate the right latencies and throughput.

Change-Id: I263ae1c649a66de393bca0c8f5ac2f9c3311fcbe

kunalspathak

LGTM

kunalspathak · 2023-11-28T07:00:33Z

This PR has 82 instructions out of the 889 generated by the tool. That's 9.2%. It took me a week to write the code.

That's almost 1 hour per instruction. I was hoping it would be faster given the tool generated most of the code, but I guess most of the time taken was also writing some new methods that were needed and stitching everything together. Hoping next batch would be faster and if not, let me know if things need to be added in the tool to reduce the burden. Here are few things that I have:

Generate emitIns* code

Move the comments containing group name from individual unit test to the top (your changes made in 96271ae codegenarm64.cpp )

Use the information in Arm64: Implement SVE encodings #94549 (comment) to generate the latency and throughput information.

Pass PredicateType to the display methods.

Fixed all of these and uploaded latest files in #94549.

a74nh added 2 commits November 22, 2023 14:37

Add insEncodeReg* methods

55731b4

Change-Id: I9d32d52c8a54e4f001a8af5ff556e747087c9149

Add Arm64 encodings for 3A groups

90fb3de

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 22, 2023

ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 22, 2023

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Nov 22, 2023

kunalspathak reviewed Nov 22, 2023

View reviewed changes

src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved

kunalspathak reviewed Nov 22, 2023

View reviewed changes

src/coreclr/jit/emitarm64.h Outdated Show resolved Hide resolved

kunalspathak reviewed Nov 22, 2023

View reviewed changes

src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved

Remove isSveRegister

15ee2fc

kunalspathak mentioned this pull request Nov 22, 2023

JIT ARM64/SVE2: Add increment/decrement from predicate register instructions #94811

Merged

a74nh added 2 commits November 22, 2023 16:36

Split ALL_ARM64_EMITTER_UNIT_TESTS into subsets

b7490a0

AD, AE, AN

473483c

kunalspathak added a commit to kunalspathak/runtime that referenced this pull request Nov 22, 2023

Adjust the ins for sve instruction

ec4b465

Picked up the change in dotnet#95127 (comment)

kunalspathak reviewed Nov 22, 2023

View reviewed changes

BruceForstall reviewed Nov 22, 2023

View reviewed changes

src/coreclr/jit/targetarm64.h Outdated Show resolved Hide resolved

src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved

build-analysis bot mentioned this pull request Nov 22, 2023

Tracking issue for CI build timeouts #76454

Closed

kunalspathak added a commit that referenced this pull request Nov 23, 2023

Arm64/SVE: Add insEncodeReg* methods (#95105)

944c4aa

* Add insEncodeReg* methods * Adjust the `ins` for sve instruction Picked up the change in #95127 (comment) * Update the comments * Use sve_ins_offset

build-analysis bot mentioned this pull request Nov 23, 2023

[8.0] [outerloop] jit-format: Format job found errors, please apply the format patch #95159

Closed

a74nh added 6 commits November 23, 2023 10:32

Merge remote-tracking branch 'origin' into SVE_AA_3A_4_github

b4687d4

Remove REG_PREDICATE_LOW_FIRST

c5d82cd

Fix sve_ins_offset

4461ecb

Add IF_SVE_AO_3A, including INS_OPTS_SCALABLE_WIDE_

f93a4f6

Add IF_SVE_CM_3A

d755ab7

Add IF_SVE_CN_3A, including INS_OPTS_SCALABLE_TO_SIMD_

2dcd8e6

a74nh marked this pull request as ready for review November 24, 2023 14:52

a74nh added 2 commits November 24, 2023 15:56

Comment out unit test define

5e9dad8

Add parentheses

889c909

kunalspathak reviewed Nov 24, 2023

View reviewed changes

BruceForstall reviewed Nov 26, 2023

View reviewed changes

a74nh added 2 commits November 27, 2023 09:10

Fix fadda predicate and comment typos

1e3e98e

Fix formatting

f2c714b

This was referenced Nov 27, 2023

DataContractSerializerTests.DCS_MyPersonSurrogate_Stress failing in CI #35066

Open

MSBuild crashing in the build #92290

Open

Better commenting + fix up errors found

96271ae

kunalspathak reviewed Nov 27, 2023

View reviewed changes

src/coreclr/jit/emitarm64.h Show resolved Hide resolved

kunalspathak reviewed Nov 27, 2023

View reviewed changes

src/coreclr/jit/emitarm64.h Outdated Show resolved Hide resolved

kunalspathak reviewed Nov 27, 2023

View reviewed changes

src/coreclr/jit/emitarm64.h Show resolved Hide resolved

kunalspathak reviewed Nov 27, 2023

View reviewed changes

a74nh added 4 commits November 27, 2023 15:32

Better descriptions for insOpts functions

2f763f1

Add latencies

11d38ba

Fix formatting

cffafad

Add emitDispLowPredicateReg

15a5fe8

Change-Id: I263ae1c649a66de393bca0c8f5ac2f9c3311fcbe

kunalspathak approved these changes Nov 27, 2023

View reviewed changes

kunalspathak merged commit a26802a into dotnet:main Nov 27, 2023
129 checks passed

kunalspathak mentioned this pull request Nov 27, 2023

Arm64: Implement SVE encodings #94549

Closed

a74nh deleted the SVE_AA_3A_4_github branch November 28, 2023 09:01

kunalspathak mentioned this pull request Nov 28, 2023

Add Arm64 encodings for IF_SVE_AF_3A to IF_SVE_AQ_3A #95337

Merged

kunalspathak mentioned this pull request Dec 12, 2023

Add Arm64 encodings for IF_SVE_CX_4A group #95918

Merged

github-actions bot locked and limited conversation to collaborators Dec 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Arm64 encodings for IF_SVE_AA_3A to IF_SVE_HL_3A #95127

Add Arm64 encodings for IF_SVE_AA_3A to IF_SVE_HL_3A #95127

a74nh commented Nov 22, 2023 •

edited

Loading

ghost commented Nov 22, 2023

a74nh commented Nov 22, 2023

a74nh commented Nov 22, 2023

kunalspathak commented Nov 22, 2023

kunalspathak Nov 22, 2023

a74nh Nov 23, 2023

kunalspathak Nov 22, 2023

kunalspathak Nov 22, 2023

a74nh Nov 23, 2023

kunalspathak commented Nov 22, 2023

kunalspathak left a comment •

edited

Loading

kunalspathak Nov 24, 2023

a74nh Nov 27, 2023

kunalspathak Nov 27, 2023

kunalspathak Nov 24, 2023

a74nh Nov 27, 2023

BruceForstall commented Nov 26, 2023

BruceForstall Nov 26, 2023

a74nh Nov 27, 2023

a74nh commented Nov 27, 2023

a74nh commented Nov 27, 2023

kunalspathak commented Nov 27, 2023 •

edited

Loading

kunalspathak left a comment

kunalspathak Nov 27, 2023

a74nh Nov 27, 2023

kunalspathak Nov 27, 2023

a74nh Nov 27, 2023

kunalspathak Nov 27, 2023

a74nh Nov 27, 2023

kunalspathak Nov 27, 2023

kunalspathak left a comment

kunalspathak commented Nov 28, 2023

Add Arm64 encodings for IF_SVE_AA_3A to IF_SVE_HL_3A #95127

Add Arm64 encodings for IF_SVE_AA_3A to IF_SVE_HL_3A #95127

Conversation

a74nh commented Nov 22, 2023 • edited Loading

ghost commented Nov 22, 2023

a74nh commented Nov 22, 2023

a74nh commented Nov 22, 2023

kunalspathak commented Nov 22, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kunalspathak commented Nov 22, 2023

kunalspathak left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BruceForstall commented Nov 26, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

a74nh commented Nov 27, 2023

a74nh commented Nov 27, 2023

kunalspathak commented Nov 27, 2023 • edited Loading

kunalspathak left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kunalspathak left a comment

Choose a reason for hiding this comment

kunalspathak commented Nov 28, 2023

a74nh commented Nov 22, 2023 •

edited

Loading

kunalspathak left a comment •

edited

Loading

kunalspathak commented Nov 27, 2023 •

edited

Loading