Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable JitDasmWithAlignmentBoundaries and JitDasmWithAddress in Release #82666

Merged
merged 4 commits into from
Mar 4, 2023

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Feb 25, 2023

This PR enables DOTNET_JitDiffableDasm and DOTNET_JitDasmWithAlignmentBoundaries for Release. As they're now public, they're also renamed for better name consistency:

JitDiffableDasm is useful to compare codegens.
JitDasmWithAlignmentBoundaries is useful to debug loop alignment and JCC-erratum issues

-DOTNET_JitDiffableDasm
+DOTNET_JitDisasmDiffable

-DOTNET_JitDasmWithAlignmentBoundaries
+DOTNET_JitDisasmWithAlignmentBoundaries

Default JitDisasm

; Assembly listing for method HexConverter:EncodeToUtf16_Vector128
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 7 single block inlinees; 4 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H
       C5F877               vzeroupper

G_M000_IG02:                ;; offset=0003H
       488B01               mov      rax, bword ptr [rcx]
       8B4908               mov      ecx, dword ptr [rcx+08H]
       488B12               mov      rdx, bword ptr [rdx]
       4585C0               test     r8d, r8d
       740A                 je       SHORT G_M000_IG04

G_M000_IG03:                ;; offset=0011H
       C5F8100557000000     vmovups  xmm0, xmmword ptr [reloc @RWD00]
       EB08                 jmp      SHORT G_M000_IG05
                            align    [0 bytes for IG06]

G_M000_IG04:                ;; offset=001BH
       C5F810055D000000     vmovups  xmm0, xmmword ptr [reloc @RWD16]

G_M000_IG05:                ;; offset=0023H
       4533C0               xor      r8d, r8d
       8BC9                 mov      ecx, ecx
       4C8D49FC             lea      r9, [rcx-04H]
       C5F8100D5C000000     vmovups  xmm1, xmmword ptr [reloc @RWD32]

G_M000_IG06:                ;; offset=0034H
       C4A1796E1400         vmovd    xmm2, xmmword ptr [rax+r8]
       C5E173D204           vpsrlq   xmm3, xmm2, 4
       C5E160D2             vpunpcklbw xmm2, xmm3, xmm2
       C5E9DBD1             vpand    xmm2, xmm2, xmm1
       C4E27900D2           vpshufb  xmm2, xmm0, xmm2
       C4E27930D2           vpmovzxbw xmm2, xmm2
       C4A178111482         vmovups  xmmword ptr [rdx+4*r8], xmm2
       4983C004             add      r8, 4
       4C3BC1               cmp      r8, rcx
       740A                 je       SHORT G_M000_IG09

G_M000_IG07:                ;; offset=0060H
       4D3BC1               cmp      r8, r9
       76CF                 jbe      SHORT G_M000_IG06

G_M000_IG08:                ;; offset=0065H
       4D8BC1               mov      r8, r9
       EBCA                 jmp      SHORT G_M000_IG06

G_M000_IG09:                ;; offset=006AH
       C3                   ret

RWD00   dq      3736353433323130h, 6665646362613938h
RWD16   dq      3736353433323130h, 4645444342413938h
RWD32   dq      0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh

; Total bytes of code 107

DOTNET_JitDisasmDiffable=1

; Assembly listing for method HexConverter:EncodeToUtf16_Vector128
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 7 single block inlinees; 4 inlinees without PGO data

G_M000_IG01:
       vzeroupper

G_M000_IG02:
       mov      rax, bword ptr [rcx]
       mov      ecx, dword ptr [rcx+08H]
       mov      rdx, bword ptr [rdx]
       test     r8d, r8d
       je       SHORT G_M000_IG04

G_M000_IG03:
       vmovups  xmm0, xmmword ptr [reloc @RWD00]
       jmp      SHORT G_M000_IG05
       align    [0 bytes for IG06]

G_M000_IG04:
       vmovups  xmm0, xmmword ptr [reloc @RWD16]

G_M000_IG05:
       xor      r8d, r8d
       mov      ecx, ecx
       lea      r9, [rcx-04H]
       vmovups  xmm1, xmmword ptr [reloc @RWD32]

G_M000_IG06:
       vmovd    xmm2, xmmword ptr [rax+r8]
       vpsrlq   xmm3, xmm2, 4
       vpunpcklbw xmm2, xmm3, xmm2
       vpand    xmm2, xmm2, xmm1
       vpshufb  xmm2, xmm0, xmm2
       vpmovzxbw xmm2, xmm2
       vmovups  xmmword ptr [rdx+4*r8], xmm2
       add      r8, 4
       cmp      r8, rcx
       je       SHORT G_M000_IG09

G_M000_IG07:
       cmp      r8, r9
       jbe      SHORT G_M000_IG06

G_M000_IG08:
       mov      r8, r9
       jmp      SHORT G_M000_IG06

G_M000_IG09:
       ret

RWD00   dq      3736353433323130h, 6665646362613938h
RWD16   dq      3736353433323130h, 4645444342413938h
RWD32   dq      0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh

; Total bytes of code 107

DOTNET_JitDisasmWithAlignmentBoundaries=1

; Assembly listing for method HexConverter:EncodeToUtf16_Vector128
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 7 single block inlinees; 4 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H
       C5F877               vzeroupper

G_M000_IG02:                ;; offset=0003H
       488B01               mov      rax, bword ptr [rcx]
       8B4908               mov      ecx, dword ptr [rcx+08H]
       488B12               mov      rdx, bword ptr [rdx]
       4585C0               test     r8d, r8d
       740A                 je       SHORT G_M000_IG04

G_M000_IG03:                ;; offset=0011H
       C5F8100557000000     vmovups  xmm0, xmmword ptr [reloc @RWD00]
       EB08                 jmp      SHORT G_M000_IG05
                            align    [0 bytes for IG06]

G_M000_IG04:                ;; offset=001BH
       C5F810055D000000     vmovups  xmm0, xmmword ptr [reloc @RWD16]
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (vmovups: 3) 32B boundary ...............................

G_M000_IG05:                ;; offset=0023H
       4533C0               xor      r8d, r8d
       8BC9                 mov      ecx, ecx
       4C8D49FC             lea      r9, [rcx-04H]
       C5F8100D5C000000     vmovups  xmm1, xmmword ptr [reloc @RWD32]

G_M000_IG06:                ;; offset=0034H
       C4A1796E1400         vmovd    xmm2, xmmword ptr [rax+r8]
       C5E173D204           vpsrlq   xmm3, xmm2, 4
       C5E160D2             vpunpcklbw xmm2, xmm3, xmm2
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (vpunpcklbw: 3) 32B boundary ...............................
       C5E9DBD1             vpand    xmm2, xmm2, xmm1
       C4E27900D2           vpshufb  xmm2, xmm0, xmm2
       C4E27930D2           vpmovzxbw xmm2, xmm2
       C4A178111482         vmovups  xmmword ptr [rdx+4*r8], xmm2
       4983C004             add      r8, 4
       4C3BC1               cmp      r8, rcx
       740A                 je       SHORT G_M000_IG09
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (je: 0 ; jcc erratum) 32B boundary ...............................

G_M000_IG07:                ;; offset=0060H
       4D3BC1               cmp      r8, r9
       76CF                 jbe      SHORT G_M000_IG06

G_M000_IG08:                ;; offset=0065H
       4D8BC1               mov      r8, r9
       EBCA                 jmp      SHORT G_M000_IG06

G_M000_IG09:                ;; offset=006AH
       C3                   ret

RWD00   dq      3736353433323130h, 6665646362613938h
RWD16   dq      3736353433323130h, 4645444342413938h
RWD32   dq      0F0F0F0F0F0F0F0Fh, 0F0F0F0F0F0F0F0Fh

; Total bytes of code 107

@ghost ghost assigned EgorBo Feb 25, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Feb 25, 2023

@kunalspathak @dotnet/jit-contrib PTAL

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!
How much would it take to also enable JitDiffableDasm=1? (not necessarily in this PR)

@jakobbotsch
Copy link
Member

I'm wondering if we should rename these variables at the same time to have consistent naming conventions across the board -- i.e. JitDisasmWithAlignmentBoundaries, JitDisasmWithAddress.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 25, 2023

How much would it take to also enable JitDiffableDasm=1? (not necessarily in this PR)

Done!

I'm wondering if we should rename these variables at the same time to have consistent naming conventions across the board -- i.e. JitDisasmWithAlignmentBoundaries, JitDisasmWithAddress.

Great idea! Any concerns, @dotnet/jit-contrib? I hope it doesn't break your internal tooling. We can maintain old names too in that case.

DOTNET_JitDisasm

-DOTNET_JitDiffableDasm
+DOTNET_JitDisasmDiffable

-DOTNET_JitDasmWithAlignmentBoundaries
+DOTNET_JitDisasmWithAlignmentBoundaries

The idea that all JitDisasm related configs now easier to find

@jakobbotsch
Copy link
Member

Any concerns, @dotnet/jit-contrib?

jitutils will need a few updates too.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 25, 2023

Looks like FILECHECK is also unhappy

@EgorBo EgorBo force-pushed the enable-jitdisasm-loop-bound-release branch from 43edcae to e01be9c Compare February 25, 2023 16:22
@kunalspathak
Copy link
Member

We can maintain old names too in that case.

I would definitely not recommend maintaining 2 variables. If we agree with new name, let's just get rid of old one.

Seems 0.1% TP regression. Do you remember if the regression was similar when you enabled other flags in Release? Are we ok with this?

image

@kunalspathak
Copy link
Member

Changes looks fine to me. Just make sure to update jitutils as @jakobbotsch said before merging this.

@EgorBo
Copy link
Member Author

EgorBo commented Mar 3, 2023

I would definitely not recommend maintaining 2 variables.

It has to be done to update - e.g. currently FILECHECK uses old name to run tests, I can't update it in parallel. So I'm going to merge this as is, then update tools, propagate them all the way down to this repo and then remove old env var names.

Seems 0.1% TP regression. Do you remember if the regression was similar when you enabled other flags in Release?

Yeah it had an impact but we didn't notice it in any of our TP-related benchmarks (like crossgen, time-to-first request, etc). So I'm going to merge this as is and then land a quick TP improvement I noticed that is sort of related

@jakobbotsch
Copy link
Member

I think it should be easy to get rid of the TP regression by moving the checks under the existing release disAsm check.

@EgorBo EgorBo force-pushed the enable-jitdisasm-loop-bound-release branch from 61dfbcd to 5452a2c Compare March 4, 2023 00:21
@EgorBo
Copy link
Member Author

EgorBo commented Mar 4, 2023

Merging, TP regressions are mitigated by Jakob's suggestion: https://dev.azure.com/dnceng-public/public/_build/results?buildId=193118&view=ms.vss-build-web.run-extensions-tab

@EgorBo EgorBo merged commit f702d56 into dotnet:main Mar 4, 2023
@EgorBo EgorBo deleted the enable-jitdisasm-loop-bound-release branch March 4, 2023 09:40
@ghost ghost locked as resolved and limited conversation to collaborators Apr 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants