Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: allow forward sub of QMARK nodes #76476

Merged
merged 1 commit into from
Oct 1, 2022

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Oct 1, 2022

There is often a single-def single-use temp consuming a QMARK (which in turn often comes from expansion of isinst and friends). This temp assignment tends to stay around and can inhibit redundant branch opts in a number of interesting cases. It may also serve as an attractive nuisance for copy prop.

While it would be ideal to generalize RBO to handle assignment side effects, doing so appears quite challenging, as we would need to rewrite possibly large chunks of the SSA and VN graphs. Forward sub eliminates the temp and so clears the way for the existing RBO to remove more branches.

Addresses aspects of the "side effect" enhancements for RBO (see #48115).

Diffs

There is often a single-def single use temp consuming a QMARK (which in
turn often comes from expansion of `isinst` and friends). This temp
assignment tends to stay around and can inhibit redundant branch opts
in a number of interesting cases. It may also serve as an attractive
nuisance for copy prop.

While it would be ideal to generalize RBO to handle assignment side
effects, doing so appears quite challenging, as we would need to rewrite
possibly large chunks of the SSA and VN graphs. Forward sub eliminates
the temp and so clears the way for the existing RBO to remove more branches.

Addresses aspects of the "side effect" enhancements for RBO (see dotnet#48115).
@ghost ghost assigned AndyAyersMS Oct 1, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 1, 2022
@ghost
Copy link

ghost commented Oct 1, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

There is often a single-def single use temp consuming a QMARK (which in turn often comes from expansion of isinst and friends). This temp assignment tends to stay around and can inhibit redundant branch opts in a number of interesting cases. It may also serve as an attractive nuisance for copy prop.

While it would be ideal to generalize RBO to handle assignment side effects, doing so appears quite challenging, as we would need to rewrite possibly large chunks of the SSA and VN graphs. Forward sub eliminates the temp and so clears the way for the existing RBO to remove more branches.

Addresses aspects of the "side effect" enhancements for RBO (see #48115).

Author: AndyAyersMS
Assignees: AndyAyersMS
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Oct 1, 2022

@EgorBo PTAL
cc @dotnet/jit-contrib

A few largeish regressions; a decent number of improvements. I looked at ConvertTextToNode from benchmarks in some detail. The issue is that we lose a non-null assertion because RBO is pruning away branches but is not pruning away PHIs and/or updating VNs. And this keeps a throw around.

Post RBO we have a graph like the following, where the *'s indicate dead PHI operands. AP (or perhaps RBO) ought to be able to prove that BB09 -> BB10 is impossible because in BB09 we should have V01.2 == V01.1 != 0, but no such luck.

image - 2022-09-30T184645 923

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Oct 1, 2022

This is a typical improvement (and a motivating case: #48115 (comment)).

Note how tmp1 becomes unreferenced in the after.

;;; BEFORE

; Assembly listing for method System.Type:get_IsInterface():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No matching PGO data
; Final local variable assignments
;
;  V00 this         [V00,T01] (  5,  4   )     ref  ->  rcx         this class-hnd single-def
;  V01 loc0         [V01,T02] (  3,  2.50)     ref  ->  rax         class-hnd single-def
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V03 tmp1         [V03,T00] (  5,  6.75)     ref  ->  rax         class-hnd "spilling QMark2"
;
; Lcl frame size = 40

G_M8876_IG01:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
       sub      rsp, 40
						;; size=4 bbWeight=1    PerfScore 0.25
G_M8876_IG02:        ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref, isz
       ; gcrRegs +[rcx]
       mov      rax, rcx
       ; gcrRegs +[rax]
       test     rax, rax
       je       SHORT G_M8876_IG05
						;; size=8 bbWeight=1    PerfScore 1.50
G_M8876_IG03:        ; gcrefRegs=00000003 {rax rcx}, byrefRegs=00000000 {}, byref, isz
       mov      rdx, 0xD1FFAB1E      ; hackishClassName
       cmp      qword ptr [rax], rdx
       je       SHORT G_M8876_IG05
						;; size=15 bbWeight=0.25 PerfScore 1.06
G_M8876_IG04:        ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref
       ; gcrRegs -[rax]
       xor      rax, rax
       ; gcrRegs +[rax]
						;; size=2 bbWeight=0.12 PerfScore 0.03
G_M8876_IG05:        ; gcrefRegs=00000003 {rax rcx}, byrefRegs=00000000 {}, byref, isz
       test     rax, rax
       je       SHORT G_M8876_IG08
						;; size=5 bbWeight=1    PerfScore 1.25
G_M8876_IG06:        ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref
       ; gcrRegs -[rcx]
       mov      rcx, rax
       ; gcrRegs +[rcx]
						;; size=3 bbWeight=0.50 PerfScore 0.12
G_M8876_IG07:        ; , epilog, nogc, extend
       add      rsp, 40
       jmp      hackishMethodName
       ; gcr arg pop 0
						;; size=9 bbWeight=0.50 PerfScore 1.12
G_M8876_IG08:        ; gcVars=0000000000000000 {}, gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, gcvars, byref
       ; gcrRegs -[rax]
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+70H]
       call     [rax+38H]System.Type:hackishMethodName
       ; gcrRegs -[rcx]
       ; gcr arg pop 0
       test     al, 32
       setne    al
       movzx    rax, al
						;; size=18 bbWeight=0.50 PerfScore 4.25
G_M8876_IG09:        ; , epilog, nogc, extend
       add      rsp, 40
       ret 
						;; size=5 bbWeight=0.50 PerfScore 0.62

; Total bytes of code 69, prolog size 4, PerfScore 17.12, instruction count 21, allocated bytes for code 69 (MethodHash=78d5dd53) for method System.Type:get_IsInterface():bool:this
;;; AFTER

; Assembly listing for method System.Type:get_IsInterface():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No matching PGO data
; Final local variable assignments
;
;  V00 this         [V00,T00] (  5,  4   )     ref  ->  rcx         this class-hnd single-def
;  V01 loc0         [V01,T01] (  4,  2.75)     ref  ->  rax         class-hnd
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V03 tmp1         [V03    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "spilling QMark2"
;
; Lcl frame size = 40

G_M8876_IG01:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
       sub      rsp, 40
						;; size=4 bbWeight=1    PerfScore 0.25
G_M8876_IG02:        ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref, isz
       ; gcrRegs +[rcx]
       mov      rax, rcx
       ; gcrRegs +[rax]
       test     rax, rax
       je       SHORT G_M8876_IG04
						;; size=8 bbWeight=1    PerfScore 1.50
G_M8876_IG03:        ; gcrefRegs=00000003 {rax rcx}, byrefRegs=00000000 {}, byref, isz
       mov      rdx, 0xD1FFAB1E      ; hackishClassName
       cmp      qword ptr [rax], rdx
       je       SHORT G_M8876_IG06
						;; size=15 bbWeight=0.25 PerfScore 1.06
G_M8876_IG04:        ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref
       ; gcrRegs -[rax]
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+70H]
       call     [rax+38H]System.Type:hackishMethodName
       ; gcrRegs -[rcx]
       ; gcr arg pop 0
       test     al, 32
       setne    al
       movzx    rax, al
						;; size=18 bbWeight=0.50 PerfScore 4.25
G_M8876_IG05:        ; , epilog, nogc, extend
       add      rsp, 40
       ret      
						;; size=5 bbWeight=0.50 PerfScore 0.62
G_M8876_IG06:        ; gcVars=0000000000000000 {}, gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, gcvars, byref
       ; gcrRegs +[rax]
       mov      rcx, rax
       ; gcrRegs +[rcx]
						;; size=3 bbWeight=0.50 PerfScore 0.12
G_M8876_IG07:        ; , epilog, nogc, extend
       add      rsp, 40
       jmp      hackishMethodName
       ; gcr arg pop 0
						;; size=9 bbWeight=0.50 PerfScore 1.12

; Total bytes of code 62, prolog size 4, PerfScore 15.14, instruction count 18, allocated bytes for code 62 (MethodHash=78d5dd53) for method System.Type:get_IsInterface():bool:this

@EgorBo
Copy link
Member

EgorBo commented Oct 1, 2022

Does this handle cases handled by #76439? numbers of diffs look similar

@AndyAyersMS
Copy link
Member Author

Does this handle cases handled by #76439? numbers of diffs look similar

Hard to say how much overlap without diffing the two directly but looking at asp.net diff reports there doesn't seem to be many diffs in common. The qmark forward sub is avoiding small types so probably passing on TYP_BOOL qmarks.

@AndyAyersMS AndyAyersMS merged commit 65a1578 into dotnet:main Oct 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants