Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tail. call got stack overflow error !? #40581

Closed
RevensofT opened this issue Aug 8, 2020 · 23 comments
Closed

tail. call got stack overflow error !? #40581

RevensofT opened this issue Aug 8, 2020 · 23 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@RevensofT
Copy link

RevensofT commented Aug 8, 2020

Description

I try to convert while loop code to to tail call recursion function but I got a stack overflow error when use tail. call while jmp has no problem.

Runtime framework version

.Net Core 3.1

Base code

    Sub double_to_fraction()
        Dim Base = System.Math.PI
        Dim N1, N2 As Integer
        Dim Tmp = 1.0

        While Tmp <> Base
            If Tmp < Base Then
                N1 += 1
            Else
                N2 += 1
                N1 = System.Math.Round(N2 * Base)
            End If
            Tmp = N1 / N2
        End While

        Console.WriteLine(N1 & "/" & N2)
    End Sub

Recursion form

    Sub double_to_fraction2()
        Dim Base = System.Math.PI
        Dim N = (T:=1, B:=1, N:=1.0).
                       do(Base,
                           Function(V, S) V.N = S,
                           Function(V, S)
                               If V.N < S Then
                                   V.T += 1
                               Else
                                   V.B += 1
                                   V.T = System.Math.Round(V.B * S)
                               End If
                               V.N = V.T / V.B
                               Return V
                           End Function)

        Console.WriteLine(N.T & "/" & N.B)
    End Sub

Tail call recursion method (do method) [Error : Stack overflow]

//===========================
// la.2.0.1 used f: la ; : la.3.0.1 used.1 la.1.2.3 re use-me
//===========================
//  "using data" isn't appear in code but I use it for shorten code on this post.
using data = ValueTuple`3[int,int,double];

data do(data, double, Func`3[data,double,bool], Func`3[data,double,data])
{
ldarg 2
ldarg 0
ldarg 1
call bool Invoke(data, double)
brfalse 0
ldarg 0
ret
0:
ldarg 3
ldarg 0
ldarg 1
call data Invoke(data, double)
ldarg 1
ldarg 2
ldarg 3
tail.call data  do(data, double, Func`3[data,double,bool], Func`3[data,double,data])
ret
}

Jump op recursion (do method) [Worked]

//===========================
// la.2.0.1 used t: la.3.0.1 used.1 sa jmp-me : la
//===========================
//  "using data" isn't appear in code but I use it for shorten code on this post.
using data = ValueTuple`3[int,int,double];

data do(data, double, Func`3[data,double,bool], Func`3[data,double,data])
{
ldarg 2
ldarg 0
ldarg 1
call Boolean Invoke(data, Double)
brtrue 0
ldarg 3
ldarg 0
ldarg 1
call data Invoke(data, Double)
starg 0
jmp data do(data, Double, Func`3[data,double,bool], Func`3[data,double,data])
0:
ldarg 0
ret
}

Comment

Tail op is supposed to fixed stack overflow from recursion method not cause it so I believe this should be a bug.

Addition info

I write do method via System.Reflection.Emit.DynamicMethod, if nothing wrong in runtime framework, it might be cause from ILGenerator.

Reproduce error project

This is repo of vs solution of tail call stack over flow error, for source code of ILS.dll view at IL Script.

tail call stack overflow.zip

category:correctness
theme:tail-call
skill-level:expert
cost:large

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Aug 8, 2020
@danmoseley danmoseley added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 9, 2020
@AndyAyersMS
Copy link
Member

There are various limitations handling tail calls with structs that depend on architecture and ABI. We have been working to relax these over time.

If this was runs on windows x64, I suspect it might have been fixed in 5.0 with #33004. Can you give 5.0 a try?

@AndyAyersMS
Copy link
Member

cc @dotnet/jit-contrib

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Aug 10, 2020
@JulieLeeMSFT JulieLeeMSFT added this to the 5.0.0 milestone Aug 10, 2020
@jakobbotsch
Copy link
Member

Regardless of the platform this should also be fixed in .NET 5.0 by #341.

@AndyAyersMS
Copy link
Member

I'll take a look. @RevensofT any chance you can point me at a more complete repro case?

@AndyAyersMS AndyAyersMS self-assigned this Aug 11, 2020
@AndyAyersMS
Copy link
Member

I was able to construct what I think is a faithful repro. This appears to be fixed in 5.0.

C:\bugs\r40581>dotnet run -c Release -f net5.0
245850922/78256779

Logical depth of recursion is the sum of the numerator and denominator above, so around 100 million calls.

Disassembly for the do method on x64 windows is:

; Assembly listing for method Helper:Do(System.ValueTuple`3[Int32,Int32,Double],double,System.Func`3[ValueTuple`3,Double,Boolean],System.Func`3[ValueTuple`3,Double,ValueTuple`3]):System.ValueTuple`3[Int32,Int32,Double]
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 RetBuf       [V00,T01] (  6,  4   )   byref  ->  rdi
;  V01 arg0         [V01,T00] (  5,  8   )   byref  ->  rsi
;  V02 arg1         [V02,T09] (  5,  4   )  double  ->  [rsp+0x90]
;  V03 arg2         [V03,T03] (  4,  3.50)     ref  ->  rbx         class-hnd
;  V04 arg3         [V04,T08] (  2,  1   )     ref  ->  [rsp+0xA0]   class-hnd
;  V05 OutArgs      [V05    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V06 tmp1         [V06    ] (  2,  2   )  struct (16) [rsp+0x48]   do-not-enreg[XSB] addr-exposed "struct address for call/obj"
;* V07 tmp2         [V07    ] (  0,  0   )  double  ->  zero-ref    V13.Item3(offs=0x00) P-INDEP "field V01.Item3 (fldOffset=0x0)"
;* V08 tmp3         [V08    ] (  0,  0   )     int  ->  zero-ref    V13.Item1(offs=0x08) P-INDEP "field V01.Item1 (fldOffset=0x8)"
;* V09 tmp4         [V09    ] (  0,  0   )     int  ->  zero-ref    V13.Item2(offs=0x0c) P-INDEP "field V01.Item2 (fldOffset=0xc)"
;  V10 tmp5         [V10    ] (  2,  2   )  double  ->  [rsp+0x48]   do-not-enreg[X] addr-exposed V06.Item3(offs=0x00) P-DEP "field V06.Item3 (fldOffset=0x0)"
;  V11 tmp6         [V11    ] (  2,  2   )     int  ->  [rsp+0x50]   do-not-enreg[X] addr-exposed V06.Item1(offs=0x08) P-DEP "field V06.Item1 (fldOffset=0x8)"
;  V12 tmp7         [V12    ] (  2,  2   )     int  ->  [rsp+0x54]   do-not-enreg[X] addr-exposed V06.Item2(offs=0x0c) P-DEP "field V06.Item2 (fldOffset=0xc)"
;* V13 tmp8         [V13    ] (  0,  0   )  struct (16) zero-ref    "Promoted implicit byref"
;  V14 tmp9         [V14    ] (  6,  8   )  struct (16) [rsp+0x38]   do-not-enreg[XSB] addr-exposed "by-value struct argument"
;  V15 tmp10        [V15,T04] (  2,  4   )     ref  ->  rax         "argument with side effect"
;  V16 tmp11        [V16,T06] (  2,  2   )     ref  ->  rax         "argument with side effect"
;  V17 tmp12        [V17,T07] (  2,  2   )    long  ->  rdx         "argument with side effect"
;  V18 tmp13        [V18    ] (  2,  2   )  struct (16) [rsp+0x28]   do-not-enreg[XSB] addr-exposed "substitute local for return buffer"
;  V19 ReturnAddress[V19    ] (  1,  0.50)    long  ->  [rsp+0x78]   do-not-enreg[X] addr-exposed "Return address"
;  V20 rat0         [V20,T02] (  3,  6   )     ref  ->  rax         "delegate invoke call"
;  V21 rat1         [V21,T05] (  3,  3   )     ref  ->  rax         "delegate invoke call"
;
; Lcl frame size = 88

G_M54699_IG01:
       57                   push     rdi
       56                   push     rsi
       55                   push     rbp
       53                   push     rbx
       4883EC58             sub      rsp, 88
       C5F877               vzeroupper
       488BF9               mov      rdi, rcx
       488BF2               mov      rsi, rdx
       498BD9               mov      rbx, r9
                                                ;; bbWeight=1    PerfScore 6.00
G_M54699_IG02:
       488BC3               mov      rax, rbx
       C5FA6F06             vmovdqu  xmm0, xmmword ptr [rsi]
       C5FA7F442438         vmovdqu  xmmword ptr [rsp+38H], xmm0
       488B4808             mov      rcx, gword ptr [rax+8]
       488D542438           lea      rdx, bword ptr [rsp+38H]
       C5FB11942490000000   vmovsd   qword ptr [rsp+90H], xmm2
       FF5018               call     qword ptr [rax+24]System.Func`3[ValueTuple`3,Double,Boolean][System.ValueTuple`3[System.Int32,System.Int32,System.Double],System.Double,System.Boolean]:Invoke(System.ValueTuple`3[Int32,Int32,Double],double):bool:this
       85C0                 test     eax, eax
       7419                 je       SHORT G_M54699_IG05
                                                ;; bbWeight=1    PerfScore 10.50
G_M54699_IG03:
       C5FA6F1E             vmovdqu  xmm3, xmmword ptr [rsi]
       C5FA7F1F             vmovdqu  xmmword ptr [rdi], xmm3
       488BC7               mov      rax, rdi
                                                ;; bbWeight=0.50 PerfScore 1.63
G_M54699_IG04:
       4883C458             add      rsp, 88
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret
                                                ;; bbWeight=0.50 PerfScore 1.63
G_M54699_IG05:
       488BAC24A0000000     mov      rbp, gword ptr [rsp+A0H]
       488BC5               mov      rax, rbp
       488D542448           lea      rdx, bword ptr [rsp+48H]
       C5FA6F1E             vmovdqu  xmm3, xmmword ptr [rsi]
       C5FA7F5C2438         vmovdqu  xmmword ptr [rsp+38H], xmm3
       488B4808             mov      rcx, gword ptr [rax+8]
       4C8D442438           lea      r8, bword ptr [rsp+38H]
       C5FB109C2490000000   vmovsd   xmm3, qword ptr [rsp+90H]
       FF5018               call     qword ptr [rax+24]System.Func`3[ValueTuple`3,Double,ValueTuple`3][System.ValueTuple`3[System.Int32,System.Int32,System.Double],System.Double,System.ValueTuple`3[System.Int32,System.Int32,System.Double]]:Invoke(System.ValueTuple`3[Int32,Int32,Double],double):System.ValueTuple`3[Int32,Int32,Double]:this
       C5F9104C2448         vmovupd  xmm1, xmmword ptr [rsp+48H]
       C5F9114C2438         vmovupd  xmmword ptr [rsp+38H], xmm1
       488D4C2438           lea      rcx, bword ptr [rsp+38H]
       C5FB108C2490000000   vmovsd   xmm1, qword ptr [rsp+90H]
       4C8BC3               mov      r8, rbx
       4C8BCD               mov      r9, rbp
       E8D6F0FFFF           call     ILStubClass:IL_STUB_StoreTailCallArgs(System.ValueTuple`3[Int32,Int32,Double],double,System.Object,System.Object)
       488D4C2478           lea      rcx, bword ptr [rsp+78H]
       4C8D442428           lea      r8, bword ptr [rsp+28H]
       48BAD892D3A6FA7F0000 mov      rdx, 0x7FFAA6D392D8
       E895ADFEFF           call     System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
       C5FA6F442428         vmovdqu  xmm0, xmmword ptr [rsp+28H]
       C5FA7F07             vmovdqu  xmmword ptr [rdi], xmm0
       488BC7               mov      rax, rdi
                                                ;; bbWeight=0.50 PerfScore 12.88
G_M54699_IG06:
       4883C458             add      rsp, 88
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret
                                                ;; bbWeight=0.50 PerfScore 1.63

; Total bytes of code 209, prolog size 11, PerfScore 56.45, (MethodHash=759e2a54) for method Helper:Do(System.ValueTuple`3[Int32,Int32,Double],double,System.Func`3[ValueTuple`3,Double,Boolean],System.Func`3[ValueTuple`3,Double,ValueTuple`3]):System.ValueTuple`3[Int32,Int32,Double]
; ============================================================

@jakobbotsch do we expect to see code like this after the call to DispatchTailCalls?

Still chasing down where things fail with 3.1; moving to future as this isn't a 5.0 work item.

@AndyAyersMS AndyAyersMS modified the milestones: 5.0.0, Future Aug 11, 2020
@echesakov
Copy link
Contributor

@AndyAyersMS The code after DispatchTailCalls is what I added in #39815 - it copies the value from a buffer on stack that holds a return value during a call to DispatchTailCalls to the return buffer of Helper:Do (that can point to the GC heap)

@AndyAyersMS
Copy link
Member

Maybe I'm confused, but if DispatchTailCalls returns control back to Do, then I don't see how Do can be properly tail recursive.

The only reachable return in Do should be the first one; the code after the call to DispatchTailCalls should never be executed. So that last bit of code (including epilog) looks like it should be dead code and not needed.

@jakobbotsch
Copy link
Member

Maybe I'm confused, but if DispatchTailCalls returns control back to Do, then I don't see how Do can be properly tail recursive.

The only reachable return in Do should be the first one; the code after the call to DispatchTailCalls should never be executed. So that last bit of code (including epilog) looks like it should be dead code and not needed.

It returns as normal if there is a previous tail call dispatcher on the stack that will handle the call (in which case no call will be done now and the written value will be ignored), or if this is the first call (in which case it the written value is the actual return value). So the control flow will be (relatively) normal.

@AndyAyersMS
Copy link
Member

I don't get an overflow on 3.1 either. Note I'm using ILASM to generate the do method, so perhaps the result doesn't exactly match what @RevensofT 's repro does...

@AndyAyersMS
Copy link
Member

Suspect that since dynamic methods aren't eligible for inlining they're also not eligible to make tail calls, even if they're tail calling themselves. Let me try and confirm this...

@RevensofT
Copy link
Author

RevensofT commented Aug 12, 2020

@AndyAyersMS
This is repo of vs solution of tail call stack over flow error, for source code of ILS.dll view at IL Script.

tail call stack overflow.zip

@AndyAyersMS
Copy link
Member

Thanks @RevensofT -- will report back what I see soon.

@AndyAyersMS
Copy link
Member

@RevensofT I have tried your repro on windows x64 with 3.1 and don't get an overflow.

Were you running into to this on some other OS or ISA?

@RevensofT
Copy link
Author

@AndyAyersMS I run on Windows 10 x64 1909 with .net core 3.1, below is video clip I run this repro and got stack overflow.
Tester.zip 1.77 MB

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 11, 2020
@BruceForstall
Copy link
Member

@AndyAyersMS Do you intend to keep looking for a repro, or can this be closed?

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Nov 11, 2020

I reran using the zipped repro above. Runs to completion under VS in release with jit optimizations enabled, but I do see a stack overflow in debug

@RevensofT were you running debug versions above? I could not see which config you were launching from your video. If so, this behavior is by design. Debug versions of methods can have larger stack frames.

image

image

@RevensofT
Copy link
Author

@AndyAyersMS thanks for update info, it's run on debug setting, can I enlarge stack frame on debug setting ?

@AndyAyersMS
Copy link
Member

@RevensofT Your best bet is to try and reduce stack consumption by not relying so heavily on recursion.

But if that's not possible, you can change the default stack size for the main thread using editbin as a post-build step (see for example this stack overflow thread.

For worker threads there is a thread constructor that lets you specify the stack size.

@RevensofT
Copy link
Author

Thank you very much @AndyAyersMS , it's help me a lot.

@sandreenko sandreenko assigned jakobbotsch and unassigned jakobbotsch Apr 9, 2021
@ghost ghost added the no-recent-activity label Oct 9, 2021
@ghost
Copy link

ghost commented Oct 9, 2021

This issue has been automatically marked no recent activity because it has been marked as needs more info but has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no recent activity.

Please refer to our contribution guidelines for tips on what information might be required.

@AndyAyersMS
Copy link
Member

@RevensofT anything else you need here? Can we close this one?

@ghost ghost removed the no-recent-activity label Oct 12, 2021
@RevensofT
Copy link
Author

@AndyAyersMS Thanks for asking, I don't have any question about this topic, thanks again for your help.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

8 participants