Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.NET 6 C++/CLI runtime generates InvalidProgramException when large stack-allocated variables are used #60852

Closed
lstrudwick opened this issue Oct 26, 2021 · 12 comments · Fixed by #61521 or #61589
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@lstrudwick
Copy link

Description

I am porting some code to .NET 6 and encountered some issues running C++/CLI code. After narrowing it down, it seems that using large stack-allocated arrays (larger than 65536 bytes) will generate a System.InvalidProgramException when the method with the offending variable is called for the first time.

Reproduction Steps

A minimal repro is to put the following code into a CLR Class Library and call it from a C# program:

namespace ClassLibrary1
{
	public ref class Class1
        {
        public:
	        static System::String^ BrokenFunction()
	        {
		        unsigned char block[16 * 1024 * 4 + 1] = { 0 };
		        return gcnew String("unused");
	        }
	        static System::String^ WorkingFunction()
	        {
		        unsigned char block[16 * 1024 * 4 + 0] = { 0 };
		        return gcnew String("unused");
	        }
	        static System::String^ WorkingFunction2()
	        {
		        unsigned char block[16 * 1024 * 4 + 0] = { 0 };
		        unsigned char block2[16 * 1024 * 4 + 0] = { 0 };
		        return gcnew String("unused");
	        }
	        static System::String^ AlsoWorkingFunction()
	        {
		        unsigned char* block = (unsigned char*)alloca(16 * 1024 * 4 + 1);
		        memset(block, 0, 16 * 1024 * 4 + 1);
		        return gcnew String("unused");
	        }
        };
}

As can be seen, it looks like there's a fundamental limit in the compiler-generated types which back the arrays. Note that prior to .NET 6, this appeared to work without issue.

Expected behavior

The program should not crash as long as we don't exceed the thread stack size.

Actual behavior

The program throws an exception as soon as a method with a large local array is called.

Regression?

Yes, this appears to be a .NET 6 specific issue.

Known Workarounds

In my case, as this is a scratch buffer, I could use alloca instead.

Configuration

.NET 6 RC2
Windows 10, x64
VS2022

Haven't tried any other configurations.

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Oct 26, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@danmoseley danmoseley added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 26, 2021
@ghost
Copy link

ghost commented Oct 26, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I am porting some code to .NET 6 and encountered some issues running C++/CLI code. After narrowing it down, it seems that using large stack-allocated arrays (larger than 65536 bytes) will generate a System.InvalidProgramException when the method with the offending variable is called for the first time.

Reproduction Steps

A minimal repro is to put the following code into a CLR Class Library and call it from a C# program:

namespace ClassLibrary1
{
	public ref class Class1
        {
        public:
	        static System::String^ BrokenFunction()
	        {
		        unsigned char block[16 * 1024 * 4 + 1] = { 0 };
		        return gcnew String("unused");
	        }
	        static System::String^ WorkingFunction()
	        {
		        unsigned char block[16 * 1024 * 4 + 0] = { 0 };
		        return gcnew String("unused");
	        }
	        static System::String^ WorkingFunction2()
	        {
		        unsigned char block[16 * 1024 * 4 + 0] = { 0 };
		        unsigned char block2[16 * 1024 * 4 + 0] = { 0 };
		        return gcnew String("unused");
	        }
	        static System::String^ AlsoWorkingFunction()
	        {
		        unsigned char* block = (unsigned char*)alloca(16 * 1024 * 4 + 1);
		        memset(block, 0, 16 * 1024 * 4 + 1);
		        return gcnew String("unused");
	        }
        };
}

As can be seen, it looks like there's a fundamental limit in the compiler-generated types which back the arrays. Note that prior to .NET 6, this appeared to work without issue.

Expected behavior

The program should not crash as long as we don't exceed the thread stack size.

Actual behavior

The program throws an exception as soon as a method with a large local array is called.

Regression?

Yes, this appears to be a .NET 6 specific issue.

Known Workarounds

In my case, as this is a scratch buffer, I could use alloca instead.

Configuration

.NET 6 RC2
Windows 10, x64
VS2022

Haven't tried any other configurations.

Other information

No response

Author: lstrudwick
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@AndyAyersMS
Copy link
Member

I'll take a look.

@AndyAyersMS AndyAyersMS self-assigned this Oct 26, 2021
@AndyAyersMS
Copy link
Member

C++/CLI compiler is creating a value class that is larger than 64K in size, and the resulting IL is hitting this implementation limitation in the jit:

if (offset >= 65536)
{
IMPL_LIMITATION("JIT doesn't support offsets larger than 65535 into valuetypes\n");
}

This limitation is not new so I'm not sure how this worked in prior releases.

@eatplayhate
Copy link

Just guessing based on the function name - those generated classes never had local variables with offsets above that size (I’m fairly sure they didn’t have any member fields, just an explicit layout/size). I assume that whatever has changed is now trying to compute local offsets for something placed at the end of the type.

@AndyAyersMS
Copy link
Member

@lstrudwick can you double-check that the above was working on an earlier release, and if so, which one?

@lstrudwick
Copy link
Author

.NET 5 and .NET Framework 4.7.2 can both run this without issue. I've previously run this on .NET Core 3.1 as well, but wasn't in a position to build for that right this moment.

@lstrudwick
Copy link
Author

All generated types have the following structure in the broken case.

{
  [UnsafeValueType]
  [NativeCppClass]
  [StructLayout(LayoutKind.Sequential, Size = 65537)]
  internal struct $ArrayType$$$BY0BAAAB@E
  {
  }
}

@JulieLeeMSFT JulieLeeMSFT added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Oct 29, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Nov 1, 2021
@AndyAyersMS
Copy link
Member

@lstrudwick sorry for the delay -- I can repro this now.

In 6.0 we add "stack poisoning" for uninitialized locals (see #54685). Unfortunately in this case the poisoning happens via a sequence of discrete 8 byte stores.

Eventually the offsets in these stores get so large that the JIT hits an internal limitation; this surfaces as an invalid program exception.

cc @jakobbotsch

*************** In lvaAssignFrameOffsets(FINAL_FRAME_LAYOUT)
Assign V04 GsCookie, size=8, stkOffs=-0x18
Assign V01 loc1, size=65544, stkOffs=-0x10020
Assign V00 loc0, size=8, stkOffs=-0x10028
Assign V02 tmp0, size=4, stkOffs=-0x1002c
Pad V03 OutArgs, size=32, stkOffs=-0x10030, pad=4
Assign V03 OutArgs, size=32, stkOffs=-0x10050
--- delta bump 8 for RA
--- delta bump 8 for FP
--- delta bump 0 for RBP frame
--- virtual stack offset to actual stack offset delta is 16
-- V00 was -65576, now -65560
-- V01 was -65568, now -65552
-- V02 was -65580, now -65564
-- V03 was -65616, now -65600
-- V04 was -24, now -8
; Final local variable assignments
;
;  V00 loc0         [V00    ] (  1,  1   )     ref  ->  [rbp-10018H]   do-not-enreg[] must-init class-hnd exact ptr
;  V01 loc1         [V01    ] (  1,  1   )  struct (65544) [rbp-10010H]   do-not-enreg[XS] addr-exposed ld-addr-op unsafe-buffer
;  V02 tmp0         [V02    ] (  1,  1   )     int  ->  [rbp-1001CH]   do-not-enreg[V] "GSCookie dummy"
;  V03 OutArgs      [V03    ] (  1,  1   )  lclBlk (32) [rsp+00H]   do-not-enreg[] "OutgoingArgSpace"
;  V04 GsCookie     [V04    ] (  1,  1   )    long  ->  [rbp-08H]   do-not-enreg[X] addr-exposed "GSSecurityCookie"
;
; Lcl frame size = 65600
Mark labels for codegen
  BB01 : first block
  BB04 : branch target
*************** After genMarkLabelsForCodegen()

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [???..???)                                     keep i internal label hascall LIR 
BB02 [0003]  1       BB01                  1       [???..???)-> BB04 ( cond )                     internal LIR 
BB03 [0004]  1       BB02                  0.50    [???..???)                                     internal LIR 
BB04 [0002]  2       BB02,BB03             1       [???..???)                                     keep i internal label hascall LIR 
BB05 [0001]  1       BB04                  1       [000..016)        (return)                     i LIR 
-----------------------------------------------------------------------------------------------------------------------------------------
Setting stack level from -572662307 to 0

=============== Generating BB01 [???..???), preds={} succs={BB02} flags=0x00000002.20010070: keep i internal label hascall LIR 
BB01 IN (0)={} + ByrefExposed + GcHeap
     OUT(0)={} + ByrefExposed + GcHeap

Liveness not changing: 0000000000000000 {}
							Live regs: (unchanged) 00000000 {}
							GC regs: (unchanged) 00000000 {}
							Byref regs: (unchanged) 00000000 {}

      L_M20969_BB01:
Mapped BB01 to G_M20969_IG02
Label: IG02, GCvars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}

Scope info: begin block BB01, IL range [???..???)
Scope info: ignoring block beginning
IN0001:        mov      rax, 0xCDCDCDCDCDCDCDCD
IN0002:        mov      qword ptr [V01 rbp-10010H], rax
IN0003:        mov      qword ptr [V01+0x8 rbp-10008H], rax
IN0004:        mov      qword ptr [V01+0x10 rbp-10000H], rax
IN0005:        mov      qword ptr [V01+0x18 rbp-FFF8H], rax
IN0006:        mov      qword ptr [V01+0x20 rbp-FFF0H], rax
IN0007:        mov      qword ptr [V01+0x28 rbp-FFE8H], rax
...
IN1fff:        mov      qword ptr [V01+0xFFE8 rbp-28H], rax
IN2000:        mov      qword ptr [V01+0xFFF0 rbp-20H], rax
IN2001:        mov      qword ptr [V01+0xFFF8 rbp-18H], rax

For 5.0 and earlier the JIT won't do this poisoning.

; Assembly listing for method ClassLibrary1.Class1:BrokenFunction():System.String
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; debuggable code
; rbp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 loc0         [V00    ] (  1,  1   )     ref  ->  [rbp-0x10018]   must-init class-hnd exact ptr
;  V01 loc1         [V01    ] (  1,  1   )  struct (65544) [rbp-0x10010]   do-not-enreg[XSB] addr-exposed ld-addr-op unsafe-buffer
;  V02 tmp0         [V02    ] (  1,  1   )     int  ->  [rbp-0x1001C]   do-not-enreg[X] addr-exposed "GSCookie dummy"
;  V03 OutArgs      [V03    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V04 GsCookie     [V04    ] (  1,  1   )    long  ->  [rbp-0x08]   do-not-enreg[X] addr-exposed "GSSecurityCookie"
;
; Lcl frame size = 65600

G_M20969_IG01:
       55                   push     rbp
       4C8D9C24C0FFFEFF     lea      r11, [rsp-10040H]
       E852B5A25F           call     CORINFO_HELP_STACK_PROBE
       498BE3               mov      rsp, r11
       488DAC2440000100     lea      rbp, [rsp+10040H]
       33C0                 xor      rax, rax
       488985E8FFFEFF       mov      qword ptr [rbp-10018H], rax
       48B878563412F0DEBC9A mov      rax, 0x9ABCDEF012345678
       488945F8             mov      qword ptr [rbp-08H], rax
						;; bbWeight=1    PerfScore 5.75
G_M20969_IG02:
       48B8B893730CFA7F0000 mov      rax, 0x7FFA0C7393B8
       833800               cmp      dword ptr [rax], 0
       7405                 je       SHORT G_M20969_IG04
						;; bbWeight=1    PerfScore 3.25
G_M20969_IG03:
       E8CC2A745F           call     CORINFO_HELP_DBG_IS_JUST_MY_CODE
						;; bbWeight=0.50 PerfScore 0.50
G_M20969_IG04:
       33D2                 xor      edx, edx
       488D8DF0FFFEFF       lea      rcx, bword ptr [rbp-10010H]
       41B801000100         mov      r8d, 0x10001
       4D63C0               movsxd   r8, r8d
       E8C5B0A25F           call     CORINFO_HELP_MEMSET
       48B80855D67665020000 mov      rax, 0x26576D65508
       488B00               mov      rax, gword ptr [rax]
       488985E8FFFEFF       mov      gword ptr [rbp-10018H], rax
       488B85E8FFFEFF       mov      rax, gword ptr [rbp-10018H]
       48B978563412F0DEBC9A mov      rcx, 0x9ABCDEF012345678
       48394DF8             cmp      qword ptr [rbp-08H], rcx
       7405                 je       SHORT G_M20969_IG05
       E8D5F5735F           call     CORINFO_HELP_FAIL_FAST
						;; bbWeight=1    PerfScore 9.75
G_M20969_IG05:
       90                   nop      
						;; bbWeight=1    PerfScore 0.25
G_M20969_IG06:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret      
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 146, prolog size 48, PerfScore 36.10, (MethodHash=ff99ae16) for method ClassLibrary1.Class1:BrokenFunction():System.String

@jakobbotsch
Copy link
Member

jakobbotsch commented Nov 12, 2021

@lstrudwick as a workaround you can pass /clr:initlocals to the C++ compiler. It will mean the poisoning will not happen and that the struct will be zero initialized instead.

C# repro:

using System;
using System.Runtime.CompilerServices;

public unsafe class Program
{
    [SkipLocalsInit]
    public static void Main()
    {
        Test t;
        Console.WriteLine(t.Bytes[4]);
    }

    private struct Test
    {
        public fixed byte Bytes[0x10008];
    }
}

As for a fix I'm not sure if we should just give up on poisoning for structs this large or try to support it (e.g. using rep stosd, not sure what to use on ARM architectures). Thoughts @AndyAyersMS?

@jakobbotsch
Copy link
Member

Alternatively we can increase the size of emitLclVarAddr to allow encoding larger offsets. That would increase the memory usage slightly on 32-bit platforms, but not on 64-bit platforms as it is part of a union with pointer members already.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 12, 2021
@AndyAyersMS
Copy link
Member

AndyAyersMS commented Nov 12, 2021

I'm not sure if we should just give up on poisoning for structs this large or try to support it

Not sure either -- I haven't looked at where the poisoning code ends up, but if it's at the end of the prolog or the first thing after the prolog then using loops shouldn't be too hard.

[Edit, looks like indeed it wasn't...]

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Nov 15, 2021
For very large structs (> 64K in size) poisoning could end up generating
instructions requiring larger local var offsets than we can handle. This
hits IMPL_LIMIT that throws InvalidProgramException. Turn off poisoning
for larger structs that require more than 16 movs to also avoid the
significant code bloat by the singular movs.

This is a less risky version of dotnet#61521 for backporting to .NET 6.

Fix dotnet#60852
jakobbotsch added a commit that referenced this issue Nov 15, 2021
For very large structs (> 64K in size) poisoning could end up generating
instructions requiring larger local var offsets than we can handle. This
hits IMPL_LIMIT that throws InvalidProgramException. Turn off poisoning
for larger structs that require more than 16 movs to also avoid the
significant code bloat by the singular movs.

This is a less risky version of #61521 for backporting to .NET 6.

Fix #60852
github-actions bot pushed a commit that referenced this issue Nov 15, 2021
For very large structs (> 64K in size) poisoning could end up generating
instructions requiring larger local var offsets than we can handle. This
hits IMPL_LIMIT that throws InvalidProgramException. Turn off poisoning
for larger structs that require more than 16 movs to also avoid the
significant code bloat by the singular movs.

This is a less risky version of #61521 for backporting to .NET 6.

Fix #60852
jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Nov 15, 2021
For very large structs poisoning could end up generating instructions
requiring larger local var offsets than we can handle which would hit an
IMPL_LIMIT that throws InvalidProgramException. Switch to using rep
stosd (x86/x64)/memset helper (other platforms) when a local needs more
than a certain number of mov instructions to poison.

Also includes a register allocator change to mark killed registers as
modified in addRefsForPhysRegMask instead of by the (usually) calling
function, since this function is used directly in the change.

Fix dotnet#60852
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 16, 2021
safern pushed a commit that referenced this issue Dec 15, 2021
* Disable poisoning for large structs

For very large structs (> 64K in size) poisoning could end up generating
instructions requiring larger local var offsets than we can handle. This
hits IMPL_LIMIT that throws InvalidProgramException. Turn off poisoning
for larger structs that require more than 16 movs to also avoid the
significant code bloat by the singular movs.

This is a less risky version of #61521 for backporting to .NET 6.

Fix #60852

* Run jit-format

* Add regression test

* Update src/coreclr/jit/codegencommon.cpp

Co-authored-by: Andy Ayers <andya@microsoft.com>

* Don't check poisoning for large struct in test

Since it's disabled, there is nothing to check.

Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com>
Co-authored-by: Andy Ayers <andya@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
6 participants