Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable fake hot/cold splitting on ARM64 #70708

Merged
merged 13 commits into from
Jun 22, 2022

Conversation

amanasifkhalid
Copy link
Member

This PR enables fake hot/cold splitting on ARM64 via COMPlus_JitFakeProcedureSplitting, and fixes various bugs exposed by testing with fake-splitting:

  • Branches between hot/cold sections are now always long.
  • The pseudoinstruction for loading a constant from the cold section did not support loading 16-byte data into vector registers, as it temporarily loaded the constant into an 8-byte integer register. Now, 16-byte constants are loaded directly into vector registers via an ld1 instruction.
  • Tests involving loading 16-byte constants exposed the data section is not always aligned to its largest constant. Now, the data section is always aligned to emitConsDsc.alignment when calling eeAllocMem.
  • Asserts/NYIs blocking hot/cold splitting on ARM64 have been removed.

Fake hot/cold splitting requires we fake unwind info by treating each split function as one hot section. A more architecture-agnostic approach for this has been applied. However, unwind info generated for cold sections has yet to be tested, as this depends on VM support for hot/cold splitting.

Aman Khalid added 3 commits May 27, 2022 14:11
- Remove NYIs/control flow preventing code splitting on ARM64.
- Update emitter::emitIns_J() to keep jumps between hot/cold sections
long.
- Update emitter::emitOutputLJ() to emit long jumps for both
conditional and unconditional branches between hot/cold sections, and
report relocations to the runtime.
- Update long ldr pseudoinstruction to instead use ld1 instruction
when loading 16-byte constants into vector registers; ldr
implementation temporarily loads the constant into a general integer
register, which does not support 16-byte values.
- Remove NYIs/control flow preventing code splitting on ARM64.
- Update emitter::emitIns_J() to keep jumps between hot/cold sections
long.
- Update emitter::emitOutputLJ() to emit long jumps for both
conditional and unconditional branches between hot/cold sections, and
report relocations to the runtime.
- Update long ldr pseudoinstruction to instead use ld1 instruction
when loading 16-byte constants into vector registers; ldr
implementation temporarily loads the constant into a general integer
register, which does not support 16-byte values.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 14, 2022
@ghost ghost assigned amanasifkhalid Jun 14, 2022
@ghost
Copy link

ghost commented Jun 14, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR enables fake hot/cold splitting on ARM64 via COMPlus_JitFakeProcedureSplitting, and fixes various bugs exposed by testing with fake-splitting:

  • Branches between hot/cold sections are now always long.
  • The pseudoinstruction for loading a constant from the cold section did not support loading 16-byte data into vector registers, as it temporarily loaded the constant into an 8-byte integer register. Now, 16-byte constants are loaded directly into vector registers via an ld1 instruction.
  • Tests involving loading 16-byte constants exposed the data section is not always aligned to its largest constant. Now, the data section is always aligned to emitConsDsc.alignment when calling eeAllocMem.
  • Asserts/NYIs blocking hot/cold splitting on ARM64 have been removed.

Fake hot/cold splitting requires we fake unwind info by treating each split function as one hot section. A more architecture-agnostic approach for this has been applied. However, unwind info generated for cold sections has yet to be tested, as this depends on VM support for hot/cold splitting.

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels:

area-CodeGen-coreclr

Milestone: -

@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

This commit contains fixes for various bugs exposed by enabling fake
hot/cold splitting on ARM64:
- Branches between hot/cold sections are now always long.
- The pseudoinstruction for loading a constant from the cold section
did not support loading 16-byte data into vector registers, as it
temporarily loaded the constant into an 8-byte integer register. Now,
16-byte constants are loaded directly into vector registers via an
`ld1` instruction.
- Tests involving loading 16-byte constants exposed the data section
is not always aligned to its largest constant. Now, the data section
is always aligned to `emitConsDsc.alignment` when calling `eeAllocMem`.
- Asserts/NYIs blocking hot/cold splitting on ARM64 have been removed.

Fake hot/cold splitting requires we fake unwind info by treating each
split function as one hot section. A more architecture-agnostic
approach for this has been applied.
@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Copy link
Member Author

Closing to prevent re-running checks.

Aman Khalid added 4 commits June 14, 2022 10:22
This commit contains fixes for various bugs exposed by enabling fake
hot/cold splitting on ARM64:
- Branches between hot/cold sections are now always long.
- The pseudoinstruction for loading a constant from the cold section
did not support loading 16-byte data into vector registers, as it
temporarily loaded the constant into an 8-byte integer register. Now,
16-byte constants are loaded directly into vector registers via an
`ld1` instruction.
- Tests involving loading 16-byte constants exposed the data section
is not always aligned to its largest constant. Now, the data section
is always aligned to `emitConsDsc.alignment` when calling `eeAllocMem`.
- Asserts/NYIs blocking hot/cold splitting on ARM64 have been removed.

Fake hot/cold splitting requires we fake unwind info by treating each
split function as one hot section. A more architecture-agnostic
approach for this has been applied.
The newly-introduced `emitRemoveJumpToNextInst` optimization caused
a regression when hot/cold-splitting, where jumps from the last hot
instruction to the first cold instruction were erroneously removed.
This is fixed by disabling the `isRemovableJmpCandidate` flag for
branches between hot/cold sections.

On an unrelated note, a JIT dump message has been added to indicate
stress-splitting is occurring.
@amanasifkhalid
Copy link
Member Author

I've added commits that update the fake-splitting implementation to place the read-only data section after the cold section on ARM64. This allows the hot/cold sections to be truly contiguous -- previously, the read-only data section was placed after the hot section. This placement should better facilitate generating fake unwind info, thus fixing stack walks when fake-splitting.

@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Jun 20, 2022

I realized my implementation for aligning the data section when fake-splitting is overzealous, and simply tweaking the current implementation to align to TARGET_POINTER_SIZE when emitConsDsc.alignment >= TARGET_POINTER_SIZE (rather than ==) fixes the asserts hit when loading 16-byte constants.

@BruceForstall PTAL. I noticed runtime-jit-experimental is failing due to some odd concurrency issue (at least that's what it looks like). Hopefully this was just a fluke.

@amanasifkhalid amanasifkhalid marked this pull request as ready for review June 20, 2022 21:47
@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Jun 20, 2022

/azp run runtime-jit-experimental

Edit: Checks didn't trigger for some reason... I'll wait to restart them after addressing feedback.

@BruceForstall BruceForstall self-requested a review June 20, 2022 21:52
@BruceForstall
Copy link
Member

I noticed runtime-jit-experimental is failing due to some odd concurrency issue (at least that's what it looks like). Hopefully this was just a fluke.

Looks like a (bad) recent regression. I opened #71023.

@BruceForstall
Copy link
Member

/azp run runtime

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

To facilitate generating unwind info, fake-splitting now places the
read-only data section after the cold section. This allows the
hot/cold code sections to be truly contiguous.
@amanasifkhalid
Copy link
Member Author

@BruceForstall PTAL at eeAllocMem. I incorporated your suggestions to simplify the caller logic and improve the method's abstraction: eeAllocMem now takes emitConsDsc.alignment as an additional argument since it's already calculated, and handles the ARM64-specific fix-ups. The caller no longer handles anything ARM64- or fake-splitting-specific.

@BruceForstall
Copy link
Member

@BruceForstall PTAL at eeAllocMem. I incorporated your suggestions to simplify the caller logic and improve the method's abstraction: eeAllocMem now takes emitConsDsc.alignment as an additional argument since it's already calculated, and handles the ARM64-specific fix-ups. The caller no longer handles anything ARM64- or fake-splitting-specific.

I like this direction. Unfortunately, and I apologize, this change is going to clash with my change #71044. Can we wait until that is merged (today), and then review this? You'll need to pick up its changes into this.

@amanasifkhalid
Copy link
Member Author

I like this direction. Unfortunately, and I apologize, this change is going to clash with my change #71044. Can we wait until that is merged (today), and then review this? You'll need to pick up its changes into this.

No worries!

@amanasifkhalid
Copy link
Member Author

@BruceForstall I merged your changes from #71044 in, and added a guard to disable splitting on LoongArch64. Thanks for the ARM64 alignment work -- I recall hitting some related asserts when trying to load 16-byte constants a few weeks ago...

@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/coreclr/jit/unwind.cpp Show resolved Hide resolved
@amanasifkhalid
Copy link
Member Author

Merging this PR for now. Will submit a follow-up PR disabling hot/cold splitting when generateCFIUnwindCodes() is true.

@amanasifkhalid amanasifkhalid merged commit bc1a872 into dotnet:main Jun 22, 2022
@AndyAyersMS AndyAyersMS mentioned this pull request Jun 29, 2022
54 tasks
@ghost ghost locked as resolved and limited conversation to collaborators Jul 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants