Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support AVR for inline asm! #91224

Merged
merged 1 commit into from
Dec 7, 2021
Merged

Conversation

couchand
Copy link
Contributor

A first pass at support for the AVR platform in inline asm!. Passes the initial compiler tests, have not yet done more complete verification.

In particular, the register classes could use a lot more fleshing out, this draft PR so far only includes the most basic.

cc @Amanieu @dylanmckay

@rust-highfive
Copy link
Collaborator

r? @wesleywiser

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 25, 2021
@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@Amanieu Amanieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Regarding register classes, just make sure you have enough to accurate describe the constraints that would be needed for various instructions. See https://www.nongnu.org/avr-libc/user-manual/inline_asm.html for a mostly complete list of constraint code used by avr-gcc and LLVM. For example, I can see some instructions which only accept a "high" register (but then again I'm not very familiar with AVR assembly).

r31: reg = ["r31", "ZH"],

X: reg_xyz = ["r27r26", "X"],
Y: reg_xyz = ["r29r28", "Y"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a look at lib/Target/AVR/AVRRegisterInfo.cpp in the LLVM source you will see that Y is actually reserved for the stack pointer. Also r0 and r1 are reserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like for Y the way to go is adding the #error pragma here to keep the user's hands off of it.

I think we'd need some way to name r0 and r1. For instance, to get the result of a MUL:

let a = 1u8;
let b = 1u8;
let res: u16;
asm!(
    "mul {}, {}",
    in(reg) a,
    in(reg) b,
    lateout("r1r0") res,
);

Is there another architecture that I can look to for an example of how to handle this?


(and just to clarify because it confused me for a moment, Y is reserved for the frame pointer, the stack pointer has a dedicated register SPH:SPL)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd need some way to name r0 and r1.

Or just leave it to the user to reassemble the value from r0 and r1 themselves. In the end it should be optimized the same way by the compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't clear. If r0 and r1 are reserved, and thus probably shouldn't be nameable for inline asm!, how is the user expected to get the results from them?

Or to put it another way, what should I do with the information that lib/Target/AVR/AVRRegisterInfo.cpp reserves these registers if not exclude them from use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't clear. If r0 and r1 are reserved, and thus probably shouldn't be nameable for inline asm!, how is the user expected to get the results from them?

Both registers r1 and r0 should be accessible. Any author of some piece Rust code targeting AVR should be able to name these registers in inline ASM.

The zero register should always be accessible even in inline asm so that the dev can pass 0 to an instruction which only supports registers as inputs, not immediates.

See this avr-gcc wiki doc https://gcc.gnu.org/wiki/avr-gcc#Fixed_Registers which shows that both registers should be available in inline asm.

It is a little bit finicky with LLVM instruction reordering, but it's always like that and I don't see a problem with your approach. You won't be able to say, run a piece of C code (i.e. something that executes MUL) and then use inline ASM to reassemble r1/r0 because LLVM inline assembler does not parse/inspect into the raw assembly text (only the constraints/list), so LLVM won't know you're referring to r1/r0 and so will not be able to preserve it during transformation. You would be able to use inline ASM to reassemble r1/r0 from the result of an instruction that is also inside the inline asm block, because the internal contents of any inline asm block would not be reordered. In summary, in this paragraph all I am trying to say that it is perfectly acceptable and allowable to use r1/r0 in inline ASM, but you cannot rely on their values across inline asm block boundaries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like there should be a register class just for r1/r0, they shouldn't be in reg because we wouldn't want to pick either of them for "any old register".

The zero register should always be accessible

Thanks for bringing this up. I'll need to add to the documentation that if a programmer writes to r1 in any way that they must restore it to zero before the end of the asm! block.

it is perfectly acceptable and allowable to use r1/r0 in inline ASM, but you cannot rely on their values across inline asm block boundaries.

Does that mean that you shouldn't be able to use them as inputs or outputs to the block? For instance, would my code sample up thread be expected to work, or does that count as "across block boundaries"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that r0 and r1 should be usable as operands in asm!. However the reality is that LLVM marks these as reserved registers, which causes LLVM's register allocator to ignore these and assume they are never touched by inline assembly. If you try to actually use r0/r1 as operands then things will break in unpredictable ways, which is why we always prevent reserved registers from being used in asm!.

The only "correct" way to use r0 and r1 in asm! is to save their value to another register (which should be properly marked as clobbered) and then restoring these values at the end of the assembly snippet. This is unavoidable until the LLVM backend is fixed to no longer treat these registers as reserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having a hard time understanding exactly what LLVM means by reserved. The only authority I've found is the comment above getReservedRegs here: https://github.com/llvm/llvm-project/blob/e356027016c6365b3d8924f54c33e2c63d931492/llvm/include/llvm/CodeGen/TargetRegisterInfo.h#L524-L527

  /// A reserved register:
  /// - is not allocatable
  /// - is considered always live
  /// - is ignored by liveness tracking

I don't think I know enough to tell if that should be applied to r0 and r1 or not. I'm inclined to trust your judgement there @Amanieu.

This is unavoidable until the LLVM backend is fixed to no longer treat these registers as reserved.

If this is where we're going, I'm happy to submit this change to LLVM. @dylanmckay, since I suspect you'll be reviewing the differential, does that change make sense to you?

In the meantime, I'll make r0 and r1 error out here with a helpful message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unavoidable until the LLVM backend is fixed to no longer treat these registers as reserved.

If this is where we're going, I'm happy to submit this change to LLVM. @dylanmckay, since I suspect you'll be reviewing the differential, does that change make sense to you?

From my point of view, if AVR-GCC compatibility is kept, then the change is definitely good. If not, maybe it's possible but certainly more complicated 👍

src/doc/unstable-book/src/library-features/asm.md Outdated Show resolved Hide resolved
compiler/rustc_target/src/asm/avr.rs Outdated Show resolved Hide resolved
src/doc/unstable-book/src/library-features/asm.md Outdated Show resolved Hide resolved
src/test/assembly/asm/avr-types.rs Show resolved Hide resolved
src/test/assembly/asm/avr-types.rs Show resolved Hide resolved
compiler/rustc_codegen_llvm/src/asm.rs Outdated Show resolved Hide resolved
@wesleywiser
Copy link
Member

r? @Amanieu

@rust-highfive rust-highfive assigned Amanieu and unassigned wesleywiser Nov 29, 2021
@couchand
Copy link
Contributor Author

Thanks for your review @Amanieu! I've replied in the places where I have questions.

Is there any style guidance on the translation from constraint codes/descriptions to register class names? Here is my proposal for a minimal set of constraint codes to support:

  • r for any 8-bit register
  • w for any 16-bit register pair
  • d for an 8-bit upper register
  • e for any pointer register pair (just X and Z I guess)
  • z for the Z register (and also any base pointer register since Y is not usable) [does this even need to be a class?]

The first two could possibly be named reg and wreg to reuse symbols from other architectures, but I'm not sure what the others should be named.

@couchand
Copy link
Contributor Author

I've pushed up a variety of changes that addresses most (but not all) of the outstanding items. Please take a look when you have availability @Amanieu. Thanks!

@couchand
Copy link
Contributor Author

couchand commented Nov 29, 2021

I think I'm not entirely clear on how I should be handling the register pairs. Any pair can be used as an argument to the 16-bit MOVW instruction, and the upper pairs can be an argument to the 16-bit immediate instructions like ADIW.

Also, each of the pointer pairs, which have special behavior for the indirect memory operations, can be used as two general-purpose 8-bit registers when passed to any other instruction.

The 8-bit and 16-bit operation expect the lower register of the pair to be named, with the upper register implicit.

Are the changes I've made here consistent with that usage?

@dylanmckay
Copy link
Contributor

dylanmckay commented Dec 1, 2021

I think I'm not entirely clear on how I should be handling the register pairs. Any pair can be used as an argument to the 16-bit MOVW instruction, and the upper pairs can be an argument to the 16-bit immediate instructions like ADIW.

Also, each of the pointer pairs, which have special behavior for the indirect memory operations, can be used as two general-purpose 8-bit registers when passed to any other instruction.

The 8-bit and 16-bit operation expect the lower register of the pair to be named, with the upper register implicit.

Are the changes I've made here consistent with that usage?

One point I'll make is that the instruction set is pretty ad-hoc/bolted on parts in ways, with even the registers supported for an "instruction" being target MCU specific. You will see instructions like: LPM: only ever works on the Z register, therefore, to use in inline ASM there needs to be a register class that can only ever resolve to the Z register[0] so GCC has an inline asm reg class constraint named z.

Another instruction complicating it is STD/LDD (store/load to memory with pointer offset/displacement). On some obscure AVR chips, they are only defined for Y and Z pointer registers and not the X pointer register (but you can probably ignore this case because the LLVM backend doesn't even make this distinction). But if the backend did make the distinction in the future as it should already be doing, then for the Rust code to work correct across AVR chips, the Rust code would need to explicitly opt-in to one of the x, y, z reg class constraints (redundantly excluding registers on the high end device that could be used, from being used), or the Rust compiler itself would need to expose an inline asm reg class constraint whose included registers were target device specific so that the Rust could could just use that and have it just work.

Suffice to say, I think it's best to land a core set like you're doing and add the others later. It's probably worth adding 3 more extra register classes in this PR (x, y, z, that can only resolve to their own respective implied registers) which will allow Rust code to at least use instructions like LPM and also work around any chip-specific/subtarget feature issues.

EDIT: I remembered you're intentionally not allowing frame pointer Y to be used so in that case obviously there's no need no add a register class for it

@couchand
Copy link
Contributor Author

couchand commented Dec 1, 2021

Thanks for the review @dylanmckay!

The point is well received that every AVR device is a new beast. For myself, I'm usually working on ones that don't even have a hardware MUL (so that conversation is all a bit theoretical!). A very conservative use of asm! might have a good chance of being portable across cores, but anything very complicated will need special attention paid to the specific core(s) in use. I think that's a reasonable trade off to make.

It's probably worth adding 3 more extra register classes in this PR (x, y, z, that can only resolve to their own respective implied registers) which will allow Rust code to at least use instructions like LPM and also work around any chip-specific/subtarget feature issues.

Are all the use-cases for these handled by the specific-register syntax of asm!? That is, is it sufficient that the user can write

let mut byte = MaybeUninit::uninit();
let input = 42u8;
asm!(
    "st X, {}",
    in("X") byte.as_mut_ptr() as u16,
    in(reg) input,
);

or should you also be able to say

let mut byte = MaybeUninit::uninit();
let input = 42u8;
asm!(
    "st {}, {}",
    in(x) byte.as_mut_ptr() as u16,
    in(reg) input,
);

@Amanieu
Copy link
Member

Amanieu commented Dec 2, 2021

I would recommend adding template modifiers for register pairs to print just the high/low register of a pair. Have a look at how it's done for ARM where the e and f modifiers allow printing the low/high d-reg component of a q-reg. In the AVR backend the equivalent modifiers are A and B.

@couchand
Copy link
Contributor Author

couchand commented Dec 2, 2021

I would recommend adding template modifiers for register pairs to print just the high/low register of a pair. Have a look at how it's done for ARM where the e and f modifiers allow printing the low/high d-reg component of a q-reg. In the AVR backend the equivalent modifiers are A and B.

That's a great idea. I'll go ahead and add those.

@couchand couchand force-pushed the 2021-11/avr-asm branch 2 times, most recently from e3ef486 to 1e4e1c3 Compare December 2, 2021 22:23
@couchand couchand marked this pull request as ready for review December 2, 2021 22:24
@couchand
Copy link
Contributor Author

couchand commented Dec 2, 2021

Thank you @Amanieu and @dylanmckay for your continued reviews and many thoughtful comments. I have incorporated all of the feedback thus far and rebased this branch. Please take another look when you have an opportunity.

@@ -322,6 +322,7 @@ impl AsmBuilderMethods<'tcx> for Builder<'a, 'll, 'tcx> {
InlineAsmArch::SpirV => {}
InlineAsmArch::Wasm32 | InlineAsmArch::Wasm64 => {}
InlineAsmArch::Bpf => {}
InlineAsmArch::Avr => {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look this up in LLVM's internals, but the clobber for the status register is "~{sreg}", which should be marked as clobbered when preserves_flags is not set. Also the documentation in the unstable book should be updated to reflect this.

Copy link
Contributor Author

@couchand couchand Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that.

So many of the other platforms have an empty branch here it was easy to not spend much thought on this clause. But the realization that I missed this made me worried there might be more. I just went through again and checked for other copy-pasta, didn't notice any concerns.

The one other place we have a null implementation that could potentially have code is suggest_modifier -- that one I did consider if there was a heuristic to use to know which byte you mean. But in the end it seems equally likely you'd want either byte, and there doesn't seem to be a way to suggest two modifiers.

@Amanieu
Copy link
Member

Amanieu commented Dec 6, 2021

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Dec 6, 2021

📌 Commit c6e8ae1 has been approved by Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 6, 2021
@bors
Copy link
Contributor

bors commented Dec 7, 2021

⌛ Testing commit c6e8ae1 with merge 0b6f079...

@bors
Copy link
Contributor

bors commented Dec 7, 2021

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 0b6f079 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 7, 2021
@bors bors merged commit 0b6f079 into rust-lang:master Dec 7, 2021
@rustbot rustbot added this to the 1.59.0 milestone Dec 7, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (0b6f079): comparison url.

Summary: This change led to moderate relevant regressions 😿 in compiler performance.

  • Moderate regression in instruction counts (up to 0.8% on incr-unchanged builds of externs)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

@rustbot rustbot added the perf-regression Performance regression. label Dec 7, 2021
@couchand couchand deleted the 2021-11/avr-asm branch December 8, 2021 00:55
@couchand couchand restored the 2021-11/avr-asm branch December 8, 2021 00:55
@couchand couchand deleted the 2021-11/avr-asm branch December 8, 2021 00:55
@couchand couchand restored the 2021-11/avr-asm branch December 8, 2021 00:55
@couchand
Copy link
Contributor Author

couchand commented Dec 8, 2021

I have no idea what needs to be done to investigate this performance issue. From the linked results I can see there are five benchmark/profile/scenario rows with issues, and four of them are the externs benchmark.

But I don't know what to do next. What are these externs benchmarks? Why would inline assembler changes for a platform that presumably isn't exercised by the benchmark make such a significant change? What's the usual range on these results?

What does "sufficient written justification" require? Are there any examples or documentation that explain what's needed?

@Amanieu
Copy link
Member

Amanieu commented Dec 8, 2021

The performance issue is almost certainly not related to this PR.

@couchand
Copy link
Contributor Author

couchand commented Dec 8, 2021

The performance issue is almost certainly not related to this PR.

Good to hear. Is there anything actionable for me here, then?

@pnkfelix
Copy link
Member

@couchand no, I don't think there's anything actionable for you here. If anything, the performance team might want to double check what's going on with that benchmark, and/or our noise suppression techniques.

@pnkfelix
Copy link
Member

I think we can ignore this regression. externs is a pathological benchmark (3000 extern function declarations) and I do not think we should worry about it going up by 0.8%, especially since a different like that could easily be due to memory alignment effects in a benchmark like this.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants