Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support --fix-v4bx #50764

Closed
nickdesaulniers opened this issue Aug 9, 2021 · 17 comments
Closed

support --fix-v4bx #50764

nickdesaulniers opened this issue Aug 9, 2021 · 17 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla lld:ELF

Comments

@nickdesaulniers
Copy link
Member

Bugzilla Link 51422
Version unspecified
OS All
Blocks #4440
CC @arndb,@efriedma-quic,@MaskRay,@nathanchance,@smithp35

Extended Description

In order to link armv4 Linux kernel images with LLD, it's required that LLD support --fix-v4bx, otherwise we get the warning:

ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected

IIUC, there's even a relocation type for this: R_ARM_V4BX.

ClangBuiltLinux/linux#964

@efriedma-quic
Copy link
Collaborator

There's two separate issues here:

  1. Supporting armv4t.
  2. Supporting armv4 without thumb.

Supporting armv4t requires some modifications to the relocation handling for calls; that's what the warning is about.

Supporting armv4 without thumb means we need to use a different sequence for function returns. Historically, this is a compiler thing, not a linker thing, but R_ARM_V4BX is a trick to allow writing code that works for both armv4t and armv4-without-thumb targets. --fix-v4bx essentially converts arm-mode armv4t code to armv4.

If you're not using --fix-v4bx, the linker doesn't really care about the difference between armv4t and armv4; it doesn't generate function returns, anyway.

@MaskRay
Copy link
Member

MaskRay commented Aug 9, 2021

(The documentation says the support is for >=v6.)

Isn't armv4 so obscure that it it quite irrelevant now?
Even if it isn't so obscure, are these users happy with their current toolchain and don't consider a switch?

@efriedma-quic
Copy link
Collaborator

armv5t has both the register and immediate forms of blx, I think.

I think the issue here is there are some legacy hardware enthusiasts who build Linux for very old hardware, like https://en.wikipedia.org/wiki/Risc_PC , so legacy configs targeting armv4 persist in the kernel source tree, and randconfig stumbles over them.

@arndb
Copy link
Mannequin

arndb mannequin commented Aug 10, 2021

There are three classes of ARMv4 (not T) users that I can see from a kernel perspective:

  • RiscPC/sa110 has only one actual user (rmk) I'm aware of, and it has the added problem that the hardware does not support 32-bit memory accesses, so kernels are actually built for ARMv3 despite this being an ARMv4 processor. gcc-9 removed ARMv3 support a few years ago, and I see no reason to add this to clang. As soon as gcc-9 or higher is required for building kernels, this platform will be removed.

  • There are a handful of hobbyist users of legacy StrongARM machines that run ARMv4 kernels on SA1100 handhelds or Netwinder. We will probably discuss removing those as well when time is up for RiscPC, but they could remain for a few years longer.

  • There are two remaining embedded platforms based on Faraday FA526 cores that are deployed commercially in settings that require kernel updates: Cortina Gemini and MOXA Art. The hardware is old, but the kernel ports are well written and maintained, they will probably outlive a number of ARMv4T and ARMv5E platforms we currently support. It would be nice to be able to build kernels for these using clang.

For reference, we are still adding support for ARMv5E (ARM926E) SoC platforms, including SoCs designed as recently as 2020 (Microchip SAM9X60).
For ARMv4T, we currently support AT91RM9200, S3C24xx, i.MX1, EP72xx, EP93xx, OMAP1, and ARM Integrator based on ARM920T and ARM710T. Debian Buster (released in 2019) dropped support for these, but there is still some work going into the kernel for at least ep93xx, omap1 and s3c24xx, similar to the ARMv4/FA526 platforms.

@smithp35
Copy link
Collaborator

On the LLD side I originally implemented v7-A and above then later added v5 as there was at least one person had some need for it, I think on the BSD side. It probably wouldn't be a huge amount of extra code and tests to support v4t but up until now there hasn't been anyone needing it in LLD. The assumption was that older hardware would already have toolchains they could use, and we could avoid extra complexity in the code.

For v4t support in lld, there are a few things we'd need to do:

  • We'd need a new v4t thunks as ldr pc, [address] cannot change state in v4t. The sequence becomes:
    ldr ip, [address]
    bx ip
  • As there is no BLX there is no Thunk sharing between Arm and Thumb state. The v4t Thunks isCompatibleWith would need to include only relocations from the same state.
  • ARM::needsThunk() would need a case for v4t that accounts for the lack of a BLX. I think it would be something like (pseudo code):
    return isV4t || !inBranchRange(type, branchAddr, dst + a);

If the Thunks code has done its job properly then the relocation handling code should never be presented with an opportunity to use a BLX.

On the linker side R_ARM_V4BX is trivial to implement (just write Arm state mov pc, lr encoding to the location), but as Eli points out the compiler has to generate the relocation when targeting Arm v4.

If there is a need for support then it may be worth approaching Linaro TCWG. I could do it, but I've not got a lot of spare time so it might take a few weeks rather than a few days.

@efriedma-quic
Copy link
Collaborator

  • As there is no BLX there is no Thunk sharing between Arm and Thumb state.
    The v4t Thunks isCompatibleWith would need to include only relocations from
    the same state.

The first thing any v4t Thumb thunk will do is switch to ARM mode. The remaining code is the same, so it could be shared. Not sure that's worth doing, though.

@efriedma-quic
Copy link
Collaborator

Also, the kernel doesn't actually have any thumb code, so it doesn't actually need v4t thunks, I think. It would be enough to just suppress the warning if there aren't any Thumb symbols.

@smithp35
Copy link
Collaborator

The first thing any v4t Thumb thunk will do is switch to ARM mode. The
remaining code is the same, so it could be shared. Not sure that's worth
doing, though.

It is possible, although it would mean two entry points for thunk that we'd need to select between per relocation type. My instinct would be to keep it simple unless there was a real need to minimise the code-size.

Also, the kernel doesn't actually have any thumb code, so it doesn't
actually need v4t thunks, I think. It would be enough to just suppress the
warning if there aren't any Thumb symbols.

That would be possible, although at the moment that would require LLD to scan all the local symbols of every input object looking for the absence of $t. Although only programs that could trigger the warning would need to run the check. We can't use the BuildAttributes as objects built with -marm still have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used)

Would a suppression of the warning be acceptable to the kernel folk? Or are there v4t kernels that use Thumb code that need supporting?

@arndb
Copy link
Mannequin

arndb mannequin commented Aug 11, 2021

That would be possible, although at the moment that would require LLD to
scan all the local symbols of every input object looking for the absence of
$t. Although only programs that could trigger the warning would need to run
the check. We can't use the BuildAttributes as objects built with -marm
still have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used)

Would a suppression of the warning be acceptable to the kernel folk? Or are
there v4t kernels that use Thumb code that need supporting?

I'm sure there is no thumb code in pre-v7 kernels that we need to link against. Suppressing the warning would seem to do the trick for armv4t kernels, but if I build for armv4, I still see 'bx' instructions in the thunks.

I checked the compiler output from a recent clang and I see that building for armv4 does not generate any 'bx' instructions, so I don't think we actually need the R_ARM_V4BX/--fix-v4bx trick any more to work around compiler-generated instructions, but we still need to work around the linker adding them.

Could we just have a linker flag to ask for armv4 style thunks, and use them for both v4 and v4t kernels?

@smithp35
Copy link
Collaborator

It does look like (ARMInstrInfo.td MOVPCLR https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ARMInstrInfo.td#L2443) the return sequence is predicated so clang/llvm will use mov pc, lr for -march=armv4 so in theory no compiler work required.

We can tell if all objects are ArmV4 from the build attributes. Wouldn't need a new command line option.

Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel would be long range Arm to Arm. If --pic-veneer is set then these will have bx lr in them. However if --pic-veneer isn't set I'd expect to see all the thunks to be of the form:
ldr pc, [pc, #-4] ; l1
l1: .word

In theory these should be OK for v4. May be worth trying a build without --pic-veneer (The kernel wouldn't be position independent though).

I think a new PI Thunk for v4 would look something like:
ldr ip, [pc, #-8] ; l1
add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb
l1: .word destination -4 - l1 ; offset to destination

Although if we're adding thunks for armv4, it may just be worth adding v4t thunks at the same time.

@arndb
Copy link
Mannequin

arndb mannequin commented Aug 11, 2021

It does look like (ARMInstrInfo.td MOVPCLR
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/
ARMInstrInfo.td#L2443) the return sequence is predicated so clang/llvm will
use mov pc, lr for -march=armv4 so in theory no compiler work required.

Maybe it was the 'blx mcount' when building with '-pg' that caused the problems? I remember debugging into that in the past, but don't remember the exactly problem and this seems fine now, it always uses 'bl mcount' here with the clang version I see.

We can tell if all objects are ArmV4 from the build attributes. Wouldn't
need a new command line option.

Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel
would be long range Arm to Arm.

Correct

If --pic-veneer is set then these will have
bx lr in them. However if --pic-veneer isn't set I'd expect to see all the
thunks to be of the form:
ldr pc, [pc, #-4] ; l1
l1: .word

In theory these should be OK for v4. May be worth trying a build without
--pic-veneer (The kernel wouldn't be position independent though).

Yes, that works. I see the --pic-veneer flag was added by Ard in
https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8323/1

to work around veneers that are inserted into the early-boot code that is run before the MMU is enabled.

We can probably get away with this on actual ARMv4/v4t hardware since their memory is too constrained to actually run a kernel that is large enough to require veneers, but it seems a little wrong. In theory one can run a large (allyesconfig style) kernel with ARMv4 enabled on an ARMv5 machine with a lot of memory.

I think a new PI Thunk for v4 would look something like:
ldr ip, [pc, #-8] ; l1
add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb
l1: .word destination -4 - l1 ; offset to destination

Although if we're adding thunks for armv4, it may just be worth adding v4t
thunks at the same time.

That would be nice for user space, but in the kernel we don't really care about the distinction. I suggested a command line flag since we could then just unconditionally pick the v4 veneers for linking a v4/v4t kernel and avoid both the bx instructions and the warning.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Dec 19, 2022
lld cannot build for ARMv4/v4T targets because it inserts 'blx' instructions
that are unsupported there:

  ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected

Add a Kconfig time dependency to prevent those targets from being
selected in randconfig builds.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: llvm/llvm-project#50764
Link: ClangBuiltLinux#964
Link: https://lore.kernel.org/r/20221215162635.3750763-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
@nickdesaulniers
Copy link
Member Author

https://reviews.llvm.org/rG62c605771a30 looks perhaps related? cc @stuij

@stuij
Copy link
Member

stuij commented Dec 29, 2022

Ah, this is bad timing! I was unaware of this ticket.

Yes, so my patch stops lld inserting BLX instructions for v4(t), and instead now creates veneers that are sensible for these architectures. Currently I only added support for absolute relocations, but PIC relocations are high on the list and should follow shortly. I think it's just two cases that we should still account for, and adding them is trivial. Which includes v4 PIC thunks not generating a BX which right now they still do.

And as we don't insert BLX anymore, the LLD BLX warning is not emitted anymore.

Let me know if I'm missing more things that need fixing. ATM I'm quite motivated to get v4(t) working for LLVM, so happy to help out where I can on this.

@stuij stuij self-assigned this Jan 7, 2023
@stuij
Copy link
Member

stuij commented Jan 9, 2023

I just put up a review for LLD that will make it support position independent thunks for v4/v4T:
https://reviews.llvm.org/D141272

I think that would fix this issue, and no more work would be needed on the LLVM side. To summarise:

  • We now (after patch lands) support both v4 and v4t relocations in linker, for both absolute and position independent configs.
  • We won't need support for --fix-v4bx or R_ARM_V4BX, as no BX is generated on v4, either by compiler or linker.
  • No special linker flag is needed for v4 as target architecture is inferred from build attributes.
  • We don't generate a BLX warning anymore, so no need to guard against that.

Am I missing anything?

This would mean the intel-lab-lkp/linux@6a7ee50 patch can be reverted. Would we need to wait until say LLVM 16 is released (assuming this patch lands before that)?

By the way, relatedly, I just pushed a fix (747fc27) for v4/v5/v6 where a b.w would be generated (not supported for these arches) if a range extension is needed and the distance between thunk section and target is smaller than b.w. So this would affect jumps spanning more than 32MB. Seeing the lack of a bug report, I guess this doesn't happen very often.

@stuij stuij closed this as completed in 6f9ff1b Jan 13, 2023
CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Jan 16, 2023
- Position independent thunks now work for both Armv4 and Armv4T
- Armv4 arm->arm thunks don't emit a BX anymore, which doesn't exist for the
  arch. This fixes llvm/llvm-project#50764.
- Armv4 and Armv4T both have the same arm->arm behaviour. Which also is
  desirable for the above ticket.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D141272
@linusw
Copy link

linusw commented Feb 8, 2024

I just tested this with clang 17.0.6 on the Gemini FA526 target, close to ARMv4T.
It works like a charm:

sl-boot>go 0x400000
Booting Linux on physical CPU 0x0
Linux version 6.8.0-rc1+ (linus@lino) (clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708), LLD 17.0.6) #49 PREEMPT Thu Feb 8 23:02:26 CET 2024
CPU: FA526 [66015261] revision 1 (ARMv4), cr=0000397f
CPU: VIVT data cache, VIVT instruction cache
OF: fdt: Machine model: D-Link DIR-685 Xtreme N Storage Router
(...)

It boots all the way to userspace, gets network, routes packets and has framebuffer.

@linusw
Copy link

linusw commented Feb 15, 2024

Also tested with clang 17.0.6 on the SA110 Netwinder, which is a pure ARMv4 targer.
It boots fine and works as expected:

Press '*' TWICE to enter debug......Booting kernel...
Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0x0
Linux version 6.8.0-rc4-00034-g8d3dea210042 (linus@lino) (clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708), LLD 17.0.6) #8 Th4
CPU: StrongARM-110 [4401a103] revision 3 (ARMv4), cr=0000517f
CPU: VIVT data cache, VIVT instruction cache
Machine: Rebel-NetWinder
(...)

All the way to userspace, network and graphics.

@stuij
Copy link
Member

stuij commented Feb 19, 2024

Yay! Tackar for posting this, and I bet I speak for all armv4(t) fans all over the world.

I Googled the Gemini SoC, and landed on a page you wrote. I'm tempted to buy a router, or perhaps one of those strongarm servers.

veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue Jun 12, 2024
- Position independent thunks now work for both Armv4 and Armv4T
- Armv4 arm->arm thunks don't emit a BX anymore, which doesn't exist for the
  arch. This fixes llvm/llvm-project#50764.
- Armv4 and Armv4T both have the same arm->arm behaviour. Which also is
  desirable for the above ticket.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D141272
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla lld:ELF
Projects
None yet
Development

No branches or pull requests

6 participants