cmd/link: windows/arm64 binary output invalid with linkmode=external #51923
Comments
Does it work with another DLL built from Go (e.g. just an empty main function)? |
I have absolutely no Go experience, so I apologize in advance. In a new directory, copying test.exe from earlier steps.
no error reported. |
Thanks. So the error seems specific to rclone? What if you build rclone with |
Same result with |
Sorry, could you clarify if it still fails with error 193? Or no error reported?
|
On a hunch, I tried rebuilding rclone.exe with
|
Linking the 'real' librclone.dll had no error or warning. Attempting to load the DLL gives error 193 |
It occurred to me that an lld "repro.tar" of linking the DLL might be helpful (especially for @mstorsjo) |
Awesome! Thanks, yes that did help me pinpoint the issue further. I think I know the root cause of what's happening here. TL;DR; the
What leads me to believe this is the root cause (and how I found it): Trying to load this DLL with a small test program like the one in the original post here, I get a dialog box saying that it was unable to load it (with error status 0xc000007b). If I try the same on Wine on Linux on aarch64, it loads successfully. So the DLL isn't entirely broken, but some stricter check in Windows bails out here.

One of the most common causes of failing to load an executable like this has in the past been gaps in the section layout, which can often happen after stripping (but wouldn't happen if the linker omits debug info directly, unless the linker is broken). But that isn't an issue here; all the sections are correctly laid out. All the DLL flags and similar are set correctly. (On arm64, the dynamicbase flag is a must, contrary to x86.)

Next, I tried to look at which parts of the executable the Windows loader might be interacting with. I tried modifying the executable to zero out some fields, to have fewer things for the Windows loader to interact with. (To modify the executable, I used llvm-objcopy, where I manually edited the code to do the exact tweaks to the PE header that I wanted; there are probably other tools for tweaking the PE header that would have worked too.)

Looking at the file headers, it has a couple of data directories defined: export table, import table, import address table, exception table, base relocation table, TLS table. I tried zooming in on the base relocation table. (See https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#the-reloc-section-image-only for some docs on it.) As the executable has dynamicbase set, having no base relocation table at all doesn't seem to fly, but we can truncate it to see if it contains bad data somewhere. For inspecting the relocation table, I used MSVC's dumpbin.exe;
This output contained 174k lines, so we can't sit down and inspect it by hand. We can't truncate it to zero. But we can truncate it to only contain entries for some pages. Here, the size So somewhere, in the relocation table, there's something that makes the Windows loader bail out. So I bisected the table, leaving out more or less of it, until I managed to pinpoint the offending piece:
So here, there's a discontinuity in the table (or whatever you'd call it - it's not correctly sorted). With some more debugging in lld, I was able to pinpoint the source of this discontinuity down to the |
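For readers who want to look for this kind of discontinuity without bisecting by hand, here is a minimal sketch in Python. It assumes you have already extracted the raw bytes of the base relocation table (the contents of the data directory) into a `bytes` object; the function names `iter_reloc_blocks`, `find_discontinuities` and `make_block` are made up for this illustration and do not come from lld, Go, or dumpbin. It walks the blocks as laid out in the PE format docs (a 4-byte page RVA, a 4-byte block size, then 2-byte entries) and reports every place where a block's page RVA does not increase. Treating "strictly increasing page RVA" as the rule the Windows loader enforces is an assumption based on the findings in this thread, not something taken from the spec.

```python
import struct


def make_block(page_rva, entries):
    """Build one base relocation block from (type, offset) pairs (helper for demos)."""
    words = [(rel_type << 12) | (offset & 0xFFF) for rel_type, offset in entries]
    if len(words) % 2:
        words.append(0)  # pad with an ABSOLUTE (type 0) entry to keep 4-byte alignment
    block_size = 8 + 2 * len(words)
    return struct.pack(f"<II{len(words)}H", page_rva, block_size, *words)


def iter_reloc_blocks(data):
    """Yield (page_rva, [(type, offset), ...]) for each block in the raw table."""
    pos = 0
    while pos + 8 <= len(data):
        page_rva, block_size = struct.unpack_from("<II", data, pos)
        if block_size < 8 or pos + block_size > len(data):
            raise ValueError(f"bad block size {block_size} at table offset {pos}")
        count = (block_size - 8) // 2
        raw = struct.unpack_from(f"<{count}H", data, pos + 8)
        # High 4 bits are the relocation type, low 12 bits the offset within the page.
        yield page_rva, [(word >> 12, word & 0xFFF) for word in raw]
        pos += block_size


def find_discontinuities(data):
    """Return (previous_page, page) pairs where the page RVA fails to increase."""
    problems = []
    previous = None
    for page_rva, _entries in iter_reloc_blocks(data):
        if previous is not None and page_rva <= previous:
            problems.append((previous, page_rva))
        previous = page_rva
    return problems


if __name__ == "__main__":
    DIR64 = 10  # IMAGE_REL_BASED_DIR64
    table = make_block(0x2000, [(DIR64, 0x10)]) + make_block(0x1000, [(DIR64, 0x8)])
    for prev, page in find_discontinuities(table):
        print(f"page {page:#x} follows page {prev:#x}")
```

An empty result only means that the block headers are in order; it says nothing about other reasons the loader might reject an image.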
@mstorsjo thanks for the detailed investigation! So it sounds like some relocations are not sorted and the PE loader doesn't like them? Could you, or @jeremyd2019 , try if this patch makes any difference? Thanks!
|
Yes; object file level relocations aren't sorted. For normal linking this should be mostly harmless, but the unsorted order carries over to the DLL/EXE base relocations, which seem to need to be in strictly increasing order. lld assumes the object file relocations are correctly ordered to begin with, which so far they have seemed to be; I haven't checked the spec to see whether this is a strict requirement or just something that all toolchains do in practice. Technically it would be possible to add a check and warning for it in lld, but as lld is generally quite performance focused and sometimes quite blindly assumes that the input files are correct, I'm not sure a check for it (or even worse, a sorting pass) would be welcome there. I could check what MS link.exe does in such a situation; if it resolves it nicely, one could argue that lld should get a fix too.
I don't have any setup available for testing it unfortunately, but let's hope @jeremyd2019 can. |
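To illustrate why unsorted object file relocations would carry over into the base relocation table, here is a small hypothetical sketch in Python. It is not lld's or the Go linker's actual code; `group_into_blocks` and `pages_strictly_increasing` are names invented for this example. It assumes a linker that emits base relocations in a single streaming pass, starting a new 4 KiB page block whenever the page of the next relocation differs from the current one.

```python
from itertools import groupby

PAGE_MASK = 0xFFF  # base relocation blocks cover 4 KiB pages


def group_into_blocks(rvas):
    """Group relocation RVAs into (page_rva, [offsets]) blocks in input order.

    A new block starts whenever the page changes, so the caller's ordering
    is preserved as-is (nothing is sorted here).
    """
    return [
        (page, [rva & PAGE_MASK for rva in group])
        for page, group in groupby(rvas, key=lambda rva: rva & ~PAGE_MASK)
    ]


def pages_strictly_increasing(blocks):
    """True if every block's page RVA is greater than the previous block's."""
    pages = [page for page, _offsets in blocks]
    return all(a < b for a, b in zip(pages, pages[1:]))


if __name__ == "__main__":
    relocs = [0x1008, 0x2010, 0x1010]  # made-up RVAs, deliberately out of order
    print(pages_strictly_increasing(group_into_blocks(relocs)))
    print(pages_strictly_increasing(group_into_blocks(sorted(relocs))))
```

With the unsorted input, page 0x1000 shows up in two separate blocks, one of them after page 0x2000. Sorting the relocations before grouping gives one block per page, in increasing order.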
This is as far as I was able to get with my knowledge. I wish there were some sort of 'PE linter' that could detect issues that would make Windows reject an image. I guess I could start one at least with what I've learned so far... Off-topic update: I started one: https://github.com/jeremyd2019/pelint/blob/main/pelint.py
msys2/MINGW-packages@master...jeremyd2019:test-go-patch 😁 Rebuilt go with those patches, and then rebuilt rclone & librclone (from its mingw-w64-rclone package). The DLL loaded in my test exe, and librclone's included python/ctypes wrapper test also worked. I also managed to get their C test for librclone working with minimal modifications for Windows. |
Change https://go.dev/cl/396195 mentions this issue: |
Thanks for confirming. |
It makes me wonder why Windows i686/x86_64 doesn't have this issue (ie, is there some earlier step sorting relocations for them?) |
What version of Go are you using (go version)?
With fix from #51903 applied
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
Windows 11 ARM64, running MSYS2's package of clang/llvm/lld 13.0.1.
What did you do?
Download and extract https://downloads.rclone.org/v1.58.0/rclone-v1.58.0.tar.gz
What did you expect to see?
Successful run.
What did you see instead?
(this is ERROR_BAD_EXE_FORMAT)
I haven't been able to figure out what's wrong with the DLL that Windows won't load it. I'm attaching it in case somebody has some idea of what to look for.
librclone.zip
Split from #51903 since this seems to be a different issue.
/cc @cherrymui and @mstorsjo in case he has some idea what's invalid about this DLL