Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llc lowers intrinsics incorrectly with the presense of regparm attribute #4369

Closed
llvmbot opened this issue Apr 16, 2009 · 21 comments
Closed
Labels
bugzilla Issues migrated from bugzilla tools:llc

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 16, 2009

Bugzilla Link 3997
Resolution FIXED
Resolved on Mar 20, 2017 13:45
Version trunk
OS Linux
Blocks llvm/llvm-bugzilla-archive#4068
Attachments test case
Reporter LLVM Bugzilla Contributor
CC @asl,@chandlerc,@pageexec

Extended Description

llc does not handle this case correctly:

call void @​llvm.memcpy.i32(i8* %3, i8* %4, i32 2052, i32 4)
%dst1 = bitcast [128 x i8]* %dst to i8*		; <i8*> [#uses=1]
%src2 = bitcast [128 x i8]* %src to i8*		; <i8*> [#uses=1]
%5 = call i8* @&#8203;memcpy(i8* inreg %dst1, i8* inreg %src2, i32 inreg 128) nounwind		; <i8*> [#uses=0]

when llvm.memcpy.i32 lowers to call to memcpy, it does not follow the customized memcpy calling convention.

for the first llvm.memcpy.i32, the generated assemblies are:

movl	%eax, (%edx)
movl	%ecx, 4(%edx)
movl	$2052, 8(%edx)
call	memcpy

for the second call of memcpy, the codes are:

leal	-140(%ebp), %eax
leal	-268(%ebp), %ecx
movl	$128, %edx
movl	%edx, -276(%ebp)
movl	%ecx, %edx
movl	-276(%ebp), %ecx
call	memcpy
@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 16, 2009

running command:
llvm-gcc -S -o - a.c -nostdinc -mregparm=3 -fno-builtin

@asl
Copy link
Collaborator

asl commented Apr 16, 2009

llc is doing correct thing here. The intrinsic does not have any regparm attributes.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 16, 2009

So is it possible to leverage the benefits of fastcall/regparm for these intrinsic?

OS Kernel specializes the memcpy call for performance reasons.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

The problem here is llvm-gcc is not respecting either the -ffreestanding or -fno-builtin flags by generating an llvm intrinsic for memcpy. The definition in the C file (with the regparm flag) no longer matches the intrinsic and should not be lowered to an intrinsic in the IR.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

The problem here is llvm-gcc is not respecting either the -ffreestanding or
-fno-builtin flags by generating an llvm intrinsic for memcpy.

Neither does gcc. In fact the gcc docs say "GCC requires the
freestanding environment provide memcpy', memmove', memset' and memcmp'" and if you check inside gcc you will see that it will
generate memcpy intrinsics regardless of the flags you mention in
some circumstances. So I think what you are saying is invalid.

PS: This has come up before, so probably you can find previous
discussions in bugzilla or the mailing list archives.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

gcc is generating calls with the regparm=3 calling convention to the programmer supplied memcpy. llvm is generating calls with the normal calling convention to the programmer supplied memcpy (which is regparm=3). I will grant that this is a 1 line change in the source program in this case to deal with this difference, but it doesn't seem like a satisfactory solution.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

As noted in the gcc docs, gcc requires the system to supply memcpy.
What does this mean exactly? It doesn't mean just a function
named memcpy, it means something that can be called like standard
memcpy, that does the job of standard memcpy. The problem is
that the memcpy in this PR can't be called like standard memcpy
because it is using a different ABI by requiring some parameters
to be passed in registers. This is a user error. You will get
exactly the same problem with gcc if you pass it a testcase that
causes it to generate a call to memcpy internally. It is simply
easier to hit the problem with llvm-gcc.

That said, what would it take to change this? The problem is
that gcc/llvm-gcc do need to produce the memcpy operation in
some places, for example when do struct assignments. In general
they can't magically know that memcpy needs to be called using an
unusual ABI, because they might be compiling a file that didn't
include the header defining the unusual memcpy. So they would
probably have to emulate memcpy by some explicit loop rather than
generating an memcpy call.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

Well, gcc 4.3 generates "rep; movsb" assembly for struct assignments instead of calls to memcpy (as least on my machine).

This bug only shows up when you are compiling performance critical code. In fact, it makes llvm-gcc fail to miscompile Linux kernel.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

llvm-gcc will also emit this kind of code (rep; movsb) if it
considers it profitable. gcc will emit a call to memcpy in
various circumstances, see emit_block_move_hints in expr.c.
As usual with gcc it's not clear exactly when it does this.

By the way, presumably the linux kernel comment is because
it likes to compile everything with -mregparm=3 ?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

linux kernel includes its only implementation of memcpy/memset in its tree, which are compiled with -mregparm=3

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 21, 2009

The simple correction to linux is to tag the function with an attribute, or, and this is what I did in my tree, make memcpy a (regparm=0) wrapper for kmemcpy and have explicit users use kmemcpy. This takes 4 lines of code.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 8, 2014

As noted in the gcc docs, gcc requires the system to supply memcpy.
What does this mean exactly? It doesn't mean just a function
named memcpy, it means something that can be called like standard
memcpy, that does the job of standard memcpy. The problem is
that the memcpy in this PR can't be called like standard memcpy

The compiler was invoked with -ffreestanding -mregparm=3.

The memcpy in this PR is, by that definition, the standard memcpy.

You will get exactly the same problem with gcc if you pass it a
testcase that causes it to generate a call to memcpy internally.

That does not appear to be the case (using -mtune=i686 makes it less inclined to just use rep movs, btw):

Try compiling the C file in (duplicate) bug 18415, as follows:
gcc -m32 -O3 -o- -S foo.c -mtune=i686 -mregparm=3

Using GCC 4.8.2 on Fedora 20, it definitely seems to invoke 'memcpy' with the correct ABI, as specified on the command line.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 8, 2014

*** Bug llvm/llvm-bugzilla-archive#18415 has been marked as a duplicate of this bug. ***

@chandlerc
Copy link
Member

Rough sketch of how you might solve this...
Here is a patch that shows vaguely where the problem manifests in the backend, and one (potentially bad) approach to solving it.

This is likely a decent starting point to factor and use for memcpy, memmove, etc. However, there may be better/cleaner ways of doing this.

Also, while it correctly lowers using the ABI, it doesn't realize this makes the function tail callable. That might require brute force of looking at the register usage and synthesizing at tail call when suitable here in the DAG. Or it might require some cleanup when doing isel on the dag. :: shudder ::

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 23, 2014

Hm, Perhaps I misunderstand but I think that patch misses the point. However, it does enlighten me somewhat, so thanks.

The point is not that we have an existing memcpy() function in this module, declared with some bizarreness like attribute((regparm(3)).

The point is that we have invoked the compiler with -mregparm=3 so everything is using that ABI. I don't think this patch will address that, will it?

I'm a little confused because -mregparm=3 doesn't seem to match any of the predefined calling conventions enumerated by CallingConv::ID. How are we even supposed to ask for what we want?

On the clang side, we have X86_32ABIInfo::shouldUseInReg() which will calculate, for each argument, whether it's supposed to be in a register or not. On the LLVM side we don't have that flexibility. Or at least not in the same way.

One simple(ish) solution might be to define CallingConv::ID::RegParm1, CallingConv::ID::RegParm2, CallingConv::ID::RegParm3 etc., and let each target implement them as appropriate.

Is there a better way?

@chandlerc
Copy link
Member

Hm, Perhaps I misunderstand but I think that patch misses the point.
However, it does enlighten me somewhat, so thanks.

The point is not that we have an existing memcpy() function in this
module, declared with some bizarreness like attribute((regparm(3)).

The point is that we have invoked the compiler with -mregparm=3 so
everything is using that ABI. I don't think this patch will address that,
will it?

Heh, I was actually discussing this other possibility just now.

I'm a little confused because -mregparm=3 doesn't seem to match any of the
predefined calling conventions enumerated by CallingConv::ID. How are we
even supposed to ask for what we want?

On the clang side, we have X86_32ABIInfo::shouldUseInReg() which will
calculate, for each argument, whether it's supposed to be in a register or
not. On the LLVM side we don't have that flexibility. Or at least not in the
same way.

One simple(ish) solution might be to define CallingConv::ID::RegParm1,
CallingConv::ID::RegParm2, CallingConv::ID::RegParm3 etc., and let each
target implement them as appropriate.

Is there a better way?

The current way this is modeled is relatively simple: the Clang ABI information attaches 'inreg' onto the relevant parameters of the function declaration. This, combined with the baseline calling convention, provides the information needed to lower the call correctly.

The problem is that we currently rely exclusively on function declarations to encode this type of information. If we don't have the declaration there is no mechanism to communicate this from the frontend (which has the complete ABI information) to the backend which is doing the lowering.

All of the options I see for this are pretty bad.

  1. Come up with an encoding of the variations on the calling conventions like '-mregparm' and put that in a module flag which we can then use to build a correct call (which may only work for simple functions that only accept integers).

  2. Drastically expand the calling conventions to cover every permutation (I don't even know if this is possible).

  3. Require the module to provide declarations (and LLVM to preserve those declarations) for all functions which codegen can synthesize a call to, and always use those declarations for lowering.

3b) Same as 3, but make certain declarations optional when we can emit straight line code (memcpy can always be emitted as a tiny rep loop).

  1. Something else I've not thought of....

Sadly, the biggest disadvantage to #​3 are huge swath of math functions that we would have to do this for, not memcpy and friends, so #​3b seems to not help the complexity much. #​1 and #​2 work, but seem likely to be brittle and embed very nitty-gritty ABI details even farther into the code generator.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 12, 2014

bump.

We still see this issue in arch/x86/boot (memcpy) .

@llvmbot
Copy link
Collaborator Author

llvmbot commented Aug 10, 2015

The current way this is modeled is relatively simple: the Clang ABI
information attaches 'inreg' onto the relevant parameters of the function
declaration. This, combined with the baseline calling convention, provides
the information needed to lower the call correctly.

How about callback from the code generator to the front end, giving the API of an intrinsic function that it's about to emit a call to, and allowing the front end to tweak it as appropriate (by adding 'inreg', etc.).

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 8, 2015

This is also going to come up - very soon - in the context of the MCU psABI as well, since that ABI basically mandates -mregparm=3.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 20, 2017

Fixed in r298179.

@edwintorok
Copy link
Contributor

mentioned in issue llvm/llvm-bugzilla-archive#4068

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla tools:llc
Projects
None yet
Development

No branches or pull requests

4 participants