llc lowers intrinsics incorrectly with the presense of regparm attribute #4369

llvmbot · 2009-04-16T18:39:37Z


Bugzilla Link	3997
Resolution	FIXED
Resolved on	Mar 20, 2017 13:45
Version	trunk
OS	Linux
Blocks	llvm/llvm-bugzilla-archive#4068
Attachments	test case
Reporter	LLVM Bugzilla Contributor
CC	@asl,@chandlerc,@pageexec

Extended Description

llc does not handle this case correctly:

call void @&#8203;llvm.memcpy.i32(i8* %3, i8* %4, i32 2052, i32 4)
%dst1 = bitcast [128 x i8]* %dst to i8*		; <i8*> [#uses=1]
%src2 = bitcast [128 x i8]* %src to i8*		; <i8*> [#uses=1]
%5 = call i8* @&#8203;memcpy(i8* inreg %dst1, i8* inreg %src2, i32 inreg 128) nounwind		; <i8*> [#uses=0]

when llvm.memcpy.i32 lowers to call to memcpy, it does not follow the customized memcpy calling convention.

for the first llvm.memcpy.i32, the generated assemblies are:

movl	%eax, (%edx)
movl	%ecx, 4(%edx)
movl	$2052, 8(%edx)
call	memcpy

for the second call of memcpy, the codes are:

leal	-140(%ebp), %eax
leal	-268(%ebp), %ecx
movl	$128, %edx
movl	%edx, -276(%ebp)
movl	%ecx, %edx
movl	-276(%ebp), %ecx
call	memcpy

The text was updated successfully, but these errors were encountered:

llvmbot · 2009-04-16T18:42:57Z

running command:
llvm-gcc -S -o - a.c -nostdinc -mregparm=3 -fno-builtin

asl · 2009-04-16T19:23:25Z

llc is doing correct thing here. The intrinsic does not have any regparm attributes.

llvmbot · 2009-04-16T20:16:25Z

So is it possible to leverage the benefits of fastcall/regparm for these intrinsic?

OS Kernel specializes the memcpy call for performance reasons.

llvmbot · 2009-04-21T17:18:14Z

The problem here is llvm-gcc is not respecting either the -ffreestanding or -fno-builtin flags by generating an llvm intrinsic for memcpy. The definition in the C file (with the regparm flag) no longer matches the intrinsic and should not be lowered to an intrinsic in the IR.

llvmbot · 2009-04-21T19:04:37Z

The problem here is llvm-gcc is not respecting either the -ffreestanding or
-fno-builtin flags by generating an llvm intrinsic for memcpy.

Neither does gcc. In fact the gcc docs say "GCC requires the
freestanding environment provide memcpy', memmove', memset' and memcmp'" and if you check inside gcc you will see that it will
generate memcpy intrinsics regardless of the flags you mention in
some circumstances. So I think what you are saying is invalid.

PS: This has come up before, so probably you can find previous
discussions in bugzilla or the mailing list archives.

llvmbot · 2009-04-21T19:34:08Z

gcc is generating calls with the regparm=3 calling convention to the programmer supplied memcpy. llvm is generating calls with the normal calling convention to the programmer supplied memcpy (which is regparm=3). I will grant that this is a 1 line change in the source program in this case to deal with this difference, but it doesn't seem like a satisfactory solution.

llvmbot · 2009-04-21T20:00:06Z

As noted in the gcc docs, gcc requires the system to supply memcpy.
What does this mean exactly? It doesn't mean just a function
named memcpy, it means something that can be called like standard
memcpy, that does the job of standard memcpy. The problem is
that the memcpy in this PR can't be called like standard memcpy
because it is using a different ABI by requiring some parameters
to be passed in registers. This is a user error. You will get
exactly the same problem with gcc if you pass it a testcase that
causes it to generate a call to memcpy internally. It is simply
easier to hit the problem with llvm-gcc.

That said, what would it take to change this? The problem is
that gcc/llvm-gcc do need to produce the memcpy operation in
some places, for example when do struct assignments. In general
they can't magically know that memcpy needs to be called using an
unusual ABI, because they might be compiling a file that didn't
include the header defining the unusual memcpy. So they would
probably have to emulate memcpy by some explicit loop rather than
generating an memcpy call.

llvmbot · 2009-04-21T21:00:31Z

Well, gcc 4.3 generates "rep; movsb" assembly for struct assignments instead of calls to memcpy (as least on my machine).

This bug only shows up when you are compiling performance critical code. In fact, it makes llvm-gcc fail to miscompile Linux kernel.

llvmbot · 2009-04-21T21:48:00Z

llvm-gcc will also emit this kind of code (rep; movsb) if it
considers it profitable. gcc will emit a call to memcpy in
various circumstances, see emit_block_move_hints in expr.c.
As usual with gcc it's not clear exactly when it does this.

By the way, presumably the linux kernel comment is because
it likes to compile everything with -mregparm=3 ?

llvmbot · 2009-04-21T22:38:35Z

linux kernel includes its only implementation of memcpy/memset in its tree, which are compiled with -mregparm=3

llvmbot · 2009-04-21T22:42:38Z

The simple correction to linux is to tag the function with an attribute, or, and this is what I did in my tree, make memcpy a (regparm=0) wrapper for kmemcpy and have explicit users use kmemcpy. This takes 4 lines of code.

llvmbot · 2014-01-08T14:22:42Z

As noted in the gcc docs, gcc requires the system to supply memcpy.
What does this mean exactly? It doesn't mean just a function
named memcpy, it means something that can be called like standard
memcpy, that does the job of standard memcpy. The problem is
that the memcpy in this PR can't be called like standard memcpy

The compiler was invoked with -ffreestanding -mregparm=3.

The memcpy in this PR is, by that definition, the standard memcpy.

You will get exactly the same problem with gcc if you pass it a
testcase that causes it to generate a call to memcpy internally.

That does not appear to be the case (using -mtune=i686 makes it less inclined to just use rep movs, btw):

Try compiling the C file in (duplicate) bug 18415, as follows:
gcc -m32 -O3 -o- -S foo.c -mtune=i686 -mregparm=3

Using GCC 4.8.2 on Fedora 20, it definitely seems to invoke 'memcpy' with the correct ABI, as specified on the command line.

llvmbot · 2014-01-08T14:23:23Z

*** Bug llvm/llvm-bugzilla-archive#18415 has been marked as a duplicate of this bug. ***

chandlerc · 2014-01-23T15:18:52Z

Rough sketch of how you might solve this...
Here is a patch that shows vaguely where the problem manifests in the backend, and one (potentially bad) approach to solving it.

This is likely a decent starting point to factor and use for memcpy, memmove, etc. However, there may be better/cleaner ways of doing this.

Also, while it correctly lowers using the ABI, it doesn't realize this makes the function tail callable. That might require brute force of looking at the register usage and synthesizing at tail call when suitable here in the DAG. Or it might require some cleanup when doing isel on the dag. :: shudder ::

llvmbot · 2014-01-23T16:17:01Z

Hm, Perhaps I misunderstand but I think that patch misses the point. However, it does enlighten me somewhat, so thanks.

The point is not that we have an existing memcpy() function in this module, declared with some bizarreness like attribute((regparm(3)).

The point is that we have invoked the compiler with -mregparm=3 so everything is using that ABI. I don't think this patch will address that, will it?

I'm a little confused because -mregparm=3 doesn't seem to match any of the predefined calling conventions enumerated by CallingConv::ID. How are we even supposed to ask for what we want?

On the clang side, we have X86_32ABIInfo::shouldUseInReg() which will calculate, for each argument, whether it's supposed to be in a register or not. On the LLVM side we don't have that flexibility. Or at least not in the same way.

One simple(ish) solution might be to define CallingConv::ID::RegParm1, CallingConv::ID::RegParm2, CallingConv::ID::RegParm3 etc., and let each target implement them as appropriate.

Is there a better way?

chandlerc · 2014-01-23T23:54:59Z

Hm, Perhaps I misunderstand but I think that patch misses the point.
However, it does enlighten me somewhat, so thanks.

The point is not that we have an existing memcpy() function in this
module, declared with some bizarreness like attribute((regparm(3)).

The point is that we have invoked the compiler with -mregparm=3 so
everything is using that ABI. I don't think this patch will address that,
will it?

Heh, I was actually discussing this other possibility just now.

I'm a little confused because -mregparm=3 doesn't seem to match any of the
predefined calling conventions enumerated by CallingConv::ID. How are we
even supposed to ask for what we want?

On the clang side, we have X86_32ABIInfo::shouldUseInReg() which will
calculate, for each argument, whether it's supposed to be in a register or
not. On the LLVM side we don't have that flexibility. Or at least not in the
same way.

One simple(ish) solution might be to define CallingConv::ID::RegParm1,
CallingConv::ID::RegParm2, CallingConv::ID::RegParm3 etc., and let each
target implement them as appropriate.

Is there a better way?

The current way this is modeled is relatively simple: the Clang ABI information attaches 'inreg' onto the relevant parameters of the function declaration. This, combined with the baseline calling convention, provides the information needed to lower the call correctly.

The problem is that we currently rely exclusively on function declarations to encode this type of information. If we don't have the declaration there is no mechanism to communicate this from the frontend (which has the complete ABI information) to the backend which is doing the lowering.

All of the options I see for this are pretty bad.

Come up with an encoding of the variations on the calling conventions like '-mregparm' and put that in a module flag which we can then use to build a correct call (which may only work for simple functions that only accept integers).
Drastically expand the calling conventions to cover every permutation (I don't even know if this is possible).
Require the module to provide declarations (and LLVM to preserve those declarations) for all functions which codegen can synthesize a call to, and always use those declarations for lowering.

3b) Same as 3, but make certain declarations optional when we can emit straight line code (memcpy can always be emitted as a tiny rep loop).

Something else I've not thought of....

Sadly, the biggest disadvantage to #3 are huge swath of math functions that we would have to do this for, not memcpy and friends, so #3b seems to not help the complexity much. #1 and #2 work, but seem likely to be brittle and embed very nitty-gritty ABI details even farther into the code generator.

llvmbot · 2014-11-12T14:27:54Z

bump.

We still see this issue in arch/x86/boot (memcpy) .

llvmbot · 2015-08-10T10:11:09Z

The current way this is modeled is relatively simple: the Clang ABI
information attaches 'inreg' onto the relevant parameters of the function
declaration. This, combined with the baseline calling convention, provides
the information needed to lower the call correctly.

How about callback from the code generator to the front end, giving the API of an intrinsic function that it's about to emit a call to, and allowing the front end to tweak it as appropriate (by adding 'inreg', etc.).

llvmbot · 2015-10-08T16:59:26Z

This is also going to come up - very soon - in the context of the MCU psABI as well, since that ABI basically mandates -mregparm=3.

llvmbot · 2017-03-20T20:45:47Z

Fixed in r298179.

edwintorok · 2021-11-27T01:36:52Z

mentioned in issue llvm/llvm-bugzilla-archive#4068

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llc lowers intrinsics incorrectly with the presense of regparm attribute #4369

llc lowers intrinsics incorrectly with the presense of regparm attribute #4369

llvmbot commented Apr 16, 2009

llvmbot commented Apr 16, 2009

asl commented Apr 16, 2009

llvmbot commented Apr 16, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Jan 8, 2014

llvmbot commented Jan 8, 2014

chandlerc commented Jan 23, 2014

llvmbot commented Jan 23, 2014

chandlerc commented Jan 23, 2014

llvmbot commented Nov 12, 2014

llvmbot commented Aug 10, 2015

llvmbot commented Oct 8, 2015

llvmbot commented Mar 20, 2017

edwintorok commented Nov 27, 2021

llc lowers intrinsics incorrectly with the presense of regparm attribute #4369

llc lowers intrinsics incorrectly with the presense of regparm attribute #4369

Comments

llvmbot commented Apr 16, 2009

Extended Description

llvmbot commented Apr 16, 2009

asl commented Apr 16, 2009

llvmbot commented Apr 16, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Apr 21, 2009

llvmbot commented Jan 8, 2014

llvmbot commented Jan 8, 2014

chandlerc commented Jan 23, 2014

llvmbot commented Jan 23, 2014

chandlerc commented Jan 23, 2014

llvmbot commented Nov 12, 2014

llvmbot commented Aug 10, 2015

llvmbot commented Oct 8, 2015

llvmbot commented Mar 20, 2017

edwintorok commented Nov 27, 2021