"ran out of registers" while building i386 linux kernel #41914

arndb · 2019-07-10T19:33:30Z


Bugzilla Link	42569
Version	trunk
OS	Linux
Blocks	#4440
CC	@Aaron1011,@topperc,@efriedma-quic,@fhahn,@RKSimon,@MatzeB,@nickdesaulniers,@qcolombet,@zygoloid,@rotateright

Extended Description

One file in the linux kernel produces an internal error from the register allocator. I reduced the test case to this:

typedef int (*tune_freq_func_t)(int , void * tuneargs);
static struct {
  int power_up;
  int power_down;
  tune_freq_func_t fm_tune_freq;
  tune_freq_func_t am_tune_freq;
  int fm_rsq_status;
  int agc_status;
  int intb_pin_cfg;
} a[1];
int b, c;
int fn1(void) { return a[c].fm_tune_freq(b, fn1); }

$ clang-9 -m32 -mregparm=3 -O2 -fno-strict-overflow -c si476x-cmd.c
error: ran out of registers during register allocation
1 error generated.

See https://godbolt.org/z/aQ96HC

efriedma-quic · 2019-07-11T02:04:30Z

This looks similar to https://reviews.llvm.org/rL163819

efriedma-quic · 2019-07-11T23:43:48Z

This isn't really a register allocator issue; there aren't enough registers available in the register class in question.

Aaron1011 · 2021-07-25T22:38:56Z

Minimized:

target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

%bigstruct = type { void (i32, i32)*, i32, i32, i32, i32, i32, i32 }

@bigconst = constant [1 x %bigstruct] zeroinitializer
@zero = global i32 0

define void @main() {
  %1 = load i32, i32* @zero
  %2 = getelementptr [1 x %bigstruct], [1 x %bigstruct]* @bigconst, i32 %1, i32 0, i32 0
  %3 = load void (i32, i32)*, void (i32, i32)** %2
  tail call void %3(i32 inreg 0, i32 inreg 1)
  ret void
}

Aaron1011 · 2021-07-27T23:37:36Z

There's some special handling of tail calls with inreg arguments (for 32-bit X86) in X86TargetLowering::IsEligibleForTailCallOptimization, which seems potentially relevant.

arndb · 2021-09-30T12:21:12Z

I came across another instance of the output:

clang-14 --target=x86_64-linux  -m32 -O2 -fno-omit-frame-pointer -c nf_synproxy_core.i
nf_synproxy_core.i:22:7: error: inline assembly requires more registers than available
  asm("" : "=&r"(d) : "r"(a), "r"(b), "r"(0), "r"(c), "0"(d));
      ^

based on the input below (reduced with cvise):

int a, b, d, j, l, e, g;
char c;
struct e {
  char f;
};
struct g {
  struct e h;
};
struct i {
} m(void), f, h;
enum { k } n(struct i *, struct e *);
struct {
  short ac;
} * o, *q;
int (*p)();
_Bool r;
short s, t;
unsigned char *u;
void v(void);
static void w(int *x, struct i *z) {
  struct g y;
  asm("" : "=&r"(d) : "r"(a), "r"(b), "r"(0), "r"(c), "0"(d));
  q->ac = 0;
  t = (unsigned char *)q - u;
  s = y.h.f = k;
  n(z, &(&y)->h);
  if (o)
    p(x, j, l, r);
  v();
  m();
}
void aa() { w(&e, &f); }
void ab() { w(&g, &h); }

The above happens with clang-12 through at least clang-14, but not with clang-11.

And there is one more in 64-bit mode that I have not reduced but could if that helps:

arch/x86/crypto/curve25519-x86_64.c:519:3: error: inline assembly requires more registers than available
                "  movq 0(%1), %%rdx;"                                       /* f[0] */
                ^

nickdesaulniers · 2022-03-21T23:26:22Z

Perhaps relevant to -fno-omit-frame-pointer. It looks like gcc-8 stopped obeying that for -m32. That's probably what's causing register pressure for LLVM.

https://godbolt.org/z/a7K8dddPG

nickdesaulniers · 2022-03-21T23:30:22Z

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8e941ae950ddce1745b4d6819a7131908dd7de24 looks relevant.

nickdesaulniers · 2022-03-21T23:38:05Z

Perhaps a clang-12 regression?
https://godbolt.org/z/fos3r6x7d

nickdesaulniers · 2022-03-22T20:05:18Z

Perhaps a clang-12 regression? https://godbolt.org/z/fos3r6x7d

Bisection converged on https://reviews.llvm.org/D83996 (cc @topperc, @phoebewang); it doesn't look like a regression per se. But whether or not there is a scheduling model definitely seems to affect this, since we were also seeing issues with -march=atom.

topperc · 2022-03-22T20:47:06Z

There are two copies from EAX and EDX into virtual registers at the start of the function immediately after isel. Those got rescheduled after the inline assembly, which oversubscribed the registers available to the register allocator for the inline assembly.

topperc · 2022-03-22T20:49:36Z

Adding

def  : InstRW<[WriteMove], (instrs COPY)>;

to the SandyBridge scheduler model makes it go away.

nickdesaulniers · 2022-03-23T20:50:17Z

Uploaded a test case to pre-commit/demonstrate a diff once we agree on a fix: https://reviews.llvm.org/D122348.

From discord:

(Nick): so @ctopper IIUC, because the copies of EAX+EDX are made AFTER the inline asm, the live range of EAX+EDX are extended across the inline asm where previously they were not, which removes EAX+EDX from the available physreg pool that regalloc has to work with for the INLINEASM?

(Craig): that was the conclusion I came to
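
The conclusion above can be pictured with a toy model. This is only a sketch, not LLVM code: `ToyInstr`, `physRegsLiveAcrossAsm` and `asmIsAllocatable` are hypothetical names, each COPY is assumed to read a distinct live-in register, and the figure of six allocatable GPRs (i386 with a frame pointer) is an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of one basic block, not LLVM code. Each instruction is either a
// COPY out of a live-in physical register or the INLINEASM itself.
struct ToyInstr {
  bool isInlineAsm;      // true for the inline asm statement
  std::string copiedReg; // physreg read by a COPY; empty for the asm
};

// Counts live-in physregs whose COPY is scheduled after the inline asm.
// Such a register stays live across the asm and cannot be reused by it.
inline std::size_t physRegsLiveAcrossAsm(const std::vector<ToyInstr> &order) {
  bool seenAsm = false;
  std::size_t live = 0;
  for (const ToyInstr &instr : order) {
    if (instr.isInlineAsm)
      seenAsm = true;
    else if (seenAsm)
      ++live; // COPY placed after the asm: its source is live across it
  }
  return live;
}

// Assumed number: six allocatable GPRs (EAX, EBX, ECX, EDX, ESI, EDI).
// `needed` is how many distinct registers the asm operands require.
inline bool asmIsAllocatable(const std::vector<ToyInstr> &order,
                             std::size_t needed) {
  const std::size_t allocatable = 6;
  return needed + physRegsLiveAcrossAsm(order) <= allocatable;
}
```

If the asm in the reduced test case is taken to need five distinct registers (one early-clobber output tied to an input, plus four "r" inputs), the model accepts the order COPY, COPY, INLINEASM and rejects INLINEASM, COPY, COPY, because the second order leaves only four registers free.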

nickdesaulniers · 2022-03-23T21:02:43Z

Adding
def  : InstRW<[WriteMove], (instrs COPY)>;
to the SandyBridge scheduler model makes it go away.

https://reviews.llvm.org/D122350 is a tentative patch, but is touching the schedule model really the right fix? Shouldn't we modify instruction scheduling to account for (or straight up avoid) extending live ranges of physregs across INLINEASM statements?

nickdesaulniers · 2022-03-23T21:13:03Z

Shouldn't we modify instruction scheduling to account for (or straight up avoid) extending live ranges of physregs across INLINEASM statements?

isSchedBoundary looks interesting. Maybe something like:

diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 40cfaebcf55f..d22b5b2a6f96 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -463,7 +463,13 @@ static bool isSchedBoundary(MachineBasicBlock::iterator MI,
                             MachineBasicBlock *MBB,
                             MachineFunction *MF,
                             const TargetInstrInfo *TII) {
-  return MI->isCall() || TII->isSchedulingBoundary(*MI, MBB, *MF);
+  if (MI->isCall() || TII->isSchedulingBoundary(*MI, MBB, *MF))
+    return true;
+  if (MI->isInlineAsm() && !MBB->livein_empty())
+    for (MachineOperand &MO : MI->operands())
+      if (MO.isReg())
+        return true;
+  return false;
 }
 
 /// A region of an MBB for scheduling.

phoebewang · 2022-03-24T03:56:22Z

I think this may have a big impact on current uses of inline asm, given that out-of-registers cases are rare.
How about summing the registers of each class and checking the pressure:

TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
...
for (int I = 0; I < N; ++I)
  if (TRI->getRegPressureLimit(RC[I], *MF) < RCCount[I])
    return true;
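
A standalone sketch of that per-class check, detached from LLVM so it can be compiled on its own. `ClassDemand` and `exceedsPressureLimit` are hypothetical names; `limit` stands in for the value `getRegPressureLimit` would return for a class and `requested` for the matching `RCCount` entry. How those counts would be gathered from the INLINEASM operands is left open here, as it is in the snippet above.

```cpp
#include <vector>

// Hypothetical stand-in for one register class seen by an inline asm.
struct ClassDemand {
  unsigned limit;     // plays the role of getRegPressureLimit(RC, MF)
  unsigned requested; // plays the role of RCCount[I]
};

// Returns true if any class is asked for more registers than its limit,
// i.e. the case where the asm would be treated as a scheduling boundary.
inline bool exceedsPressureLimit(const std::vector<ClassDemand> &classes) {
  for (const ClassDemand &demand : classes)
    if (demand.limit < demand.requested)
      return true;
  return false;
}
```

With this shape, an inline asm that stays within every class limit would not become a boundary, which is the point of the suggestion: only the rare oversubscribed cases change scheduling behaviour.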

nickdesaulniers · 2022-04-22T22:23:46Z

https://reviews.llvm.org/D124308 PTAL

To match the cost of other scheduling models. This is expected to schedule mov instructions around INLINEASM less frequently for the default machineschedule (pre-RA scheduling).

Suggested by Craig Topper.
Link: #41914
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122350

nickdesaulniers · 2022-05-05T18:22:20Z

I just landed a patch that will ship in clang-15. It makes it less likely that we schedule movs such that the live range of physregs is extended across inline asm and RegAlloc fails.

That said, the MachineScheduler is still not hardened against this recurring. Ideally, the scheduler could check whether reordering a mov across an inline asm would produce unsatisfiable constraints for the register allocator. IIRC, MachineScheduler just uses a priority queue insertion to order instructions that don't have data dependencies between them. So this might not be the last time we see this issue, but my patch should help a bit. Closing for now, but let's open a new bug report with more test cases if we can still reproduce this with clang-15.

(My patch allows both test cases by @arndb to compile without issue)

nickdesaulniers · 2022-05-05T18:33:30Z

As is tradition, I spoke too soon. This fixes one of our instances for no -march=, but for -march=atom we see yet another instance.
ClangBuiltLinux/linux#1589

RKSimon · 2022-05-05T18:58:42Z

@nickdesaulniers Is this on 32-bit atom builds? We always use ILP on Atom CPU targets which is going to waste registers that 32-bit can't really afford.

  // For 64-bit, since we have so many registers, use the ILP scheduler.
  // For 32-bit, use the register pressure specific scheduling.
  // For Atom, always use ILP scheduling.
  if (Subtarget.isAtom())
    setSchedulingPreference(Sched::ILP);
  else if (Subtarget.is64Bit())
    setSchedulingPreference(Sched::ILP);
  else
    setSchedulingPreference(Sched::RegPressure);

nickdesaulniers · 2022-05-06T21:29:38Z

Is this on 32-bit atom builds?

No. 64b ATOM.

926afd7 looks like it moved 32b atom from hybrid to ilp.

If I change atom to hybrid, I can still reproduce the register exhaustion compilation failure. Same for RegPressure.

Maybe I should be looking at the operator() overloads for the derived classes of queue_sort.

I've posted a smaller reproducer here.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

This was referenced Feb 14, 2022

Inline assembler dies on WindowMaker source file (wrlib/x86_specific.c) #10209

Closed

march=atom compiletime issues ClangBuiltLinux/linux#1483

Closed

inline assembly requires more registers than available ClangBuiltLinux/linux#1589

Open

nickdesaulniers self-assigned this Apr 22, 2022

nickdesaulniers added the "backend:X86 Scheduler Models" label (Accuracy of X86 scheduler models) Apr 22, 2022

nickdesaulniers closed this as completed May 5, 2022

nickdesaulniers mentioned this issue May 5, 2022

Fedora i686 config minus CONFIG_FORTIFY_SOURCE error in arch/x86/include/asm/checksum_32.h ClangBuiltLinux/linux#1442

Closed
