Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"ran out of registers" while building i386 linux kernel #41914

Closed
arndb mannequin opened this issue Jul 10, 2019 · 20 comments
Closed

"ran out of registers" while building i386 linux kernel #41914

arndb mannequin opened this issue Jul 10, 2019 · 20 comments
Assignees
Labels
backend:X86 Scheduler Models Accuracy of X86 scheduler models backend:X86 bugzilla Issues migrated from bugzilla

Comments

@arndb
Copy link
Mannequin

arndb mannequin commented Jul 10, 2019

Bugzilla Link 42569
Version trunk
OS Linux
Blocks #4440
CC @Aaron1011,@topperc,@efriedma-quic,@fhahn,@RKSimon,@MatzeB,@nickdesaulniers,@qcolombet,@zygoloid,@rotateright

Extended Description

One file in the linux kernel produces an internal error from the register allocator. I reduced the test case to this:

typedef int (*tune_freq_func_t)(int , void * tuneargs);
static struct {
  int power_up;
  int power_down;
  tune_freq_func_t fm_tune_freq;
  tune_freq_func_t am_tune_freq;
  int fm_rsq_status;
  int agc_status;
  int intb_pin_cfg;
} a[1];
int b, c;
int fn1(void) { return a[c].fm_tune_freq(b, fn1); }
$ clang-9 -m32 -mregparm=3 -O2 -fno-strict-overflow -c si476x-cmd.c
error: ran out of registers during register allocation
1 error generated.

See https://godbolt.org/z/aQ96HC

@efriedma-quic
Copy link
Collaborator

This looks similar to https://reviews.llvm.org/rL163819

@efriedma-quic
Copy link
Collaborator

This isn't really a register allocator issue; there aren't enough registers available in the register class in question.

@Aaron1011
Copy link

Aaron1011 commented Jul 25, 2021

Minimized:

target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

%bigstruct = type { void (i32, i32)*, i32, i32, i32, i32, i32, i32 }

@bigconst = constant [1 x %bigstruct] zeroinitializer
@zero = global i32 0

define void @main() {
  %1 = load i32, i32* @zero
  %2 = getelementptr [1 x %bigstruct], [1 x %bigstruct]* @bigconst, i32 %1, i32 0, i32 0
  %3 = load void (i32, i32)*, void (i32, i32)** %2
  tail call void %3(i32 inreg 0, i32 inreg 1)
  ret void
}

@Aaron1011
Copy link

Aaron1011 commented Jul 27, 2021

There's some special handling of tail calls with inreg arguments (for 32-bit X86) in X86TargetLowering::IsEligibleForTailCallOptimization, which seems potentially relevant.

@arndb
Copy link
Mannequin Author

arndb mannequin commented Sep 30, 2021

I came across another instance of the output:

clang-14 --target=x86_64-linux  -m32 -O2 -fno-omit-frame-pointer -c nf_synproxy_core.i
nf_synproxy_core.i:22:7: error: inline assembly requires more registers than available
  asm("" : "=&r"(d) : "r"(a), "r"(b), "r"(0), "r"(c), "0"(d));
      ^

based one the input below (reduced with cvise):

int a, b, d, j, l, e, g;
char c;
struct e {
  char f;
};
struct g {
  struct e h;
};
struct i {
} m(void), f, h;
enum { k } n(struct i *, struct e *);
struct {
  short ac;
} * o, *q;
int (*p)();
_Bool r;
short s, t;
unsigned char *u;
void v(void);
static void w(int *x, struct i *z) {
  struct g y;
  asm("" : "=&r"(d) : "r"(a), "r"(b), "r"(0), "r"(c), "0"(d));
  q->ac = 0;
  t = (unsigned char *)q - u;
  s = y.h.f = k;
  n(z, &(&y)->h);
  if (o)
    p(x, j, l, r);
  v();
  m();
}
void aa() { w(&e, &f); }
void ab() { w(&g, &h); }

The above happens with clang-12 through at least clang-14, but not with clang-11.

And there is one more in 64-bit mode that I have not reduced but could if that helps:

arch/x86/crypto/curve25519-x86_64.c:519:3: error: inline assembly requires more registers than available
                "  movq 0(%1), %%rdx;"                                       /* f[0] */
                ^

@nickdesaulniers
Copy link
Member

Perhaps relevant to -fno-omit-frame-pointer. It looks like gcc-8 stopped obeying that for -m32. That's probably what's causing register pressure for LLVM.

https://godbolt.org/z/a7K8dddPG

@nickdesaulniers
Copy link
Member

@nickdesaulniers
Copy link
Member

Perhaps a clang-12 regression?
https://godbolt.org/z/fos3r6x7d

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Mar 22, 2022

Perhaps a clang-12 regression? https://godbolt.org/z/fos3r6x7d

Bisection converged on https://reviews.llvm.org/D83996. (cc @topperc, @phoebewang ); doesn't look like a regression per se. But it definitely looks like having a scheduling model or not affects this, since we were also seeing issues for -march=atom.

@topperc
Copy link
Collaborator

topperc commented Mar 22, 2022

There are two copies from EAX and EDX into virtual regsiters at the start of the function immediately after isel. Those got re-scheduled after the inline assembly which oversubscribed the register allocator for the inline assembly.

@topperc
Copy link
Collaborator

topperc commented Mar 22, 2022

Adding

def  : InstRW<[WriteMove], (instrs COPY)>;

to the SandyBridge scheduler model makes it go away.

@nickdesaulniers
Copy link
Member

Uploaded a test case to pre-commit/demonstrate a diff once we agree on a fix: https://reviews.llvm.org/D122348.

From discord:

(Nick): so @ctopper IIUC, because the copies of EAX+EDX are made AFTER the inline asm, the live range of EAX+EDX are extended across the inline asm where previously they were not, which removes EAX+EDX from the available physreg pool that regalloc has to work with for the INLINEASM?

(Craig): that was the conclusion I came to

@nickdesaulniers
Copy link
Member

Adding

def  : InstRW<[WriteMove], (instrs COPY)>;

to the SandyBridge scheduler model makes it go away.

https://reviews.llvm.org/D122350 is a tentative patch, but is touching the schedule model really the right fix? Shouldn't we modify instruction scheduling to account for (or straight up avoid) extending live ranges of physregs across INLINEASM statements?

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Mar 23, 2022

Shouldn't we modify instruction scheduling to account for (or straight up avoid) extending live ranges of physregs across INLINEASM statements?

isSchedBoundary looks interesting. Maybe something like:

diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 40cfaebcf55f..d22b5b2a6f96 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -463,7 +463,13 @@ static bool isSchedBoundary(MachineBasicBlock::iterator MI,
                             MachineBasicBlock *MBB,
                             MachineFunction *MF,
                             const TargetInstrInfo *TII) {
-  return MI->isCall() || TII->isSchedulingBoundary(*MI, MBB, *MF);
+  if (MI->isCall() || TII->isSchedulingBoundary(*MI, MBB, *MF))
+    return true;
+  if (MI->isInlineAsm() && !MBB->livein_empty())
+    for (MachineOperand &MO : MI->operands())
+      if (MO.isReg())
+        return true;
+  return false;
 }
 
 /// A region of an MBB for scheduling.

@phoebewang
Copy link
Contributor

I think this may have big impact to current use of inline asm, given out of registers cases are rare.
How about sum the reg with the same class and check the pressure:

TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
...
for (int I = 0; I < N; ++I)
  if (TRI->getRegPressureLimit(RC[I], *MF) < RCCount[I])
    return true;

@nickdesaulniers
Copy link
Member

@nickdesaulniers nickdesaulniers self-assigned this Apr 22, 2022
@nickdesaulniers nickdesaulniers added the backend:X86 Scheduler Models Accuracy of X86 scheduler models label Apr 22, 2022
nickdesaulniers added a commit that referenced this issue May 5, 2022
To match the cost of other scheduling models. This is expected to
schedule mov instructions around INLINEASM less frequently for the
default machineschedule (pre-RA scheduling).

Suggested by Craig Topper.

Link: #41914

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122350
@nickdesaulniers
Copy link
Member

I just landed a patch that will ship in clang-15. It makes it less likely that we schedule movs in a way that extends the LiveRange of phyregs across inline asm in a way that RegAlloc will fail.

That said, the MachineScheduler is still not hardened from this re-occuring. Ideally, the scheduler could check if reordering a mov across an inlineasm would produce unsatisfiable constraints on the register allocator. IIRC, MachineScheduler just uses a priority queue insertion to order instructions that don't have data dependencies between them. So this might not be the last time we see this issue, but my patch should help a bit. Closing for now, but let's open a new bug report with more test cases if we can still reproduce this with clang-15.

(My patch allows both test cases by @arndb to compile without issue)

@nickdesaulniers
Copy link
Member

nickdesaulniers commented May 5, 2022

As is tradition, I spoke too soon. This fixes one of our instances for no -march=, but for -march=atom we see yet another instance.
ClangBuiltLinux/linux#1589

@RKSimon
Copy link
Collaborator

RKSimon commented May 5, 2022

@nickdesaulniers Is this on 32-bit atom builds? We always use ILP on Atom CPU targets which is going to waste registers that 32-bit can't really afford.

  // For 64-bit, since we have so many registers, use the ILP scheduler.
  // For 32-bit, use the register pressure specific scheduling.
  // For Atom, always use ILP scheduling.
  if (Subtarget.isAtom())
    setSchedulingPreference(Sched::ILP);
  else if (Subtarget.is64Bit())
    setSchedulingPreference(Sched::ILP);
  else
    setSchedulingPreference(Sched::RegPressure);

@nickdesaulniers
Copy link
Member

Is this on 32-bit atom builds?

No. 64b ATOM.

926afd7 looks like it moved 32b atom from hybrid to ilp.

If I change atom to hybrid, I can still reproduce the register exhaustion compilation failure. Same for RegPressure.

Maybe I should be looking at the operator() overloads for the derived classes of queue_sort.

I've posted a smaller reproducer here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 Scheduler Models Accuracy of X86 scheduler models backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

6 participants