Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLDB] On AArch64, reconfigure register context first #70742

Conversation

jasonmolenda
Copy link
Collaborator

On an SVE/SME the register context configuration may change after the inferior process has executed. This was handled via https://reviews.llvm.org/D159504 but it is reconfiguring and clearing the register context after we've parsed any expedited reigster values from the stop reply packet. That results in lldb having to read each register value one at a time while at that stop location, which will be a performance problem on non-local debug setups.

The configuration & clearing needs to happen first. Also, update the names of the local variables for a little clarity.

On an SVE/SME the register context configuration may change after
the inferior process has executed.  This was handled via
https://reviews.llvm.org/D159504 but it is reconfiguring and clearing
the register context after we've parsed any expedited reigster
values from the stop reply packet.  That results in lldb having
to read each register value one at a time while at that stop
location, which will be a performance problem on non-local
debug setups.

The configuration & clearing needs to happen first.  Also, update
the names of the local variables for a little clarity.
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 30, 2023

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

On an SVE/SME the register context configuration may change after the inferior process has executed. This was handled via https://reviews.llvm.org/D159504 but it is reconfiguring and clearing the register context after we've parsed any expedited reigster values from the stop reply packet. That results in lldb having to read each register value one at a time while at that stop location, which will be a performance problem on non-local debug setups.

The configuration & clearing needs to happen first. Also, update the names of the local variables for a little clarity.


Full diff: https://github.com/llvm/llvm-project/pull/70742.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+16-18)
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index 56fc5490657ea71..15bfda3e02e99e4 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -1642,9 +1642,22 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
   }
 
   ThreadGDBRemote *gdb_thread = static_cast<ThreadGDBRemote *>(thread_sp.get());
-  RegisterContextSP gdb_reg_ctx_sp(gdb_thread->GetRegisterContext());
+  RegisterContextSP reg_ctx_sp(gdb_thread->GetRegisterContext());
 
-  gdb_reg_ctx_sp->InvalidateIfNeeded(true);
+  reg_ctx_sp->InvalidateIfNeeded(true);
+
+  // AArch64 SVE/SME specific code below updates SVE and ZA register sizes and
+  // offsets if value of VG or SVG registers has changed since last stop.
+  const ArchSpec &arch = GetTarget().GetArchitecture();
+  if (arch.IsValid() && arch.GetTriple().isAArch64()) {
+    GDBRemoteRegisterContext *gdb_remote_reg_ctx =
+        static_cast<GDBRemoteRegisterContext *>(reg_ctx_sp.get());
+
+    if (gdb_remote_reg_ctx) {
+      gdb_remote_reg_ctx->AArch64Reconfigure();
+      gdb_remote_reg_ctx->InvalidateAllRegisters();
+    }
+  }
 
   auto iter = std::find(m_thread_ids.begin(), m_thread_ids.end(), tid);
   if (iter != m_thread_ids.end())
@@ -1655,25 +1668,10 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
     WritableDataBufferSP buffer_sp(
         new DataBufferHeap(reg_value_extractor.GetStringRef().size() / 2, 0));
     reg_value_extractor.GetHexBytes(buffer_sp->GetData(), '\xcc');
-    uint32_t lldb_regnum = gdb_reg_ctx_sp->ConvertRegisterKindToRegisterNumber(
+    uint32_t lldb_regnum = reg_ctx_sp->ConvertRegisterKindToRegisterNumber(
         eRegisterKindProcessPlugin, pair.first);
     gdb_thread->PrivateSetRegisterValue(lldb_regnum, buffer_sp->GetData());
   }
-
-  // AArch64 SVE/SME specific code below updates SVE and ZA register sizes and
-  // offsets if value of VG or SVG registers has changed since last stop.
-  const ArchSpec &arch = GetTarget().GetArchitecture();
-  if (arch.IsValid() && arch.GetTriple().isAArch64()) {
-    GDBRemoteRegisterContext *reg_ctx_sp =
-        static_cast<GDBRemoteRegisterContext *>(
-            gdb_thread->GetRegisterContext().get());
-
-    if (reg_ctx_sp) {
-      reg_ctx_sp->AArch64Reconfigure();
-      reg_ctx_sp->InvalidateAllRegisters();
-    }
-  }
-
   thread_sp->SetName(thread_name.empty() ? nullptr : thread_name.c_str());
 
   gdb_thread->SetThreadDispatchQAddr(thread_dispatch_qaddr);

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Definitely happy to see us moving towards doing less unnecessary work. 😄

I know this is just code relocation but I'm a bit wary of the original code. Using static_cast to downcast like this may be UB if reg_ctx_sp.get() isn't actually backed by a GDBRemoteRegisterContext.... We should consider rewriting this to avoid potential UB in the near future.

@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Oct 31, 2023

The configuration & clearing needs to happen first. Also, update the names of the local variables for a little clarity.

Just an immediate reaction, the values we need to do that configuration come from the expedited registers. So at a glance, of course we would have to parse those out to be able to reconfigure. Unless you mean that we are invalidating all the other expedited registers in the process, which makes us re-read them.

But also, this code isn't very clear anyway, so let me put this through the SVE/SME tests and see if there's any practical issue with it.

@DavidSpickett
Copy link
Collaborator

That results in lldb having to read each register value one at a time while at that stop location, which will be a performance problem on non-local debug setups.

And given that SVE/SME size can change every time we stop, "this stop" means "all stops". So yeah I can see the issue there.

@DavidSpickett
Copy link
Collaborator

This passed all the tests, and I see a reduction in the packets sent for a single step.

The value of vg is 8 and svg 4 is 4 in this case.

Before:

(lldb) b-remote.async>  < 800> read packet: $T05thread:p56d.56d;name:main.o;threads:56d;thread-pcs:000000000040056c;00:0000000000000000;01:38f5ffffffff0000;02:48f5ffffffff0000;03:8804400000000000;04:0000000000000000;05:3d170907e9f5e52b;06:0886fcf7ffff0000;07:0000000000010000;08:ffffffffffffffff;09:ffffffffffffff00;0a:0000000000000000;0b:1000000000000000;0c:0822e6f7ffff0000;0d:0000000000000000;0e:0000000000000000;0f:47ffff6f00000000;10:a8efe7f7ffff0000;11:0000420000000000;12:4062517300000000;13:7005400000000000;14:0000000000000000;15:5004400000000000;16:0000000000000000;17:0000000000000000;18:0000000000000000;19:0000000000000000;1a:0000000000000000;1b:0000000000000000;1c:0000000000000000;1d:e0f3ffffffff0000;1e:90f0e7f7ffff0000;1f:e0f3ffffffff0000;20:6c05400000000000;21:00002060;a1:0800000000000000;d9:0400000000000000;reason:trace;#c9
intern-state     <  20> send packet: $pa1;thread:056d;#29
intern-state     <  20> read packet: $0800000000000000#08
intern-state     <  20> send packet: $pd9;thread:056d;#34
intern-state     <  20> read packet: $0400000000000000#04
intern-state     <  20> send packet: $p20;thread:056d;#f9
intern-state     <  20> read packet: $6c05400000000000#42
intern-state     <  20> send packet: $p1f;thread:056d;#2e
intern-state     <  20> read packet: $e0f3ffffffff0000#1e
intern-state     <  20> send packet: $p1e;thread:056d;#2d
intern-state     <  20> read packet: $90f0e7f7ffff0000#90
intern-state     <  21> send packet: $xfffffffff200,200#5e
intern-state     < 516> read packet: $e4f2ffffffff0000000000000000000010f3ffffffff0000905efdf7ffff000030f4ffffffff00000000000000000000b035fff7ffff0000010000000000000000f2fff7ffff0000c8f3ffffffff0000b8f3ffffffff00002e4e3df600000000270340000000000098f5fff7ffff0000e4f2ffffffff0000e8f2ffffffff000038f5d80300000000d802400000000000c8f3ffffffff0000c079fef7ffff00000100000000000000b8f3ffffffff0000b035fff7ffff000000000000000000006035fff7ffff00000100000000000000030000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000010000000000000000f2fff7ffff000000f4ffffffff0000749efdf7ffff00000000420000000000203bfff7ffff00002043fff7ffff0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000a0f3ffffffff00003c4be9f7ffff0000a0f3ffffffff0000ac05400000000000e0f3ffffffff000038f0e7f7ffff000070054000000000000000000000000000e0f3ffffffff000054f0e7f7ffff0000700540000000000000000000000000000000000000000000840440000000000000000000000000000000000000000000#71
intern-state     <  15> send packet: $Z0,400568,4#4d
intern-state     <   6> read packet: $OK#9a
dbg.evt-handler  <  16> send packet: $jThreadsInfo#c1
dbg.evt-handler  < 224> read packet: $[{"name":"main.o","reason":"trace","registers":{"161":"0800000000000000","217":"0400000000000000","29":"e0f3ffffffff0000","30":"90f0e7f7ffff0000","31":"e0f3ffffffff0000","32":"6c05400000000000"}],"signal":5,"tid":1389}]]#76
dbg.evt-handler  <  20> send packet: $pa1;thread:056d;#29
dbg.evt-handler  <  20> read packet: $0800000000000000#08
dbg.evt-handler  <  20> send packet: $pd9;thread:056d;#34
dbg.evt-handler  <  20> read packet: $0400000000000000#04
dbg.evt-handler  <  20> send packet: $p20;thread:056d;#f9
dbg.evt-handler  <  20> read packet: $6c05400000000000#42

After:

(lldb) b-remote.async>  < 800> read packet: $T05thread:p514.514;name:main.o;threads:514;thread-pcs:000000000040056c;00:0000000000000000;01:38f5ffffffff0000;02:48f5ffffffff0000;03:8804400000000000;04:0000000000000000;05:b610134a0aa5296f;06:0886fcf7ffff0000;07:0000000000010000;08:ffffffffffffffff;09:ffffffffffffff00;0a:0000000000000000;0b:1000000000000000;0c:0822e6f7ffff0000;0d:0000000000000000;0e:0000000000000000;0f:47ffff6f00000000;10:a8efe7f7ffff0000;11:0000420000000000;12:4062517300000000;13:7005400000000000;14:0000000000000000;15:5004400000000000;16:0000000000000000;17:0000000000000000;18:0000000000000000;19:0000000000000000;1a:0000000000000000;1b:0000000000000000;1c:0000000000000000;1d:e0f3ffffffff0000;1e:90f0e7f7ffff0000;1f:e0f3ffffffff0000;20:6c05400000000000;21:00002060;a1:0800000000000000;d9:0400000000000000;reason:trace;#14
intern-state     <  20> send packet: $pa1;thread:0514;#f4
intern-state     <  20> read packet: $0800000000000000#08
intern-state     <  20> send packet: $pd9;thread:0514;#ff
intern-state     <  20> read packet: $0400000000000000#04
intern-state     <  21> send packet: $xfffffffff200,200#5e
intern-state     < 516> read packet: $e4f2ffffffff0000000000000000000010f3ffffffff0000905efdf7ffff000030f4ffffffff00000000000000000000b035fff7ffff0000010000000000000000f2fff7ffff0000c8f3ffffffff0000b8f3ffffffff00002e4e3df600000000270340000000000098f5fff7ffff0000e4f2ffffffff0000e8f2ffffffff000038f5d80300000000d802400000000000c8f3ffffffff0000c079fef7ffff00000100000000000000b8f3ffffffff0000b035fff7ffff000000000000000000006035fff7ffff00000100000000000000030000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000010000000000000000f2fff7ffff000000f4ffffffff0000749efdf7ffff00000000420000000000203bfff7ffff00002043fff7ffff0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000a0f3ffffffff00003c4be9f7ffff0000a0f3ffffffff0000ac05400000000000e0f3ffffffff000038f0e7f7ffff000070054000000000000000000000000000e0f3ffffffff000054f0e7f7ffff0000700540000000000000000000000000000000000000000000840440000000000000000000000000000000000000000000#71
intern-state     <  15> send packet: $Z0,400568,4#4d
intern-state     <   6> read packet: $OK#9a
dbg.evt-handler  <  16> send packet: $jThreadsInfo#c1
dbg.evt-handler  < 224> read packet: $[{"name":"main.o","reason":"trace","registers":{"161":"0800000000000000","217":"0400000000000000","29":"e0f3ffffffff0000","30":"90f0e7f7ffff0000","31":"e0f3ffffffff0000","32":"6c05400000000000"}],"signal":5,"tid":1300}]]#65
dbg.evt-handler  <  20> send packet: $pa1;thread:0514;#f4
dbg.evt-handler  <  20> read packet: $0800000000000000#08
dbg.evt-handler  <  20> send packet: $pd9;thread:0514;#ff
dbg.evt-handler  <  20> read packet: $0400000000000000#04

So this is an improvement but now I wonder why even expedite the vg and svg registers if we're going to reconfigure before parsing them, which means we send a register read anyway.

Potential workarounds:

  • Invalidate only registers we know can change size.
  • Parse the expedited registers once, reconfigure then parse them again. On the assumption that no scalable register will ever be in the expedited set.

Not sure that's your responsibility to fix, so this change could go in as is and I'll look into the rest if you like.

@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Oct 31, 2023

Invalidate only registers we know can change size.

This one doesn't work because in changing the size of any register in the context, you invalidate the offsets of the rest. We also cannot be sure that the scalable registers are in a certain place in that context. They will be beyond general purpose registers, yes, but beyond that you could have a mix of scalable and unscalable registers depending on the target's extensions.

@DavidSpickett
Copy link
Collaborator

Parse the expedited registers once, reconfigure then parse them again. On the assumption that no scalable register will ever be in the expedited set.

This works but due to more pessimism in:

void GDBRemoteRegisterContext::AArch64Reconfigure() {

We actually invalidate vg and svg before reading them again. I will check if this is still needed.

So, as silly as it sounds this might be the way to go:

  • Parse expedited registers.
  • Reconfigure if needed and invalidate all registers.
  • Parse expedited registers again.

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for this to land as is given that it is a strict improvement over the current state (however weird that is).

@jasonmolenda
Copy link
Collaborator Author

So, as silly as it sounds this might be the way to go:

  • Parse expedited registers.
  • Reconfigure if needed and invalidate all registers.
  • Parse expedited registers again.

I'm fine with that, it doesn't require packets over the wire so the perf hit is tiny. Or fishing the vg and svg register values of of the expedited register list if they're there. I'm fine with abandoning this PR if you'd like to write that patch, it's not pressing that this be fixed immediately. Thanks!

@jasonmolenda
Copy link
Collaborator Author

Or fishing the vg and svg register values of of the expedited register list if they're there.

ignore that suggestion, we can't hardcode a register number here, the stub is allowed to use whatever numbers it wants for the registers and the expedited register numbers are all in the stub's numbering ordering at this point. I think your parse-reconfigure-parse plan is good.

@DavidSpickett
Copy link
Collaborator

I'm gonna merge this so I can do a patch on top of this where we both understand the starting point. I think I've got to the bottom of the issues my workarounds were addressing.

@DavidSpickett DavidSpickett merged commit d402645 into llvm:main Oct 31, 2023
4 checks passed
JDevlieghere pushed a commit to swiftlang/llvm-project that referenced this pull request Nov 29, 2023
On an SVE/SME the register context configuration may change after the
inferior process has executed. This was handled via
https://reviews.llvm.org/D159504 but it is reconfiguring and clearing
the register context after we've parsed any expedited reigster values
from the stop reply packet. That results in lldb having to read each
register value one at a time while at that stop location, which will be
a performance problem on non-local debug setups.

The configuration & clearing needs to happen first. Also, update the
names of the local variables for a little clarity.

(cherry picked from commit d402645)
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Nov 29, 2023
…after-receiving-expedited-regs

[LLDB] On AArch64, reconfigure register context first (llvm#70742)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants