API flexibility #63
I mean, the main reason I wrote my own symbolizer is that I needed inlining info and couldn't see how to easily add that to gimli-addr2line so I just started over. As I said in gimli-rs/gimli#248, I'm definitely willing to merge my unwind-rs codebase into what others have as I don't have the time to maintain it on my own (and it doesn't really make sense when I'm the only user). What hardliner does right now:
Yep, gimli-addr2line should be much faster here than it currently is. I've given this some thought and plan to work on it as soon as I have time (perhaps tomorrow).
I wanted to do that, but some compilers generate one sequence for the entire compilation unit, so for some files this can be very slow. In general, I like a lot about your code style in hardliner, and I understand why you started from scratch, but I still plan to implement inlining info in gimli-addr2line and then evaluate the differences from there. The rest of the unwinding is not something I've worked on (although I may in future), so my opinions on that don't count for much. It doesn't matter to me whether that is in gimli-rs or not; it's whatever works best for you. Moving it to gimli-rs won't immediately accomplish anything, though, if no one else has time to work on it.
First off: sorry I've been slow to respond in other threads, I've been traveling. My general take: it doesn't make sense to have competing symbolication libraries without good reason. We should be working together, because none of us have the time/resources to maintain everything ourselves. If that means throwing out the current code base and replacing it with But we have to eventually end up with one shared code base (preferably in
Yeah, let's do this. I'll send you an org invite in a moment, and then you can transfer the repo over.
What we get is more ecosystem cohesion and reviewers to bounce ideas off of. A bit fluffy, but I want our organization to be the one-stop shop for all kinds of debugging needs, and none of us can create that alone :) I'm still working on Should I transfer
Agreed. I'll do that experimentation then write up my thoughts.
Sure. I've also transferred ddbug, but until I'm happier with the state of it, I will continue to develop it by simply pushing to master.
For testing purposes, I've added function inlining support to gimli-addr2line here. It's ugly, so don't look too closely. This algorithm is based heavily on the interval tree usage in @main--'s code, and all credit for the good performance of this goes to @main--. For the binary I'm testing on, the results of this don't fully agree with binutils, but I did a few spot checks and those results do agree with llvm-symbolizer. I tried comparing the output with hardliner, but currently it is missing many names in C++ code, although it gets the line numbers correct. This is probably because it doesn't follow Here are some timing comparisons. I haven't done graphs because they make it harder to compare when there are outliers. The test binary is mostly C++ with about 11MB of text. ~110k consecutive addresses:
Every 100th address:
Every address:
For the consecutive addresses, hardliner's memory usage is higher because it doesn't do lazy parsing, but that can be fixed, so ignore that. A better comparison of memory usage is for the benchmark of every address. I didn't bother waiting for that to do the functions too, but the difference in memory usage between gimli and hardliner should be about the same still. The main thing to note is that due to hardliner resuming sequences instead of caching them, it is significantly slower. This is also the reason why it uses less memory. However, these benchmarks are for lots of addresses, and the time per address is still quite low. I've been optimizing primarily for speed, but is that the correct thing to optimize for? I don't know... If we keep gimli-addr2line, we need to:
If we switch to hardliner, we need to:
And then the big difference that could go either way is which line sequence algorithm to use. I'm happy with using either code base. If we stick with gimli-addr2line then I can do the majority of the work on that. If we change to hardliner then I would like @main-- to do some of that, but I can still work on it too.
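The resume-versus-cache tradeoff discussed above can be sketched independently of either library's real types. Everything below (`Row`, `decode_sequence`, `LineCache`) is a made-up stand-in, not gimli's or hardliner's actual code: a caching design pays the line-program "state machine" cost once per sequence and answers later queries from the decoded rows, trading memory for speed.

```rust
use std::collections::HashMap;

/// A decoded line-table row: (address, line). Stand-in for real DWARF rows.
type Row = (u64, u32);

/// Pretend "state machine" that decodes a whole sequence. In real DWARF
/// this is the expensive part that resuming re-runs and caching avoids.
fn decode_sequence(seq_id: u32) -> Vec<Row> {
    (0u64..100)
        .map(|i| (u64::from(seq_id) * 1000 + i * 4, i as u32 + 1))
        .collect()
}

#[derive(Default)]
struct LineCache {
    decoded: HashMap<u32, Vec<Row>>,
}

impl LineCache {
    /// Decode lazily on first query, then serve from the cache.
    fn line_for(&mut self, seq_id: u32, addr: u64) -> Option<u32> {
        let rows = self
            .decoded
            .entry(seq_id)
            .or_insert_with(|| decode_sequence(seq_id));
        // Binary search for the last row whose address is <= addr.
        let idx = rows.partition_point(|&(a, _)| a <= addr);
        if idx == 0 { None } else { Some(rows[idx - 1].1) }
    }
}

fn main() {
    let mut cache = LineCache::default();
    assert_eq!(cache.line_for(1, 1000), Some(1));
    assert_eq!(cache.line_for(1, 1005), Some(2)); // second query hits the cache
    assert_eq!(cache.line_for(1, 999), None);
    println!("ok");
}
```

The memory cost is the retained `Vec<Row>` per sequence; a resuming design would drop it and re-run `decode_sequence` from the last checkpoint instead.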
Very interesting analysis. I was benchmarking with a debug build of ripgrep (29MB total), where (as you can see from the graphs) the performance penalty from resuming the line number program wasn't nearly as big.
I'm pretty sure that's the reason - as I'm always testing with Rust binaries, I only implemented One important point to note is that recent versions (5.0) of llvm-symbolizer are much faster. I wanted to find out why but I'm just not familiar enough with the llvm codebase - on the surface it looks like they just create a sorted list and binary search that??
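If llvm-symbolizer really is just binary-searching a sorted list, the core lookup would be roughly this sketch (the flat `(start, end, name)` layout here is hypothetical, not LLVM's actual data structure):

```rust
/// Look up `addr` in a sorted, non-overlapping table of
/// (start, end, symbol) ranges via binary search.
fn lookup<'a>(ranges: &'a [(u64, u64, &'a str)], addr: u64) -> Option<&'a str> {
    // Find the last range starting at or before `addr`...
    let idx = ranges.partition_point(|&(start, _, _)| start <= addr);
    if idx == 0 {
        return None;
    }
    let (start, end, name) = ranges[idx - 1];
    // ...and check that `addr` actually falls inside it (there may be gaps).
    (start <= addr && addr < end).then(|| name)
}

fn main() {
    let ranges = [
        (0x1000, 0x1040, "foo"),
        (0x1040, 0x1100, "bar"),
        (0x2000, 0x2010, "baz"),
    ];
    assert_eq!(lookup(&ranges, 0x1044), Some("bar"));
    assert_eq!(lookup(&ranges, 0x1100), None); // gap between bar and baz
    println!("ok");
}
```

Building the sorted table is an O(n log n) one-time cost; each query is then a single O(log n) search, which would explain the speed if their table construction is cheap.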
I know, it's a difficult problem. The way I think about it, the one major use case for symbolization is stack traces. A typical stack trace:
From that perspective, it seems like the key to this is laziness: only parse what we need and cache that. The major downside is that it forces us to allocate when querying. That's a problem when you use the library in a panic handler, as you might be panicking due to OOM. My opinion on this has shifted a bit though - a backtrace for an OOM panic may not be very useful anyway, since you can never be sure who allocated all the memory without a memory profiler. Now about the symbol table parsing:
Yes, I had a bit of a look at that, but didn't compile it to test. They take advantage of the fact that the DIEs are a tree. So your address map only needs to tell you which DIE to start at, and then you can traverse back up the tree to get the rest. DWARF doesn't naturally allow traversing backwards, so they have a
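The walk-back-up idea can be sketched on a toy flat DIE table. The names and layout below are invented for illustration, not gimli's API: a single forward pass records every node's parent, and that map then makes upward traversal possible even though the underlying format is forward-only.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a DIE tree stored in flat, forward-only order:
/// each entry is (offset, parent_offset_or_none, name).
/// DWARF only lets you walk forward, so we record each node's parent
/// during a single forward pass and use that map to walk back up.
fn build_parent_map(dies: &[(u32, Option<u32>, &str)]) -> HashMap<u32, u32> {
    dies.iter()
        .filter_map(|&(off, parent, _)| parent.map(|p| (off, p)))
        .collect()
}

/// Walk from a DIE up to the root, collecting names along the way.
fn ancestry<'a>(
    dies: &'a [(u32, Option<u32>, &'a str)],
    parents: &HashMap<u32, u32>,
    mut off: u32,
) -> Vec<&'a str> {
    let name_of = |o: u32| dies.iter().find(|&&(d, _, _)| d == o).map(|&(_, _, n)| n);
    let mut chain = vec![];
    loop {
        if let Some(n) = name_of(off) {
            chain.push(n);
        }
        match parents.get(&off) {
            Some(&p) => off = p,
            None => break,
        }
    }
    chain
}

fn main() {
    // root (0) -> namespace (1) -> function (2) -> inlined call (3)
    let dies = [
        (0, None, "unit"),
        (1, Some(0), "ns"),
        (2, Some(1), "func"),
        (3, Some(2), "inlined"),
    ];
    let parents = build_parent_map(&dies);
    assert_eq!(ancestry(&dies, &parents, 3), vec!["inlined", "func", "ns", "unit"]);
    println!("ok");
}
```

With this, the address map only needs to point at the deepest inlined DIE; the rest of the inline chain falls out of the upward walk.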
If I recall correctly from when I briefly looked at libbacktrace, it also allocates during the query. That doesn't mean we shouldn't try to do better though. Benchmarking against it might be interesting too. I think it's the same as the others in that it caches the fully decoded line sequences.
We can easily keep it all in the
Sometimes you have to link with code for which you don't have debug info. It's nice to have something in the backtrace instead of simply question marks.
Ah, so this is the part I was missing. I'm curious how this compares to an optimized implementation of my interval tree - I'm not sure which is faster.
Yeah, this is what I'm proposing: Keep it in the
I guess it depends a lot on what you're doing. But for just printing a trace, the better-than-nothing argument obviously applies, you're right.
Note that something like a profiler is also "just" symbolicating stack traces, but typically many stack traces at a time. And while each individual sampled stack won't have consecutive addresses, there likely are consecutive addresses across samples.
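One cheap way to exploit that cross-sample locality (a sketch, not something either library does today) is to sort and deduplicate the whole batch of sampled addresses up front, so the symbolizer can walk each line sequence once, in address order:

```rust
/// Flatten many sampled stacks into one sorted, deduplicated address list.
/// Repeated and near-consecutive addresses across samples then become
/// sequential queries, which a caching or resuming symbolizer handles well.
fn batch_addresses(samples: &[Vec<u64>]) -> Vec<u64> {
    let mut addrs: Vec<u64> = samples.iter().flatten().copied().collect();
    addrs.sort_unstable();
    addrs.dedup();
    addrs
}

fn main() {
    let samples = vec![vec![0x30, 0x10], vec![0x10, 0x20], vec![0x20, 0x30]];
    assert_eq!(batch_addresses(&samples), vec![0x10, 0x20, 0x30]);
    println!("ok");
}
```

The caller would symbolize the sorted list once and then join results back to the original stacks by address.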
It does, but it has its own allocator that uses
I think all of libbacktrace is async-signal-safe (if you grant this assumption).
We still need to decide between gimli-addr2line and hardliner. I think the main factor for deciding between them is the .debug_line handling, and I don't have a strong opinion either way. Long term, I want to investigate using the approach used by hardliner, but adding artificial breaks in long sequences, so maybe hardliner is a better starting point for that. For the other factors, I think hardliner's is closer to what we want. Symbol table handling doesn't matter because that's moving to the object crate. So overall I think hardliner is a better starting point. If we do choose hardliner, how do we go about replacing this crate with it?
So it's conceivable we could do the same with custom allocators. Auditing Rust code for async-signal-safety might be hard though? I think lazy parsing is important to have for large files.
A few options:
I haven't given this a lot of thought but I think I prefer 2 since it both preserves a meaningful git history in this repo as well as for hardliner.
I don't think async-signal-safety is really supported in Rust right now? But I agree, lazy parsing can be a huge performance boost in practice.
+1 for option 2
I prefer option 2 as well.
Prompted by @main--'s comment, here are a few API changes for consideration. The common theme is to make the API more flexible, and less centered on providing an addr2line clone. Note that more flexible may also mean more complicated, so this is a tradeoff. Some of these may be strawman proposals :)
- Remove the `with_functions` option, and use a parameter to `locate` instead (or a different `locate` method). This allows the caller to decide when doing the `locate`, not when constructing the `Mapping`. This is possible because we lazily read the function information.
- Remove the `with_demangling` option and stop doing demangling. This should be handled by the caller. Instead, we return a language value that the caller can use to determine how to demangle. This allows the caller to access both the mangled and demangled names if desired, and to implement their own caching of demangled names if required.
- Remove the `with_symbol_table` option. The caller should be able to decide whether to use symbols for each query. The caller should also be able to get both debuginfo and symbol table information simultaneously. This may mean having two independent queries: one for debuginfo and one for the symbol table.
- Make the `object` crate dependency optional. Instead, define a trait for the required object loader functionality. We can still implement this trait for `object` for convenience.
- Avoid the `memmap` dependency. Using an `OwningHandle` to tie something to a memmap is a generic operation that has nothing to do with addr2line, and should be implemented elsewhere.
- Add an endianity parameter to the mapping instead of using `EndianDebugInfo`. This allows the caller to decide if they only want to support `NativeEndian`, without incurring size or performance penalties.
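Taken together, those proposals might shape the API roughly like the sketch below. Every type and method name here is hypothetical, invented to make the tradeoffs concrete, and the endianity parameter is omitted for brevity: `locate` takes a per-query `functions` flag, demangling is left to the caller via a `Language` tag, and the object loader is a trait that `object` could implement.

```rust
/// Hypothetical language tag returned to the caller, who then picks
/// (and optionally caches) the appropriate demangler.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Language {
    Rust,
    Cpp,
    C,
}

#[derive(Debug, PartialEq)]
struct Frame {
    mangled_name: String, // raw name; demangling is the caller's job
    language: Language,
    line: u32,
}

/// The trait the `object` crate could implement, keeping the
/// dependency optional.
trait ObjectFile {
    fn section(&self, name: &str) -> Option<&[u8]>;
}

struct Mapping<O: ObjectFile> {
    object: O,
}

impl<O: ObjectFile> Mapping<O> {
    /// `with_functions` moves from construction time to query time.
    fn locate(&self, _addr: u64, functions: bool) -> Vec<Frame> {
        // Toy lookup standing in for real DWARF parsing.
        let mut frames = vec![Frame {
            mangled_name: "_ZN3foo3barE".into(),
            language: Language::Cpp,
            line: 42,
        }];
        if !functions {
            frames.iter_mut().for_each(|f| f.mangled_name.clear());
        }
        frames
    }
}

struct DummyObject;
impl ObjectFile for DummyObject {
    fn section(&self, _name: &str) -> Option<&[u8]> {
        None
    }
}

fn main() {
    let mapping = Mapping { object: DummyObject };
    let frames = mapping.locate(0x1000, true);
    assert_eq!(frames[0].language, Language::Cpp);
    assert_eq!(frames[0].line, 42);
    println!("ok");
}
```

The main cost of this shape is that callers who just want addr2line-style strings now have to bring their own demangler; the gain is that nothing is decided for them at `Mapping` construction time.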