Make `-Z self-profile` more efficient #58372

michaelwoerister · 2019-02-11T13:36:57Z

The self-profiling feature is going to make profiling the compilers performance a lot easier. However, a recent first stab at collecting more detailed information (see #58085) still has too much overhead.

Here are some of the things that could be improved:

Move post-processing of the collected data out of the rustc process, as much as possible. SelfProfiler::get_results() does a lot of work for generating the statistics from the collected events. All of this should probably be moved to a separate tool that runs after profiling is done.
Reduce the amount of dispatch and locking that needs to be done for each event. For each event we have to get exclusive access to the profiler (RefCell/ parking_lot mutex) and then look up the event stream for the current thread in an FxHashMap. This should probably solved via thread-local data somehow.
Reduce the size of events. Events are quite big (32 bytes on x86_64 would be my guess). The timestamp can be reduced to 64 bits if we just measure the time from process start. The &str containing the query name can be replaced by a 4 byte tag.
Persist events to disk in a binary format. We should probably open a memory mapped file per thread that we write events to directly. If events don't contain pointers they can be written to disk verbatim. The post-processing tool can then convert them to something platform independent.

Some time soon we also want to record query keys per event. This can already be done efficiently by storing the 32 bit DepNodeIndex that corresponds to a query (which also obviates the need to store the query name in each event). However, in order for the DepNodeIndex to be useful, we'll need to create a persist a mapping of DepNodeIndex -> String at some point before the tcx is destroyed (i.e. in the middle of the compilation process). I expect that creating this map will not be entirely cheap :/

@Mark-Simulacrum, as you can see the whole workflow around self-profiling will change quite a bit, so I
think it's too early to add infrastructure for it to perf.rlo just yet.

cc @wesleywiser @nnethercote (and @rust-lang/wg-compiler-performance for good measure)

The text was updated successfully, but these errors were encountered:

Related to rust-lang#58372

Part of rust-lang#58372

@michaelwoerister

[self-profiler] Make the profiler faster/more efficient Related to #58372 r? @michaelwoerister

michaelwoerister · 2019-02-28T11:25:39Z

So here are some thoughts on making the part of profiling that is happening in the compiler process as cheap as possibe:

Anything in ProfilerEvent should be small and persistable verbatim. E.g. we don't want &str in there because that needs to be expanded to the actual string contents during serialization, which generates a lot of redundant work. I think this basically leaves integer values (which can be indices into side tables that are also persisted, but with less redundancy).
Query keys still have to be rendered to some kind of string form during persistence. This is expensive, so we have to be smart about it:
- It's good to do this in bulk if possible, so that it shows up as a single event in profiles instead of being spread across other events with little chance to account for it during post-processing.
- The string representation of query keys contain lots of redundancy, e.g. the same DefPath prefixes will occur over and over again. It would be good if the way we generate and persist these strings takes advantage of that.

As a result of these thoughts I have the following proposal:

ProfilerEvent should look like this (modulo more efficient packing into "bitfields"):

struct ProfilerEvent {
  event_kind: EventKind, // u8 - query-provider, query-cache-hit, generic-event, incr-comp-cache-loading ... 
  event_id: u32, // ~ (query-kind, query-key) or (function-name, arguments)
  timestamp_kind: u8, // start, stop, instant
  timestamp: u64 // nanoseconds since profiler was created
}

The event_id is basically a key into a string table that is owned by the profiler. The entry in the string table does not have to be created at the same time as the event_id. In particular, for queries we can assume that the event_id is the DepNodeIndex of the query instance and can then later iterate over all query tables in order to generate the corresponding string table entries in bulk. We can reserve the range 0 .. 2^31 for queries. Other events are less fine-grained so it should be OK to allocate entries on demand without noticeably skewing the profile.
The string table in the profiler is tuned for coping with redundancy, that is, entries can refer to other entries by reference instead of copying the contents. So an entry conceptually looks like:
```
struct StringTableEntry {
   components: [EntryComponent]
}

enum EntryComponent {
   Contents(String),
   Ref(StringTableId),
   End,
} 
```
I say conceptually because I expect these entries to be serialized immediately during their allocation. The table itself does not do any de-duplication or interning. This is done from the outside where we have additional knowledge of the data being stored. This also means that entries cannot be compared for equality by just looking at their ID but I don't think we need that.
Because DefId and CrateNum are by far the most common query keys we'll take special care of generating their string table entries in an efficient manner. I imagine something similar to what we do for generating CGU names, that is, use caching/dynamic programming to make this really efficient. Other kinds of query keys (like Ty) can be optimized later too since the string table format is reasonably flexible.
Since the string table is rather complex, it should be encapsulated into its own re-usable crate so that the post-processing crate does not have to duplicate the logic.

What I like about this approach is that it should be rather efficient but it is still generic. I.e. the string table only knows about strings and references between them but does not need to duplicate more logic about compiler internal data structures like the DefPath or Ty. We are lowering these into a simpler form.

Any feedback is welcome!

Related to rust-lang#58372

Part of rust-lang#58372

@michaelwoerister

…r=michaelwoerister [self-profiler] Make the profiler faster/more efficient Related to rust-lang#58372 r? @michaelwoerister

Related to rust-lang#58372

Part of rust-lang#58372

@michaelwoerister

…erister [self-profiler] Make the profiler faster/more efficient Related to #58372 r? @michaelwoerister

pnkfelix · 2019-03-07T15:43:05Z

cc #58967

Related to rust-lang#58372 Related to rust-lang#58967

wesleywiser · 2019-04-14T02:59:15Z

@michaelwoerister I believe this issue can be closed now that we're using measureme right?

michaelwoerister · 2019-04-15T09:35:47Z

Yes, I think all things listed here are implemented via measureme. Closing.

michaelwoerister added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 11, 2019

michaelwoerister mentioned this issue Feb 11, 2019

Bustage due to recent self-profile changes rust-lang/rustc-perf#342

Closed

wesleywiser self-assigned this Feb 11, 2019

wesleywiser mentioned this issue Feb 13, 2019

[self-profiler] Make the profiler faster/more efficient #58425

Merged

wesleywiser added a commit to wesleywiser/rust that referenced this issue Feb 20, 2019

Remove profiler output and replace with a raw event dump

81327ff

Related to rust-lang#58372

wesleywiser added a commit to wesleywiser/rust that referenced this issue Feb 20, 2019

Reduce the size of events by using a u64 instead of Instant

f49dea4

Part of rust-lang#58372

bors added a commit that referenced this issue Feb 20, 2019

Auto merge of #58425 - wesleywiser:more_profiler_changes, r=<try>

695f06a

[self-profiler] Make the profiler faster/more efficient Related to #58372 r? @michaelwoerister

wesleywiser added a commit to wesleywiser/rust that referenced this issue Mar 1, 2019

Remove profiler output and replace with a raw event dump

979ece9

Related to rust-lang#58372

wesleywiser added a commit to wesleywiser/rust that referenced this issue Mar 1, 2019

Reduce the size of events by using a u64 instead of Instant

2fb8d60

Part of rust-lang#58372

wesleywiser added a commit to wesleywiser/rust that referenced this issue Mar 3, 2019

Remove profiler output and replace with a raw event dump

fccc841

Related to rust-lang#58372

wesleywiser added a commit to wesleywiser/rust that referenced this issue Mar 3, 2019

Reduce the size of events by using a u64 instead of Instant

9d26789

Part of rust-lang#58372

bors added a commit that referenced this issue Mar 3, 2019

Auto merge of #58425 - wesleywiser:more_profiler_changes, r=michaelwo…

87a4363

…erister [self-profiler] Make the profiler faster/more efficient Related to #58372 r? @michaelwoerister

michaelwoerister pushed a commit to michaelwoerister/rust that referenced this issue Apr 12, 2019

Use measureme in self-profiler

c9237f5

Related to rust-lang#58372 Related to rust-lang#58967

wesleywiser added a commit to wesleywiser/rust that referenced this issue Apr 13, 2019

Use measureme in self-profiler

56e434d

Related to rust-lang#58372 Related to rust-lang#58967

michaelwoerister closed this as completed Apr 15, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `-Z self-profile` more efficient #58372

Make `-Z self-profile` more efficient #58372

michaelwoerister commented Feb 11, 2019 •

edited

Loading

michaelwoerister commented Feb 28, 2019

pnkfelix commented Mar 7, 2019

wesleywiser commented Apr 14, 2019

michaelwoerister commented Apr 15, 2019

Make -Z self-profile more efficient #58372

Make -Z self-profile more efficient #58372

Comments

michaelwoerister commented Feb 11, 2019 • edited Loading

michaelwoerister commented Feb 28, 2019

pnkfelix commented Mar 7, 2019

wesleywiser commented Apr 14, 2019

michaelwoerister commented Apr 15, 2019

Make `-Z self-profile` more efficient #58372

Make `-Z self-profile` more efficient #58372

michaelwoerister commented Feb 11, 2019 •

edited

Loading