[Experimental] Performance test new paging-based measureme data sink. #76436

michaelwoerister · 2020-09-07T10:19:37Z

This PR will try to make sure that the new measureme file format (as described in rust-lang/measureme#128) does not regress performance while doing self-profiling.

The perf.rlo run is actually expected to fail (because the tools on the perf server cannot handle the new format yet) but hopefully we'll still get the top-level timings that should show potential regressions.

The implementation of the new format as referenced by this PR is not production ready yet, but it should give a good indication of final performance. Once we know if this method of performance testing actually yields results, we can also test an alternative implementation that is slightly simpler.

cc @wesleywiser @Mark-Simulacrum

michaelwoerister · 2020-09-07T10:20:02Z

@bors try

bors · 2020-09-07T10:20:03Z

@michaelwoerister: 🔑 Insufficient privileges: not in try users

Xanewok · 2020-09-07T10:38:03Z

@bors try

bors · 2020-09-07T10:38:15Z

⌛ Trying commit d9054f8d414a52d189b26a0fa92f614740d596b2 with merge 12da0f82045ff20d2ae70c83c926bae9c22df223...

bors · 2020-09-07T11:10:55Z

💔 Test failed - checks-actions

rust-log-analyzer · 2020-09-07T11:14:15Z

The job dist-x86_64-linux of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

   Compiling parking_lot v0.11.0
   Compiling rand_core v0.5.1
   Compiling generic-array v0.12.3
   Compiling parking_lot v0.10.2
   Compiling measureme v0.7.1 (https://github.com/michaelwoerister/measureme.git?branch=paged_sinks#da4d0a6e)
   Compiling rand_chacha v0.2.2
   Compiling matchers v0.0.1
   Compiling rustc_serialize v0.0.0 (/checkout/compiler/rustc_serialize)
   Compiling petgraph v0.5.1
---
   Compiling parking_lot v0.10.2
   Compiling crossbeam-deque v0.7.3
   Compiling block-buffer v0.7.3
   Compiling digest v0.8.1
   Compiling measureme v0.7.1 (https://github.com/michaelwoerister/measureme.git?branch=paged_sinks#da4d0a6e)
   Compiling matchers v0.0.1
   Compiling rustc_serialize v0.0.0 (/checkout/compiler/rustc_serialize)
   Compiling petgraph v0.5.1
   Compiling md-5 v0.8.0
---
    Checking parking_lot v0.10.2
    Checking generic-array v0.12.3
    Checking rand_chacha v0.2.2
    Checking crossbeam-deque v0.7.3
    Checking measureme v0.7.1 (https://github.com/michaelwoerister/measureme.git?branch=paged_sinks#da4d0a6e)
    Checking regex v1.3.9
    Checking rand v0.7.3
    Checking rustc_serialize v0.0.0 (/checkout/compiler/rustc_serialize)
    Checking petgraph v0.5.1

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

michaelwoerister · 2020-09-07T15:44:58Z

OK, this seems to pass all checks now. @Xanewok, would you be so kind and start another perf run for me?

Xanewok · 2020-09-07T15:46:54Z

Sure! Let's see if this one-liner works...

@bors try @rust-timer queue

rust-timer · 2020-09-07T15:46:55Z

Awaiting bors try build completion

bors · 2020-09-07T15:47:09Z

⌛ Trying commit ca1bb18264167719f514e66b94870f465036810f with merge cc857b33d54a8ee4ee103623dc1de441b8cf6009...

Mark-Simulacrum · 2020-09-07T16:38:44Z

Try build finished but intentionally not queueing this on perf, as it won't work. I'm working on patch that we can land to make this work side-by-side with regular builds.

Mark-Simulacrum · 2020-09-07T16:42:46Z

Looks like this currently fails in rustc, when running on helloworld

thread 'rustc' panicked at 'assertion failed: num_bytes <= self.shared_state.page_size - PAGE_HEADER_SIZE', /cargo/git/checkouts/measureme-11f27a204054df90/c004094/measureme/src/paged_serialization_sink.rs:133:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.48.0-nightly (cc857b33d 2020-09-07) running on x86_64-unknown-linux-gnu

note: compiler flags: -Z self-profile=/tmp/.tmpNwWiWk/self-profile-output -C embed-bitcode=no -C debuginfo=2 --crate-type bin

note: some of the compiler flags provided by cargo are hidden

thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `LLVMing`,
 right: `Codegenning`', /rustc/cc857b33d54a8ee4ee103623dc1de441b8cf6009/compiler/rustc_codegen_ssa/src/back/write.rs:1419:21

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.48.0-nightly (cc857b33d 2020-09-07) running on x86_64-unknown-linux-gnu

note: compiler flags: -Z self-profile=/tmp/.tmpNwWiWk/self-profile-output -C embed-bitcode=no -C debuginfo=2 --crate-type bin

note: some of the compiler flags provided by cargo are hidden

michaelwoerister · 2020-09-08T10:33:32Z

Looks like this currently fails in rustc, when running on helloworld

Indeed it does. SelfProfiler::bulk_map_query_invocation_id_to_single_string() seems to write quite large chunks of data at once, more than the page size in some cases. I'll make the code a bit more robust against this kind of thing.

Mark-Simulacrum · 2020-09-08T13:30:25Z

@bors try

bors · 2020-09-08T13:30:40Z

⌛ Trying commit e5d80ea with merge da6b924ee19d83e73682a095f60d8bc79de4cdc3...

bors · 2020-09-08T14:14:32Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: da6b924ee19d83e73682a095f60d8bc79de4cdc3 (da6b924ee19d83e73682a095f60d8bc79de4cdc3)

rust-timer · 2020-09-08T14:14:52Z

Queued da6b924ee19d83e73682a095f60d8bc79de4cdc3 with parent 5a6b426, future comparison URL.

Mark-Simulacrum · 2020-09-08T14:42:03Z

I've put a new summarize on the perf collector which seemed to work locally (the collector supports 2 different simultaneous summarizes). I am uncertain if the upload logic will break, but we'll tackle that if necessary.

michaelwoerister · 2020-09-08T15:41:27Z

Thanks @Mark-Simulacrum, I was able to do a full rustc-perf run with this version locally. Let's see if something else breaks.

One way or the other, I'll need to rewrite the whole thing anyway before it can be merged. The current version has quite a bit of room for improvement in terms of code quality.

Mark-Simulacrum · 2020-09-08T16:16:02Z

Okay had to patch perf.rlo a bit but it should collect now. The site is likely to show just noise -- we've likely not changed overhead for collection being off, and perf.rlo only enables self-profile for one out of 2-3 runs of each benchmark. I will be able to manually pull up data from the database for the maximum rather than minimum statistic which should give us some idea of how much of a regression/win this is.

rust-timer · 2020-09-08T17:22:33Z

Finished benchmarking try commit (da6b924ee19d83e73682a095f60d8bc79de4cdc3): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

michaelwoerister · 2020-09-08T18:08:17Z

Alright, those numbers look encouraging. I'll put together a version that doesn't use a background thread tomorrow.

Mark-Simulacrum · 2020-09-08T19:35:17Z

Pulled up maximums rather than minimums, and though they look a bit worse, seems mostly fine. Attached a screenshot -- putting it up online is rather annoying.

michaelwoerister · 2020-09-09T07:39:56Z

Interesting, yes. There seem to be a few minor regressions. Maybe there is a bit of a pattern here: Most regressions seem to be scenarios that are quick to execute and where there's little time spent in LLVM and the linker. Maybe the background thread becomes the bottleneck towards the end, when it is still writing stuff to disk and the rest of the program is only waiting on it to finish?

How did you generate those numbers? Is there a special mode in the collector?

michaelwoerister · 2020-09-09T09:06:59Z

@Mark-Simulacrum I just pushed a version that writes data to disk in a blocking way. That data format is not affected by this so the tooling on the server should still be compatible.

Mark-Simulacrum · 2020-09-09T12:43:28Z

@bors try @rust-timer queue

How did you generate those numbers? Is there a special mode in the collector?

The collector runs rustc 2-3 times: once for self-profile, and then 1-2 times without it. By default, the perf.rust-lang.org frontend shows the minimum of each stat across those runs. Since self-profile does impose some overhead, that's usually not the self-profile run. The page I posted switched min to max (manually editing the SQL query).

I guess it might create a recursive "self feeding" problem, but in theory if we record events for the serialization as well, then crox should be able to show us when those are being serialized and to what extent we're waiting on them. I agree that it seems likely we're waiting at the end.

rust-timer · 2020-09-09T12:43:30Z

Awaiting bors try build completion

bors · 2020-09-09T12:43:40Z

⌛ Trying commit 2ad078b with merge 7479e5973d66928be9e1bbce6cd377ff0957c094...

bors · 2020-09-09T13:30:16Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: 7479e5973d66928be9e1bbce6cd377ff0957c094 (7479e5973d66928be9e1bbce6cd377ff0957c094)

rust-timer · 2020-09-09T13:30:18Z

Queued 7479e5973d66928be9e1bbce6cd377ff0957c094 with parent 3f5e617, future comparison URL.

michaelwoerister · 2020-09-09T14:01:59Z

The collector runs rustc 2-3 times: once for self-profile, and then 1-2 times without it. By default, the perf.rust-lang.org frontend shows the minimum of each stat across those runs. Since self-profile does impose some overhead, that's usually not the self-profile run. The page I posted switched min to max (manually editing the SQL query).

Huh, that basically means we are not measuring anything useful with these perf runs, right? :) That is, I'm trying to find out the overhead of the new serialization strategy (and especially the overhead of the two new approaches relative to each other), but the "min" numbers actually (probably) don't contain self-profiled runs. I guess the solution is to ask you to manually run the modified query again for the current run :)

I guess it might create a recursive "self feeding" problem, but in theory if we record events for the serialization as well, then crox should be able to show us when those are being serialized and to what extent we're waiting on them. I agree that it seems likely we're waiting at the end.

There's definitely some things that we can record. For things that are application specific (e.g. the handling of query keys before they make it into measureme) this is easy. For things that happen inside of measureme, like flushing to disk once the in-memory buffers are full, we could have some kind of built-in recording in measureme itself. That might not even be that hard.

However, the final flushing of the file would need some kind of special handling. Right now we do that in SerializationSink::drop(). But that should actually be doable too. You always lose that last bit of information for how long it takes to write that very last measurement to disk but we could make sure that that is just a single event and thus does not account for much data.

Mark-Simulacrum · 2020-09-09T14:25:48Z

Running the query is not hard so I don't mind doing that. I'm also thinking about exposing a toggle for "just self-profile" results (we do technically have enough information) on the UI, though I'll probably not get around to it for some time.

rust-timer · 2020-09-09T14:40:38Z

Finished benchmarking try commit (7479e5973d66928be9e1bbce6cd377ff0957c094): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

Mark-Simulacrum · 2020-09-09T18:36:30Z

This is looking essentially perfect, lots of small variations but with a code change that affects many places in the compiler that's somewhat expected I think.

As before the max instead of min version, looking very clean:

michaelwoerister · 2020-09-09T19:40:02Z

Yeah, this looks actually better than the more complicated version. That's good to know. These things so often are so counterintuitive :)

Thanks a lot for getting those numbers. I think it's really great that perf.rlo is using an SQL database now, btw.

michaelwoerister · 2020-09-10T08:05:12Z

I'm going to close this PR, we got the numbers we need for deciding on how to forward.

bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 7, 2020

michaelwoerister force-pushed the perftest-paged-measureme-background-thread branch 3 times, most recently from 8fd2459 to ca1bb18 Compare September 7, 2020 12:35

Use experimental paging-based data sink for measureme.

e5d80ea

michaelwoerister force-pushed the perftest-paged-measureme-background-thread branch from ca1bb18 to e5d80ea Compare September 8, 2020 13:12

Try out blocking paged data sink.

2ad078b

michaelwoerister closed this Sep 10, 2020

[Experimental] Performance test new paging-based measureme data sink. #76436

[Experimental] Performance test new paging-based measureme data sink. #76436

Conversation

michaelwoerister commented Sep 7, 2020 • edited Loading

michaelwoerister commented Sep 7, 2020

bors commented Sep 7, 2020

Xanewok commented Sep 7, 2020

bors commented Sep 7, 2020

bors commented Sep 7, 2020

rust-log-analyzer commented Sep 7, 2020

michaelwoerister commented Sep 7, 2020

Xanewok commented Sep 7, 2020

rust-timer commented Sep 7, 2020

bors commented Sep 7, 2020

Mark-Simulacrum commented Sep 7, 2020

Mark-Simulacrum commented Sep 7, 2020

michaelwoerister commented Sep 8, 2020

Mark-Simulacrum commented Sep 8, 2020

bors commented Sep 8, 2020

bors commented Sep 8, 2020

rust-timer commented Sep 8, 2020

Mark-Simulacrum commented Sep 8, 2020

michaelwoerister commented Sep 8, 2020

Mark-Simulacrum commented Sep 8, 2020

rust-timer commented Sep 8, 2020

michaelwoerister commented Sep 8, 2020

Mark-Simulacrum commented Sep 8, 2020

michaelwoerister commented Sep 9, 2020

michaelwoerister commented Sep 9, 2020

Mark-Simulacrum commented Sep 9, 2020

rust-timer commented Sep 9, 2020

bors commented Sep 9, 2020

bors commented Sep 9, 2020

rust-timer commented Sep 9, 2020

michaelwoerister commented Sep 9, 2020

Mark-Simulacrum commented Sep 9, 2020

rust-timer commented Sep 9, 2020

Mark-Simulacrum commented Sep 9, 2020

michaelwoerister commented Sep 9, 2020

michaelwoerister commented Sep 10, 2020

michaelwoerister commented Sep 7, 2020 •

edited

Loading