Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Experimental] Performance test new paging-based measureme data sink. #76436

Conversation

michaelwoerister
Copy link
Member

@michaelwoerister michaelwoerister commented Sep 7, 2020

This PR will try to make sure that the new measureme file format (as described in rust-lang/measureme#128) does not regress performance while doing self-profiling.

The perf.rlo run is actually expected to fail (because the tools on the perf server cannot handle the new format yet) but hopefully we'll still get the top-level timings that should show potential regressions.

The implementation of the new format as referenced by this PR is not production ready yet, but it should give a good indication of final performance. Once we know if this method of performance testing actually yields results, we can also test an alternative implementation that is slightly simpler.

cc @wesleywiser @Mark-Simulacrum

@michaelwoerister
Copy link
Member Author

@bors try

@bors
Copy link
Contributor

bors commented Sep 7, 2020

@michaelwoerister: 🔑 Insufficient privileges: not in try users

@Xanewok
Copy link
Member

Xanewok commented Sep 7, 2020

@bors try

@bors
Copy link
Contributor

bors commented Sep 7, 2020

⌛ Trying commit d9054f8d414a52d189b26a0fa92f614740d596b2 with merge 12da0f82045ff20d2ae70c83c926bae9c22df223...

@bors
Copy link
Contributor

bors commented Sep 7, 2020

💔 Test failed - checks-actions

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 7, 2020
@rust-log-analyzer
Copy link
Collaborator

The job dist-x86_64-linux of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
   Compiling parking_lot v0.11.0
   Compiling rand_core v0.5.1
   Compiling generic-array v0.12.3
   Compiling parking_lot v0.10.2
   Compiling measureme v0.7.1 (https://github.com/michaelwoerister/measureme.git?branch=paged_sinks#da4d0a6e)
   Compiling rand_chacha v0.2.2
   Compiling matchers v0.0.1
   Compiling rustc_serialize v0.0.0 (/checkout/compiler/rustc_serialize)
   Compiling petgraph v0.5.1
---
   Compiling parking_lot v0.10.2
   Compiling crossbeam-deque v0.7.3
   Compiling block-buffer v0.7.3
   Compiling digest v0.8.1
   Compiling measureme v0.7.1 (https://github.com/michaelwoerister/measureme.git?branch=paged_sinks#da4d0a6e)
   Compiling matchers v0.0.1
   Compiling rustc_serialize v0.0.0 (/checkout/compiler/rustc_serialize)
   Compiling petgraph v0.5.1
   Compiling md-5 v0.8.0
---
    Checking parking_lot v0.10.2
    Checking generic-array v0.12.3
    Checking rand_chacha v0.2.2
    Checking crossbeam-deque v0.7.3
    Checking measureme v0.7.1 (https://github.com/michaelwoerister/measureme.git?branch=paged_sinks#da4d0a6e)
    Checking regex v1.3.9
    Checking rand v0.7.3
    Checking rustc_serialize v0.0.0 (/checkout/compiler/rustc_serialize)
    Checking petgraph v0.5.1

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@michaelwoerister michaelwoerister force-pushed the perftest-paged-measureme-background-thread branch 3 times, most recently from 8fd2459 to ca1bb18 Compare September 7, 2020 12:35
@michaelwoerister
Copy link
Member Author

OK, this seems to pass all checks now. @Xanewok, would you be so kind and start another perf run for me?

@Xanewok
Copy link
Member

Xanewok commented Sep 7, 2020

Sure! Let's see if this one-liner works...

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 7, 2020

⌛ Trying commit ca1bb18264167719f514e66b94870f465036810f with merge cc857b33d54a8ee4ee103623dc1de441b8cf6009...

@Mark-Simulacrum
Copy link
Member

Try build finished but intentionally not queueing this on perf, as it won't work. I'm working on patch that we can land to make this work side-by-side with regular builds.

@Mark-Simulacrum
Copy link
Member

Looks like this currently fails in rustc, when running on helloworld

thread 'rustc' panicked at 'assertion failed: num_bytes <= self.shared_state.page_size - PAGE_HEADER_SIZE', /cargo/git/checkouts/measureme-11f27a204054df90/c004094/measureme/src/paged_serialization_sink.rs:133:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.48.0-nightly (cc857b33d 2020-09-07) running on x86_64-unknown-linux-gnu

note: compiler flags: -Z self-profile=/tmp/.tmpNwWiWk/self-profile-output -C embed-bitcode=no -C debuginfo=2 --crate-type bin

note: some of the compiler flags provided by cargo are hidden

thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `LLVMing`,
 right: `Codegenning`', /rustc/cc857b33d54a8ee4ee103623dc1de441b8cf6009/compiler/rustc_codegen_ssa/src/back/write.rs:1419:21

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.48.0-nightly (cc857b33d 2020-09-07) running on x86_64-unknown-linux-gnu

note: compiler flags: -Z self-profile=/tmp/.tmpNwWiWk/self-profile-output -C embed-bitcode=no -C debuginfo=2 --crate-type bin

note: some of the compiler flags provided by cargo are hidden


@michaelwoerister
Copy link
Member Author

Looks like this currently fails in rustc, when running on helloworld

Indeed it does. SelfProfiler::bulk_map_query_invocation_id_to_single_string() seems to write quite large chunks of data at once, more than the page size in some cases. I'll make the code a bit more robust against this kind of thing.

@michaelwoerister michaelwoerister force-pushed the perftest-paged-measureme-background-thread branch from ca1bb18 to e5d80ea Compare September 8, 2020 13:12
@Mark-Simulacrum
Copy link
Member

@bors try

@bors
Copy link
Contributor

bors commented Sep 8, 2020

⌛ Trying commit e5d80ea with merge da6b924ee19d83e73682a095f60d8bc79de4cdc3...

@bors
Copy link
Contributor

bors commented Sep 8, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: da6b924ee19d83e73682a095f60d8bc79de4cdc3 (da6b924ee19d83e73682a095f60d8bc79de4cdc3)

@rust-timer
Copy link
Collaborator

Queued da6b924ee19d83e73682a095f60d8bc79de4cdc3 with parent 5a6b426, future comparison URL.

@Mark-Simulacrum
Copy link
Member

I've put a new summarize on the perf collector which seemed to work locally (the collector supports 2 different simultaneous summarizes). I am uncertain if the upload logic will break, but we'll tackle that if necessary.

@michaelwoerister
Copy link
Member Author

Thanks @Mark-Simulacrum, I was able to do a full rustc-perf run with this version locally. Let's see if something else breaks.

One way or the other, I'll need to rewrite the whole thing anyway before it can be merged. The current version has quite a bit of room for improvement in terms of code quality.

@Mark-Simulacrum
Copy link
Member

Okay had to patch perf.rlo a bit but it should collect now. The site is likely to show just noise -- we've likely not changed overhead for collection being off, and perf.rlo only enables self-profile for one out of 2-3 runs of each benchmark. I will be able to manually pull up data from the database for the maximum rather than minimum statistic which should give us some idea of how much of a regression/win this is.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (da6b924ee19d83e73682a095f60d8bc79de4cdc3): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@michaelwoerister
Copy link
Member Author

Alright, those numbers look encouraging. I'll put together a version that doesn't use a background thread tomorrow.

@Mark-Simulacrum
Copy link
Member

Pulled up maximums rather than minimums, and though they look a bit worse, seems mostly fine. Attached a screenshot -- putting it up online is rather annoying.

Screenshot_2020-09-08 rustc performance data

@michaelwoerister
Copy link
Member Author

Interesting, yes. There seem to be a few minor regressions. Maybe there is a bit of a pattern here: Most regressions seem to be scenarios that are quick to execute and where there's little time spent in LLVM and the linker. Maybe the background thread becomes the bottleneck towards the end, when it is still writing stuff to disk and the rest of the program is only waiting on it to finish?

How did you generate those numbers? Is there a special mode in the collector?

@michaelwoerister
Copy link
Member Author

@Mark-Simulacrum I just pushed a version that writes data to disk in a blocking way. That data format is not affected by this so the tooling on the server should still be compatible.

@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

How did you generate those numbers? Is there a special mode in the collector?

The collector runs rustc 2-3 times: once for self-profile, and then 1-2 times without it. By default, the perf.rust-lang.org frontend shows the minimum of each stat across those runs. Since self-profile does impose some overhead, that's usually not the self-profile run. The page I posted switched min to max (manually editing the SQL query).

I guess it might create a recursive "self feeding" problem, but in theory if we record events for the serialization as well, then crox should be able to show us when those are being serialized and to what extent we're waiting on them. I agree that it seems likely we're waiting at the end.

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 9, 2020

⌛ Trying commit 2ad078b with merge 7479e5973d66928be9e1bbce6cd377ff0957c094...

@bors
Copy link
Contributor

bors commented Sep 9, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 7479e5973d66928be9e1bbce6cd377ff0957c094 (7479e5973d66928be9e1bbce6cd377ff0957c094)

@rust-timer
Copy link
Collaborator

Queued 7479e5973d66928be9e1bbce6cd377ff0957c094 with parent 3f5e617, future comparison URL.

@michaelwoerister
Copy link
Member Author

The collector runs rustc 2-3 times: once for self-profile, and then 1-2 times without it. By default, the perf.rust-lang.org frontend shows the minimum of each stat across those runs. Since self-profile does impose some overhead, that's usually not the self-profile run. The page I posted switched min to max (manually editing the SQL query).

Huh, that basically means we are not measuring anything useful with these perf runs, right? :) That is, I'm trying to find out the overhead of the new serialization strategy (and especially the overhead of the two new approaches relative to each other), but the "min" numbers actually (probably) don't contain self-profiled runs. I guess the solution is to ask you to manually run the modified query again for the current run :)

I guess it might create a recursive "self feeding" problem, but in theory if we record events for the serialization as well, then crox should be able to show us when those are being serialized and to what extent we're waiting on them. I agree that it seems likely we're waiting at the end.

There's definitely some things that we can record. For things that are application specific (e.g. the handling of query keys before they make it into measureme) this is easy. For things that happen inside of measureme, like flushing to disk once the in-memory buffers are full, we could have some kind of built-in recording in measureme itself. That might not even be that hard.

However, the final flushing of the file would need some kind of special handling. Right now we do that in SerializationSink::drop(). But that should actually be doable too. You always lose that last bit of information for how long it takes to write that very last measurement to disk but we could make sure that that is just a single event and thus does not account for much data.

@Mark-Simulacrum
Copy link
Member

Running the query is not hard so I don't mind doing that. I'm also thinking about exposing a toggle for "just self-profile" results (we do technically have enough information) on the UI, though I'll probably not get around to it for some time.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (7479e5973d66928be9e1bbce6cd377ff0957c094): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@Mark-Simulacrum
Copy link
Member

This is looking essentially perfect, lots of small variations but with a code change that affects many places in the compiler that's somewhat expected I think.

As before the max instead of min version, looking very clean:

image

@michaelwoerister
Copy link
Member Author

Yeah, this looks actually better than the more complicated version. That's good to know. These things so often are so counterintuitive :)

Thanks a lot for getting those numbers. I think it's really great that perf.rlo is using an SQL database now, btw.

@michaelwoerister
Copy link
Member Author

I'm going to close this PR, we got the numbers we need for deciding on how to forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants