
Optimize FileSerializationSink by using parking_lot::Mutex and avoiding heap allocations in write_atomic. #88

Merged Nov 22, 2019 (3 commits)

Conversation

@michaelwoerister (Member)

This PR makes the `FileSerializationSink` exactly as fast as the `MmapSerializationSink` in the benchmarks we have. However, I only tested on Linux, and I also remember that the benchmarks were not a good indication of performance when used in rustc.

It would be nice if we could get rid of the `MmapSerializationSink` because it keeps everything in memory until the end.

I wonder how much work it would be to have benchmarks that actually run rustc. It's a hassle to test this manually.

@michaelwoerister (Member, Author)

I think our benchmarks generate too little data to be realistic (less than a megabyte, I think). I'll look into that later. Also, a multi-threaded benchmark would be good.

@wesleywiser (Member)

> But I only tested on Linux, and I also remember that the benchmarks were not a good indication of performance when used in rustc.

FYI, as I recall, the issue was that mmap was quite a bit slower on Windows.

@michaelwoerister (Member, Author)

Here are some benchmark numbers with and without the patch (numbers in milliseconds):

| Windows | before | after | improvement |
| --- | --- | --- | --- |
| file, 1 thread | 897 | 773 | ~15% |
| file, 8 threads | 921 | 663 | ~30% |
| mmap, 1 thread | 909 | 939 | - |
| mmap, 8 threads | 839 | 830 | - |

| Linux | before | after | improvement |
| --- | --- | --- | --- |
| file, 1 thread | 605 | 510 | 15-20% |
| file, 8 threads | 765 | 752 | - |
| mmap, 1 thread | 512 | 519 | - |
| mmap, 8 threads | 378 | 349 | - |

The patch should be very good for Windows, especially in the multithreaded case. On Linux, mmap is just as fast (1 thread) or much faster (8 threads). I don't have any numbers for macOS.

@wesleywiser (Member)

I can gather some macOS numbers if you're interested.

@wesleywiser self-assigned this on Nov 15, 2019
@michaelwoerister (Member, Author)

If you have time, I'd be interested, yeah.

I got the numbers by running `cargo +nightly bench`. The "after" numbers are from this PR as it is, and the "before" numbers are from this PR with commit 8367ec6 removed. The last commit has to stay in because it modifies the benchmarks.

@michaelwoerister (Member, Author)

Update: I just pushed the "baseline" version to https://github.com/michaelwoerister/measureme/tree/opt-file-sink-ref.

@wesleywiser (Member)

Here's what I'm seeing on my 4-core, 8-thread MBP:

| macOS | before | after | improvement |
| --- | --- | --- | --- |
| file, 1 thread | 759 | 556 | ~26% |
| file, 8 threads | 1,666 | 559 | ~66% |
| mmap, 1 thread | 670 | 622 | ~7% |
| mmap, 8 threads | 394 | 386 | ~2% |

@michaelwoerister (Member, Author)

Thanks, @wesleywiser! Looks like we might actually want to switch to `FileSerializationSink` on macOS too for the non-parallel case.

@wesleywiser (Member) reviewed:

Is the second commit still WIP? If not, this looks good to me overall.

Review threads on measureme/Cargo.toml and measureme/src/file_serialization_sink.rs were marked resolved.
@michaelwoerister (Member, Author)

The second commit is still work-in-progress, yes. I want to update the code in `testing_common` to handle multiple threads, which shouldn't be too hard. Thanks for the review!

@michaelwoerister (Member, Author)

I did some new Windows measurements of `parking_lot` vs `std::sync`. It looks like `parking_lot` is indeed a bit faster. Let's keep it.

| Windows | before | std::sync | parking_lot |
| --- | --- | --- | --- |
| file, 1 thread | 744 | 614 | 589 |
| file, 8 threads | 862 | 637 | 582 |
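For context, the API difference being measured here is small. Here is a minimal sketch (hypothetical `Sink` type, not the actual measureme code) of what the swap looks like: `parking_lot::Mutex::lock()` returns the guard directly and never poisons, so there is no `Result` to unwrap on the hot path:

```rust
use parking_lot::Mutex; // no poisoning: lock() returns the guard directly

struct Sink {
    data: Mutex<Vec<u8>>,
}

impl Sink {
    fn append(&self, bytes: &[u8]) {
        // With std::sync::Mutex this line would be: self.data.lock().unwrap()
        let mut buffer = self.data.lock();
        buffer.extend_from_slice(bytes);
    }
}
```

Beyond the API, `parking_lot`'s uncontended lock fast path is a single atomic compare-and-swap, which helps on short, hot critical sections like a per-event write.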

@michaelwoerister (Member, Author)

This is ready for review now.

A contributor left an inline review comment on this diff hunk:

```diff
@@ -16,7 +23,12 @@ impl SerializationSink for FileSerializationSink {
        let file = fs::File::create(path)?;

        Ok(FileSerializationSink {
            data: Mutex::new((BufWriter::new(file), 0)),
```

What is the gain from removing the `BufWriter`? The new code feels similar to what `BufWriter` does, so would `BufWriter::with_capacity(1024 * 512, file)` give the same result?

@michaelwoerister (Member, Author):

The difference is that `BufWriter` does not allow writing directly into its buffer, so we are essentially re-implementing `BufWriter` here. The interface of `write_atomic` requires a writable output buffer.
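To make that concrete, here is a minimal sketch of a `write_atomic`-style interface (simplified types and buffer handling, not measureme's actual implementation) that hands the caller a mutable slice of the sink's own buffer to serialize into:

```rust
use std::fs::File;
use std::io::Write;
use std::sync::Mutex; // the PR swaps this for parking_lot::Mutex

const BUFFER_CAPACITY: usize = 1024 * 512;

struct FileSerializationSink {
    // File and buffer are guarded together so flushes stay atomic.
    data: Mutex<(File, Vec<u8>)>,
}

impl FileSerializationSink {
    /// The caller serializes directly into a slice of our buffer,
    /// avoiding a temporary heap allocation per record.
    fn write_atomic<W: FnOnce(&mut [u8])>(&self, num_bytes: usize, write: W) {
        let mut guard = self.data.lock().unwrap();
        let (file, buffer) = &mut *guard;

        // Slow path: flush the buffer to disk when the record won't fit.
        if buffer.len() + num_bytes > BUFFER_CAPACITY {
            file.write_all(buffer).expect("flush failed");
            buffer.clear();
        }

        // Fast path: reserve num_bytes in place and let the caller fill it.
        let start = buffer.len();
        buffer.resize(start + num_bytes, 0);
        write(&mut buffer[start..]);
    }
}
```

The key contrast with `BufWriter::with_capacity` is the fast path: `BufWriter` only accepts already-serialized `&[u8]` data via `Write::write`, so callers would first need a temporary buffer of their own, which is exactly the heap allocation this PR removes.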

@andjo403 (Contributor) commented on Nov 20, 2019

Has anyone looked at how much the results differ between the "fast path", where only the buffer is updated, and the "slow path", where the buffer is written to the file? Since we block all threads from writing during the file write, this can affect many events.
Also, by increasing the buffer size from 8 KB to 512 KB, the variance of the measurements may get bigger, since each write takes longer.
I was thinking that maybe some of the variance seen in #67 is due to the file writes.

@michaelwoerister (Member, Author)

@andjo403 I have not investigated variance. The bigger buffer should reduce the number of writes while making each write larger, so the writes are less evenly distributed, but the fixed per-write overhead might amortize better.

It would be nice to do the actual file writing in a background thread via some kind of double buffering scheme. I haven't tried to implement something like that though.
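As a rough illustration of that idea, here is a hedged sketch (hypothetical `DoubleBufferedSink`, not part of this PR) where full buffers are shipped over a channel to a background writer thread, so event-recording threads never block on disk I/O:

```rust
use std::fs::File;
use std::io::Write;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

const CAPACITY: usize = 1024 * 512;

struct DoubleBufferedSink {
    tx: Sender<Vec<u8>>,
    current: Vec<u8>,
    writer: JoinHandle<()>,
}

impl DoubleBufferedSink {
    fn new(mut file: File) -> Self {
        let (tx, rx) = channel::<Vec<u8>>();
        // The writer thread owns the file; all disk I/O happens here.
        let writer = thread::spawn(move || {
            for buf in rx {
                file.write_all(&buf).expect("background write failed");
            }
        });
        DoubleBufferedSink { tx, current: Vec::with_capacity(CAPACITY), writer }
    }

    fn write(&mut self, bytes: &[u8]) {
        if self.current.len() + bytes.len() > CAPACITY {
            // Swap in a fresh buffer and hand the full one to the writer.
            let full = std::mem::replace(&mut self.current, Vec::with_capacity(CAPACITY));
            self.tx.send(full).expect("writer thread gone");
        }
        self.current.extend_from_slice(bytes);
    }

    fn finish(self) {
        // Flush the last partial buffer, then wait for the writer to drain.
        self.tx.send(self.current).expect("writer thread gone");
        drop(self.tx);
        self.writer.join().unwrap();
    }
}
```

This is a simplified channel-based variant; true double buffering would recycle two fixed buffers ping-pong style instead of allocating a fresh one per flush.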

@wesleywiser merged commit 665b384 into rust-lang:master on Nov 22, 2019