This MR delivers a large practical performance improvement in
common Crane situations.
For some unscientific numbers, I tested in the following fashion.
For each of the two versions of crane (master, this branch), and my simple
closed source software project (it's 1kloc, about 170 crate deps, mostly just hyper):
1. I ran `nix flake check` to ensure that any crate downloads were done
   and any supporting derivations were complete.
1. I then added a new crate (anyhow) and ran `nix flake check` again.
This times the whole flow after a dependency is added.
1. I then changed a constant and re-ran, simulating the usual flow.
Because this project is relatively small, I would expect these numbers
to represent a 'worst case' for the improvement: uncompressed, it
contains 500MB of dependencies, whereas another project I work on
has 3.6GB.
The results were as follows:
Before this PR:
- build with new dep: 2m15s
- build with new code: 1m19s
After this PR:
- build with new dep: 1m22s
- build with new code: 32s
In addition, the new approach is more robust to crate rebuilds.
How it works/why it's better:
1. Drop the diffing behaviour when symlinking. This is an explicit
   tradeoff: if we are symlinking on inheritance, we would expect any
   duplicate data to already be in the form of symlinks, for which
   diffing file content is unhelpful. Since diffing only helps the
   case where we are neither symlinking on inheritance nor archiving
   on install, it seems reasonable for that case to become potentially
   slower. I say potentially slower because with target dirs of 1GB,
   we are trading 2GB of reads for up to 1GB fewer writes, and Nix
   store optimisation will recover the space savings anyway. The main
   argument: the common case should be archival or symlinking, and
   removing this behaviour boosts the performance of the common case.
1. Instead, we build a `symlinks.tar` containing symlinks to the outputs
of this derivation.
1. When inheriting, instead of traversing the tree and creating
   symlinks one at a time, we just extract this tar. This is great
   because it means that at both the end of the producing derivation
   and the start of the consuming one, we avoid forking
   O(number of files produced by cargo build) processes; even small
   projects emit thousands of files (my own has 2033 output files).
   Effectively, GNU tar is much better optimised than the
   pre-existing bash script.
1. At this point, we still have the problem that rustc may try to write
   to a file. We use a `RUSTC_WRAPPER` to write to a temporary
   directory instead, and after the command finishes we copy the
   artifacts from the out dir back to the target location. There is a
   potential (small) slowdown here: I observe that cargo uses rustc's
   stderr to kick off new builds as soon as it can, and so I had to
   capture rustc's stdout. However, this effect is most likely very
   minor.
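As a rough illustration of steps 2 and 3, here is a minimal,
self-contained shell sketch of the symlinks.tar idea. All paths and
file names here are invented for the example and do not mirror crane's
actual scripts:

```shell
set -eu
work=$(mktemp -d)
out="$work/fake-store-output"   # stand-in for the derivation's $out path
mkdir -p "$out/target/debug/deps"
echo bin > "$out/target/debug/deps/libfoo.rlib"

# Producer side: mirror the target dir as a tree of symlinks pointing
# back into $out, then archive the symlinks themselves (no -h flag, so
# tar stores the links rather than dereferencing them).
mkdir -p "$work/symlink-tree"
(
  cd "$out/target"
  find . -type d -exec mkdir -p "$work/symlink-tree/{}" \;
  find . -type f -exec ln -s "$out/target/{}" "$work/symlink-tree/{}" \;
)
tar -C "$work/symlink-tree" -cf "$work/symlinks.tar" .

# Consumer side: restoring the whole tree is a single tar invocation,
# instead of one `ln -s` fork per produced file.
mkdir -p "$work/consumer-target"
tar -C "$work/consumer-target" -xf "$work/symlinks.tar"
cat "$work/consumer-target/debug/deps/libfoo.rlib"
```

The key point is the consumer side: however many files cargo emitted,
inheritance costs one `tar -x` rather than thousands of forks.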
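And a toy sketch of the wrapper idea from step 4, with a fake rustc
standing in for the real compiler. The function names and the
`--out-dir` flag handling are illustrative only; crane's real wrapper
handles far more details:

```shell
set -eu
work=$(mktemp -d)

# Stand-in for rustc: writes one artifact into whatever --out-dir it
# is given.
fake_rustc() {
  local outdir=
  while [ $# -gt 0 ]; do
    case "$1" in
      --out-dir) outdir=$2; shift 2 ;;
      *) shift ;;
    esac
  done
  echo artifact > "$outdir/libdemo.rlib"
}

# The wrapper: rewrite --out-dir to point at a writable scratch
# directory, invoke the compiler, then copy the artifacts back to the
# directory cargo originally asked for.
wrapped_rustc() {
  local args=() orig_outdir= scratch
  scratch=$(mktemp -d)
  while [ $# -gt 0 ]; do
    case "$1" in
      --out-dir) orig_outdir=$2; args+=(--out-dir "$scratch"); shift 2 ;;
      *) args+=("$1"); shift ;;
    esac
  done
  fake_rustc "${args[@]}"
  mkdir -p "$orig_outdir"
  cp -r "$scratch"/. "$orig_outdir"/
}

wrapped_rustc --edition 2021 --out-dir "$work/target/debug/deps" src/lib.rs
cat "$work/target/debug/deps/libdemo.rlib"
```

In the real setup, cargo would invoke the wrapper via the
`RUSTC_WRAPPER` environment variable, which passes the real rustc path
as the first argument; that plumbing is elided here.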