This MR delivers a large practical performance improvement in
common Crane situations.
For some unscientific numbers, I tested in the following fashion.
For each of the two versions of crane (master, this branch), and my simple
closed source software project (it's 1kloc, about 170 crate deps, mostly just hyper):
1. I ran `nix flake check` to ensure that any crate downloads were done
   and any supporting derivations were complete.
1. I then added a new crate (anyhow) and ran `nix flake check` again.
This times the whole flow after a dependency is added.
1. I then changed a constant and re-ran, simulating the usual flow.
Because this project is relatively small, I would expect these numbers
to represent a 'worst case' for the improvement: uncompressed, it
contains 500MB of dependencies, whereas another project I work on
has 3.6GB.
The results were as follows:
Before this PR:
- build with new dep: 2m15s
- build with new code: 1m19s
After this PR:
- build with new dep: 1m22s
- build with new code: 32s
In addition, the new approach is more robust to crate rebuilds.
How it works/why it's better:
1. Drop the diffing behaviour when symlinking. This is an explicit
   tradeoff: if we are symlinking on inheritance, we would expect any
   duplicate data to already be in the form of symlinks, for which
   diffing file content is unhelpful. Since diffing only helps the
   case where we are neither symlinking on inheritance nor archiving
   on install, it seems reasonable for that case to become potentially
   slower. I say potentially slower because with target dirs of 1GB,
   we are trading 2GB of reads for up to 1GB fewer writes, and Nix
   store optimisation will recover the space savings anyway. The main
   argument: the common case should be archival or symlinking, and
   removing this behaviour boosts the performance of the common case.
1. Instead, we build a `symlinks.tar` containing symlinks to the outputs
of this derivation.
1. When inheriting, instead of traversing the tree and creating
   symlinks one at a time, we just extract this tar. This is great
   because it means that at both the end of the producing derivation
   and the start of the consuming one, we avoid forking
   O(number of files produced by cargo build) processes; even small
   projects emit thousands of files (my own has 2033 output files).
   Effectively, GNU tar is much better optimised than the
   pre-existing bash script.
1. At this point, we still have the problem that rustc may try to write
   to a file. We use a `RUSTC_WRAPPER` to write to a temporary
   directory instead, and after the command finishes we copy the
   artifacts from the out dir back to the target location. There is a
   potential (small) slowdown here: I observe that cargo uses rustc's
   stderr to kick off new builds as soon as it can, and so I had to
   capture rustc's stdout. However, this effect is most likely very
   minor.
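As a rough illustration of steps 2 and 3, here is a minimal,
self-contained shell sketch of the symlinks.tar idea. All paths and
file names here are invented for the example and do not mirror crane's
actual scripts:

```shell
set -eu
work=$(mktemp -d)
out="$work/fake-store-output"   # stand-in for the derivation's $out path
mkdir -p "$out/target/debug/deps"
echo bin > "$out/target/debug/deps/libfoo.rlib"

# Producer side: mirror the target dir as a tree of symlinks pointing
# back into $out, then archive the symlinks themselves (no -h flag, so
# tar stores the links rather than dereferencing them).
mkdir -p "$work/symlink-tree"
(
  cd "$out/target"
  find . -type d -exec mkdir -p "$work/symlink-tree/{}" \;
  find . -type f -exec ln -s "$out/target/{}" "$work/symlink-tree/{}" \;
)
tar -C "$work/symlink-tree" -cf "$work/symlinks.tar" .

# Consumer side: restoring the whole tree is a single tar invocation,
# instead of one `ln -s` fork per produced file.
mkdir -p "$work/consumer-target"
tar -C "$work/consumer-target" -xf "$work/symlinks.tar"
cat "$work/consumer-target/debug/deps/libfoo.rlib"
```

The key point is the consumer side: however many files cargo emitted,
inheritance costs one `tar -x` rather than thousands of forks.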
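And a toy sketch of the wrapper idea from step 4, with a fake rustc
standing in for the real compiler. The function names and the
`--out-dir` flag handling are illustrative only; crane's real wrapper
handles far more details:

```shell
set -eu
work=$(mktemp -d)

# Stand-in for rustc: writes one artifact into whatever --out-dir it
# is given.
fake_rustc() {
  local outdir=
  while [ $# -gt 0 ]; do
    case "$1" in
      --out-dir) outdir=$2; shift 2 ;;
      *) shift ;;
    esac
  done
  echo artifact > "$outdir/libdemo.rlib"
}

# The wrapper: rewrite --out-dir to point at a writable scratch
# directory, invoke the compiler, then copy the artifacts back to the
# directory cargo originally asked for.
wrapped_rustc() {
  local args=() orig_outdir= scratch
  scratch=$(mktemp -d)
  while [ $# -gt 0 ]; do
    case "$1" in
      --out-dir) orig_outdir=$2; args+=(--out-dir "$scratch"); shift 2 ;;
      *) args+=("$1"); shift ;;
    esac
  done
  fake_rustc "${args[@]}"
  mkdir -p "$orig_outdir"
  cp -r "$scratch"/. "$orig_outdir"/
}

wrapped_rustc --edition 2021 --out-dir "$work/target/debug/deps" src/lib.rs
cat "$work/target/debug/deps/libdemo.rlib"
```

In the real setup, cargo would invoke the wrapper via the
`RUSTC_WRAPPER` environment variable, which passes the real rustc path
as the first argument; that plumbing is elided here.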