Intensional store model #296
Comments
Another interesting possibility is to use OSTree as the underlying store (which already hashes and deduplicates) and then turn /nix/store/* into hardlinks. So then we'd have two levels of store, the intensional and the extensional, which would mostly coexist. |
Does OSTree hardcode paths in libraries with rpath? As far as I understood when I looked at OSTree, it was less granular than nix, as it switches a whole file system. I can't imagine OSTree being used with nix. |
OSTree is just a store of read-only files with extended attributes identified by the hash of their contents, together with code for hardlinking those files into a directory tree. RPM-OSTree is the software that manages and switches the filesystem; I'm not proposing we use that, since our activation scripts are about as featureful and are easier to work with. |
You didn't answer how should it be applied to nix. Shall the whole /nix/store tree change whenever a new derivation is stored or what? |
See the API. Maybe you can see the similarity to nix-store's internal API. My plan was to store each derivation as a commit and check it out to /nix/store/whatever. The runtime dependencies can be parent commits or we can just ignore that part and do the GC ourselves (the SQL database is not going away). |
Any thoughts about the "intensional store"? |
@tomberek I'm not certain who you were speaking to, but my thoughts are that it should be implemented ASAP. |
I thought OSTree doesn't allow accessing multiple versions at once, just as git doesn't. Anyway, we already do have the bare store-part that we need, with always-on top-level-path deduplication and optional file-level deduplication. That's IMHO the easy part. I thought much about the intensional store many months ago, and we certainly do want it at some point. After delving deeper I was very surprised that the consequences are not at all as straightforward as they first appeared. IIRC derivation handling is the main stumbling block and can't be as straightforward as it is now. I have no idea if/how all is dealt with in that prototype code. Also, using the intensional store will put much larger pressure on real binary determinism of the outputs. Our nixpkgs is most likely far from ready ATM. Currently if there's some slight semantics-preserving impurity (like programs wanting to print build date), we don't even notice it, just as in any usual distro. With intensional store these packages would change their output path on every build, including paths of anything that depends on it (transitive runtime dependents). |
So, here is the intensional model as I understand it. Data types (all can be hashed and/or stored on disk and/or streamed if necessary)
Containers
Operations
From this, I can see 4 things:
I've started by rebasing secure onto master, but unfortunately most of the changes were just commenting things out and the rest referred to things that don't exist anymore, so it was mostly useful for learning my way around the code. OSTree can only store checkouts, so it is a feature rather than part of the design. |
Can you elaborate on that? |
The main one is keeping the checksums out of Nixpkgs; they're already stored in the token-storeball |
I believe this has some interesting interactions with recursive nix (#13). First of all, once nix exprs can be developed upstream, it will be even more useful to have an easy way to keep HEAD packages up to date. This implies three phases:
The building of intentionally non-deterministic pkgs seems a lot safer with an intensional store. Whereas most builds would automatically extend the user's trusted build mapping (the one inducing an equivalence set over output paths), intentionally non-deterministic builds such as the repo pre-fetch could create a new mapping which the user could optionally subscribe to. This makes me wonder if even the actions relating to nix-channels could be conceived of as installing non-deterministic packages. |
On another note, I don't know about OSTree, but http://ipfs.io/ once it is ready would make a fantastic intensional store for Nix; we could really be its killer app. I mentioned it on IRC, but thought I should here too. [Disclaimer: I am not associated with IPFS in any way, but neither have I tried it. I just read its paper once and immediately thought it perfect for Nix.] |
Just to clarify the intensional model is explained on page 143 of the thesis: http://nixos.org/~eelco/pubs/phd-thesis.pdf I was wondering why it was called "intensional"? |
I have read that before. Could you elaborate as to how this applies to Nix?
|
The idea is that the store path name reflects the entirety of the properties of the path by containing a hash of its contents. |
When I started using nix I was confused about why we need to calculate the checksum of git repos, since the git sha is relatively unique. Now I know the distinction, but if we could store git checkouts by their sha it would be really nice and remove a lot of boilerplate. |
A cheap and easy thing to do could be to store each store path at a content addressable hard link, and then make a symlink from the input hash to the hard location. Multiple hard links can be made for different hashing schemes, and multiple input symlinks can point to a single output. I don't know how complicated it would be to perform the switch after build jobs complete and how costly it would be to dereference a symlink for each package reference by inputs. |
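A minimal Python sketch of that layout, under loud assumptions: the `cas/` subdirectory, the function names, and the use of SHA-256 over a single file are all hypothetical and are not how Nix lays out its store. It uses a plain rename into the content-addressed location instead of a hard link, and it only shows that several input hashes can end up pointing at one stored output.

```python
import hashlib
import os
import tempfile


def content_hash(path: str) -> str:
    """Hash one file's bytes (a real store would hash a whole directory tree)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:32]


def register_output(store: str, input_hash: str, built_file: str) -> str:
    """Move a build result to its content address and symlink the input hash to it."""
    cas_dir = os.path.join(store, "cas")  # hypothetical location for CAS entries
    os.makedirs(cas_dir, exist_ok=True)
    cas_path = os.path.join(cas_dir, content_hash(built_file))
    if os.path.exists(cas_path):
        os.remove(built_file)  # identical output is already stored once
    else:
        os.replace(built_file, cas_path)
    link = os.path.join(store, input_hash)
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(cas_path, link)  # input hash -> content-addressed location
    return cas_path
```

Registering two builds with identical bytes under two different input hashes yields two symlinks that resolve to the same `cas/` entry. The cost questions raised above (switching after the build, dereferencing a symlink per reference) are not addressed by this sketch.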
the IPFS community would love to help with this! let us know how we can. |
@jbenet Glad to hear it! The PhD thesis is still probably the best resource on the idea itself. #378, while superficially not about this at all, I think actually serves as a good resource on the quirks of the current system, and the use case where it is most wanting. I'm not any sort of official Nix developer, but happy to answer any questions you may have. |
@ehmry I just had the same idea as you :) https://groups.google.com/forum/#!topic/nix-devel/m8Rrv3VpdBo The difference is that I propose that the build step gets the CAS entries as inputs, not the input hashes. The input hashes would only be used in case the build product needs to refer to itself. Obviously, this means that build outputs that need to access themselves will have a different $cas for different input hashes, even if the build output is otherwise the same. Perhaps builds should be done in |
@wmertens Yes, I had considered supporting multiple hashing schemes, but I no longer think that is worth the effort, so replacing the input hashes seems practical. I had a system like this running. I don't remember any specific problems with CAS entries, but the whole thing eventually collapsed from making too many changes to Nix. |
Some progress on this: edolstra@236e87c |
@edolstra wonderful! I just read the Intensional Store section of your thesis, I now wish I did that long ago ;) I see that there is still quite a bit of work to do to get to the Intensional Store you laid out there. One thing that stands out is storing the equivalent hashes ( I'm particularly curious about how this will play out with Hydra, how the refClasses will be provided over the network. It would also be interesting to have a crowd-sourced refClasses database, where many builds by somewhat-trusted users show that a certain input hash leads to some CAS hash. |
I just realized that this initial progress is already enough to rewrite the entire store into CAS equivalents with a script: move+link all the outputs to CAS paths and then rewrite all the hash references to their CAS hash. The Enough to already play with it :) I'll see if I can cook something up this weekend, but I will be happy if someone beats me to it ;) EDIT: the CAS linking + rewriting should happen depth-first and rewrite first, otherwise the CAS hash changes. So if a depends on b depends on c, first calculate c', then rewrite b to use c' into b', then calculate b'', then rewrite a to a' with b'', then calculate a''. |
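The depth-first ordering from the EDIT can be sketched in Python as follows. This is an illustration only: the store is modelled as an in-memory dict from input hash to file contents, `deps` is a hypothetical dependency map, the graph is assumed to be acyclic, and self-references are not handled.

```python
import hashlib
from typing import Dict, List, Tuple


def cas_hash(data: bytes) -> str:
    """Content hash, truncated to 32 hex characters for this sketch."""
    return hashlib.sha256(data).hexdigest()[:32]


def rewrite_store(
    contents: Dict[str, bytes], deps: Dict[str, List[str]]
) -> Dict[str, Tuple[str, bytes]]:
    """Map each input hash to (CAS hash, rewritten contents), dependencies first."""
    done: Dict[str, Tuple[str, bytes]] = {}

    def visit(input_hash: str) -> None:
        if input_hash in done:
            return
        data = contents[input_hash]
        for dep in deps.get(input_hash, []):
            visit(dep)  # the dependency's CAS hash must be known first
            data = data.replace(dep.encode(), done[dep][0].encode())
        # Hash only after every reference has been rewritten.
        done[input_hash] = (cas_hash(data), data)

    for input_hash in contents:
        visit(input_hash)
    return done
```

For a chain where a depends on b and b depends on c, the recursion computes c' first, rewrites b to refer to c' and hashes the result, and only then rewrites and hashes a, whatever order the entries appear in.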
I read the intensional store thesis chapter and I think it will be a big improvement, but at the same time there seems to be a small conflict with strict determinism. The hash rewriting policy (sec 6.3.2), allows the derivation to build with its own temporary output location in its environment, and a random hash is suggested for such a purpose. But such a random hash puts entropy into that build that it could use to create a different output. If I'm understanding correctly, the minimum example of recompilation avoidance is something like this:
Then That is, the "from scratch guarantee" (like in any incremental computing) is that the full rebuild would have created the same expected bits if it were run. But that's why visibility of Simple solution: Why not just set the output path to a constant? As long as the As a variation on that, it could be the determinism-enforcement sandbox itself that simulates a directory-rename of the P.S. The current |
I marked this as stale due to inactivity. → More info |
Still important to me. |
I marked this as stale due to inactivity. → More info |
Still important, but perhaps this specific issue can be closed. Seems to be well underway with the CA effort. Well done @regnat ! |
I marked this as stale due to inactivity. → More info |
Yes, we do have this now! |
Currently, if any change is made to a package's build script, e.g. adding an extra newline in installPhase, then the package and all of its dependents will be rebuilt because the derivation hash changed. With an intensional store model, only the package itself will be rebuilt and, provided its output comes out bit-for-bit identical, its dependents will remain unchanged, reducing build times.
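A toy Python model of this difference, with everything in it assumed for illustration: `build` is a hypothetical builder whose output ignores whitespace in the script, and the two `app_key_*` functions stand in for how a downstream package's identity is derived under each model. It is not Nix's actual hashing scheme.

```python
import hashlib


def h(*parts: str) -> str:
    """Short hash over a sequence of strings."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()[:12]


def build(script: str) -> str:
    """Hypothetical builder: whitespace in the script does not affect the output."""
    return "output of: " + " ".join(script.split())


def app_key_extensional(lib_script: str, app_script: str) -> str:
    lib_path = h(lib_script)  # path derived from the build inputs
    return h(app_script, lib_path)


def app_key_intensional(lib_script: str, app_script: str) -> str:
    lib_path = h(build(lib_script))  # path derived from the built contents
    return h(app_script, lib_path)


lib_v1 = "make install"
lib_v2 = "make install\n"  # the extra newline from the example above
app = "make app"

# The input-addressed key for the downstream package changes...
print(app_key_extensional(lib_v1, app) != app_key_extensional(lib_v2, app))
# ...while the content-addressed key stays the same.
print(app_key_intensional(lib_v1, app) == app_key_intensional(lib_v2, app))
```

In this model the library itself is still rebuilt after the newline is added, since that is the only way to learn its contents; the saving is that the rebuild stops there as long as the output bytes are unchanged.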
http://nixos.org/~eelco/pubs/phd-thesis.pdf refers to a "prototype implementation" of the intensional store, which appears to be in https://github.com/NixOS/nix/tree/secure; maybe that could be resurrected and merged?