Minimal implementation of a drv outputs copy #4225

thufschmitt · 2020-11-06T12:36:19Z

This PR tries to provide the simplest possible way of copying drv outputs.
In particular it doesn't try to tackle the issue of copying the drv output closure at all, all it does is allow to do nix copy /nix/store/...-foo.drv!out which will copy the corresponding outpath and register the out path mapping for ...-foo.drv!out.

For coherence (and also because it's at least as simple to do things that way), nix copy nixpgs#foo has the same semantics, meaning that it will both copy the output path(s) and register the derivation output(s).

thufschmitt · 2020-11-09T10:44:06Z

I've updated the PR to make the outputPath of derivation outputs a foreign key pointing to a ValidPaths entry.
This should make things cleaner as it removes the possibility of having a dangling reference (which would cause issues if this drv output gets rebuild and yields a different output path − bc of non-determinism).

@edolstra the DerivationOutputs table currently doesn't have a user field, but we'll probably want it at some point in the future. Should we add it right now (at the cost of keeping some possibly never used cruft) or defer it (at the cost of another db schema chang)

Ericson2314 · 2020-11-10T13:38:49Z

@edolstra the DerivationOutputs table currently doesn't have a user field, but we'll probably want it at some point in the future. Should we add it right now (at the cost of keeping some possibly never used cruft) or defer it (at the cost of another db schema chang)

I would defer it, as the migration is not so complex that I'm worried about doing it later, but the reverse migration is lossy.

Ericson2314 · 2020-11-10T13:40:32Z

src/libstore/binary-cache-store.cc

+    StringSink sink;
+    dumpString(s, sink);
+    auto source = StringSource { *sink.s };


This looks like a bugfix we should PR sooner?

We could yes. Though I don't think this method was used anywhere before this PR so probably not an emergency

Ericson2314 · 2020-11-10T14:04:31Z

src/libstore/drv-output-info.hh

+    { return static_cast<RawDrvInput>(*this); }
+};
+
+std::list<std::string> stringify_refs(const std::set<DrvInput> & refs);


Thanks, removed in a2803d6

Ericson2314 · 2020-11-10T14:13:31Z

src/libstore/local-store.cc

+    // XXX: This ignores the references of the output because we can
+    // recompute them later from the drv and the references of the associated
+    // store path, but doing so is both inefficient and fragile.


This doesn't apply to this version, right?

Indeed it doesn't. Removed in 58ea93c

Ericson2314 · 2020-11-10T14:30:51Z

I think I would feel most comfortable if the first minimal version didn't store mappings for unresolved derivations to paths, because doing that while making sure we don't have accidental mapping collisions is very hard.

thufschmitt · 2020-11-10T16:36:45Z

I think I would feel most comfortable if the first minimal version didn't store mappings for unresolved derivations to paths, because doing that while making sure we don't have accidental mapping collisions is very hard.

I fail to see which collision you think of here. It's possible that the drv would resolve to something else (and possibly transitively produce a different output path) if you were to rebuild it, but it's not fundamentally different from having the resolved derivation producing a different output path.

Ericson2314 · 2020-11-11T19:03:34Z

I fail to see which collision you think of here.

The only where the collected build-time-only deps could be built again different.

but it's not fundamentally different from having the resolved derivation producing a different output path.

Sure, but if we just start with resolved derivation mappings, we are sure to be correct: rather than stuff being dubiously cached, it will just fail to be cached if some resolved derivation along the way is different.

Because that is is a "definitely correct" starting point, I think it's where I'd like to begin.

Another idea i was thinking was leaving the current DerivationOutputs as-is, and having a new table for CA derv mappings. Queries can union the two tables together, and this means we can continue having the foreign key and simple cleanup for input-addressed derivations, and we separate the "normative" CA mappings from the purely-for-caching IA ones. Thoughts?

Ericson2314 · 2020-11-26T17:07:51Z

So picking up where I left off with #4225 (comment)

I think the most minimal thing to do is:

Just to "atomic" resolved dependency mappings for now:
Allow them to be individually copied between stores with a separate command
Don't worry about making it easy to copy all the resolved derivation mappings needed to re-resolve and unresolved derivation

Yes, we don't want users to have to actually do this tedious stuff in practice, but I think there are good building blocks to have in the end too, and by starting with that, we avoid anything with multiple competing designs.

thufschmitt · 2020-11-27T12:49:58Z

@Ericson2314

I think the most minimal thing to do is:
1. Just to "atomic" resolved dependency mappings for now:

What do you mean exactly by “atomic”? The way I understand it is “without bothering with the dependencies”, but that's already what this PR does.

2. Allow them to be individually copied between stores with a separate command

That's an interesting point. I overloaded nix-copy for that because

It's not a lot more complicated than having a different command
It's actually quite easy for the tests
It's quite likely that this is the (low-level) interface we'll want eventually, so it makes sense to do it right now rather than adding a new command that we'll have to deprecate (or remove, after all it's just experimental) in a few months.

But there's arguments to be made in the other directions, so…

3. Don't worry about making it easy to copy all the resolved derivation mappings needed to re-resolve and unresolved derivation

I feel like it would make things noticeably more complex. Even if just for the tests, it's quite nice to be able to expect nix copy --to $REMOTE ./foo.nix && clearStore && nix copy --from $REMOTE ./foo.nix to work, that gives a nice and easy to check property (I'm secretly dreaming of adding property-based testing to Nix)

Another idea i was thinking was leaving the current DerivationOutputs as-is, and having a new table for CA derv mappings. Queries can union the two tables together, and this means we can continue having the foreign key and simple cleanup for input-addressed derivations, and we separate the "normative" CA mappings from the purely-for-caching IA ones. Thoughts?

As a matter of fact I've been discussing with Eelco yesterday and we reached the same conclusion (though we might not need to do any union as it's totally fine for the two tables to overlap). The nice side of this is that it makes it easier to iterate on the ca-specific schema without touching the rest

thufschmitt · 2020-11-27T13:43:40Z

I've had a chat with @edolstra yesterday, which led to a few suggestions for this:

We could (as @Ericson2314 suggested above) add a separate table in the db to track the output mappings of the CA derivations (to make it easier to evolve that schema independently of the rest). That way we can even have a second version scheme for the ca-specific db aspects, allowing people to easily go back-and-forth between ca and mainline Nix
queryDrvOutputInfo and friends should be
1. Async (like queryPathInfo)
2. Cached in memory (and possibly on-disk, though that's less urgent)

Ericson2314 · 2020-11-28T00:20:37Z

Glad we're all on the same page for multiple tables!

What do you mean exactly by “atomic”? The way I understand it is “without bothering with the dependencies”, but that's already what this PR does.

Right, but my big schick is that that it's sketchy to have mappings of unresolved derivations without dependencies.

What do you mean exactly by “atomic”? The way I understand it is “without bothering with the dependencies”, but that's already what this PR does.

Well the main thing is if we don't have dependencies and restrict ourselves to resolved derivations, we would have a hard time overloading it.

(I also think it is good to allow just copying outputs, derivations, and mappings independent, but this is less important; I'm not disagreeing that copping the data and mappings is a good default once we can support it.)

Even if just for the tests

It does make testing harder, because it's such a cumberson low-level interface. I think that's just the (IMO worth it) price of making a minimal thing with as uncontroversial design as we can.

Basically taking a step back, because I'm very keen on keeping the full provenance of resolved derivation mappings around, I think it's just going to be more work before we get to good usable interfaces. But I'm OK with that.

…sed ones Fix an overlook of NixOS#4056

This requires adding `nix` to its own closure which is a bit unfortunate, but as it is optional (the test will be disabled if `OUTER_NIX` is unset) it shouldn't be too much of an issue. (Ideally this should go in another derivation so that we can build Nix and run the test independently, but as the tests are running in the same derivation as the build it's a bit complicated to do so).

For each “known” derivation output, store: - its output id - its output path This comes with a set of needed changes: - New `drv-output-info` module declaring the types needed for describing these mappings - New `Store::registerDrvOutput` method registering all the needed informations about a derivation output (also replaces `LocalStore::linkDeriverToPath`) - new `Store::queryDrvOutputInfo` method to retrieve the informations for a derivations This introcudes some redundancy on the remote-store side between `wopQueryDerivationOutputMap` and `wopQueryDrvOutputInfo`. However we might need to keep both (regardless of backwards compat) because we sometimes need to get some infos for all the outputs of a derivation (where `wopQueryDerivationOutputMap` is handy), but all the stores can't implement it − because listing all the outputs of a derivation isn't really possible for binary caches where the server doesn't allow to list a directory.

Copying a derivation output is equivalent to copy the output path, except that it will also link the output to the path (with `linkDeriverToPath` on the remote store)

Commands that accept store paths or installables as input can now take a derivation path with outputs. Like when the input is a nix expr, the derivation will be realised if needed.

Generalises `StorePathsCommand` to allow manipulating `Buildables` rather than just store paths.

Mostly dummy atm as it just computes the closure of the output path, but might be extended later

Add a new table for tracking the derivation output mappings. We used to hijack the `DerivationOutputs` table for that, but (despite its name), it isn't a really good fit: - Its entries depend on the drv being a valid path, making it play badly with garbage collection and preventing us to copy a drv output without copying the whole drv closure too; - It dosen't guaranty that the output path exists; By using a different table, we can experiment with a different schema better suited for tracking the output mappings of CA derivations. (incidentally, this also fixes NixOS#4138)

That way we can `nix copy --from` a store that doesn't have the derivation but just its outputs

That way we don't need to resolve it again to know its output paths. In particular, this means that we don't need to keep its whole build-time closure

`buildPaths` can be called even for stores where it's not defined in case it's bound to be a no-op. The “no-op detection” mechanism was only detecting the case wher `buildPaths` was called on a set of (non-drv) paths that were already present on the store. This commit extends this mechanism to also detect the case where `buildPaths` is called on a set of derivation outputs which are already built on the store.

thufschmitt mentioned this pull request Nov 6, 2020

Copy drv outputs between stores #4220

Closed

thufschmitt force-pushed the minimal-drv-outputs-copy branch from 3220131 to 2c9f509 Compare November 6, 2020 14:36

thufschmitt force-pushed the minimal-drv-outputs-copy branch from 45799bb to 6aa848f Compare November 9, 2020 15:48

Ericson2314 reviewed Nov 10, 2020

View reviewed changes

thufschmitt mentioned this pull request Nov 12, 2020

Substitute drv outputs #4238

Closed

thufschmitt force-pushed the minimal-drv-outputs-copy branch from 3be2780 to c29bfd7 Compare November 12, 2020 12:12

thufschmitt added the ca-derivations Derivations with content addressed outputs label Nov 16, 2020

thufschmitt force-pushed the minimal-drv-outputs-copy branch 3 times, most recently from dc9e34a to 8dfbbe3 Compare November 18, 2020 08:15

thufschmitt requested a review from edolstra November 18, 2020 09:39

thufschmitt force-pushed the minimal-drv-outputs-copy branch from 8dfbbe3 to 910fe3e Compare November 18, 2020 16:15

thufschmitt force-pushed the minimal-drv-outputs-copy branch from 910fe3e to faf5d7a Compare November 25, 2020 17:25

thufschmitt force-pushed the minimal-drv-outputs-copy branch 3 times, most recently from 7a0ba80 to 1acb0cb Compare November 27, 2020 15:41

thufschmitt added 3 commits December 1, 2020 16:41

fix the hash rewriting for ca-derivations

0e5c22a

Allow fixed-output derivations to depend on (floating) content-addres…

403fb3b

…sed ones Fix an overlook of NixOS#4056

thufschmitt added 15 commits December 1, 2020 16:41

Add a test for the migration of the db between versions

47f097d

Make copyPaths work with derivation outputs

21bfece

Copying a derivation output is equivalent to copy the output path, except that it will also link the output to the path (with `linkDeriverToPath` on the remote store)

Allow /nix/store/....drv!out as a cli input

30eb04a

Commands that accept store paths or installables as input can now take a derivation path with outputs. Like when the input is a nix expr, the derivation will be realised if needed.

Add a BuildablesCommand type

d2e6cc0

Generalises `StorePathsCommand` to allow manipulating `Buildables` rather than just store paths.

Make Nix copy work on drvOutputIds

08fcb23

Add a store method for computing the closure of a drv output

2607da5

Mostly dummy atm as it just computes the closure of the output path, but might be extended later

Fix addTextToStore for binary caches

ebfdafa

InstallableStorePath: Don't read the derivation if there's no need

3e2201d

That way we can `nix copy --from` a store that doesn't have the derivation but just its outputs

Also register the outputs of the non-resolved derivation

b646c2a

That way we don't need to resolve it again to know its output paths. In particular, this means that we don't need to keep its whole build-time closure

Allow the post-build-hook to copy drv outputs and not just store paths

d9d8891

Add a test for copying CA derivations outputs

0bfcd02

Only update the db schema if the the ca-derivations feature is enabled

dcb8957

thufschmitt force-pushed the minimal-drv-outputs-copy branch from 1acb0cb to dcb8957 Compare December 1, 2020 16:24

thufschmitt added this to the ca-derivations-mvp milestone Dec 11, 2020

thufschmitt closed this Jan 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minimal implementation of a drv outputs copy #4225

Minimal implementation of a drv outputs copy #4225

thufschmitt commented Nov 6, 2020 •

edited

Loading

thufschmitt commented Nov 9, 2020

Ericson2314 commented Nov 10, 2020

Ericson2314 Nov 10, 2020

thufschmitt Nov 10, 2020

Ericson2314 Nov 10, 2020

thufschmitt Nov 10, 2020

Ericson2314 Nov 10, 2020

thufschmitt Nov 10, 2020

Ericson2314 commented Nov 10, 2020

thufschmitt commented Nov 10, 2020

Ericson2314 commented Nov 11, 2020 •

edited

Loading

Ericson2314 commented Nov 26, 2020

thufschmitt commented Nov 27, 2020

thufschmitt commented Nov 27, 2020 •

edited

Loading

Ericson2314 commented Nov 28, 2020

Minimal implementation of a drv outputs copy #4225

Minimal implementation of a drv outputs copy #4225

Conversation

thufschmitt commented Nov 6, 2020 • edited Loading

thufschmitt commented Nov 9, 2020

Ericson2314 commented Nov 10, 2020

Ericson2314 Nov 10, 2020

Choose a reason for hiding this comment

thufschmitt Nov 10, 2020

Choose a reason for hiding this comment

Ericson2314 Nov 10, 2020

Choose a reason for hiding this comment

thufschmitt Nov 10, 2020

Choose a reason for hiding this comment

Ericson2314 Nov 10, 2020

Choose a reason for hiding this comment

thufschmitt Nov 10, 2020

Choose a reason for hiding this comment

Ericson2314 commented Nov 10, 2020

thufschmitt commented Nov 10, 2020

Ericson2314 commented Nov 11, 2020 • edited Loading

Ericson2314 commented Nov 26, 2020

thufschmitt commented Nov 27, 2020

thufschmitt commented Nov 27, 2020 • edited Loading

Ericson2314 commented Nov 28, 2020

thufschmitt commented Nov 6, 2020 •

edited

Loading

Ericson2314 commented Nov 11, 2020 •

edited

Loading

thufschmitt commented Nov 27, 2020 •

edited

Loading