Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal implementation of a drv outputs copy #4225

Closed
wants to merge 18 commits into from

Conversation

thufschmitt
Copy link
Member

@thufschmitt thufschmitt commented Nov 6, 2020

Depends on #4227 #4282 #4284

This PR tries to provide the simplest possible way of copying drv outputs.
In particular it doesn't try to tackle the issue of copying the drv output closure at all, all it does is allow to do nix copy /nix/store/...-foo.drv!out which will copy the corresponding outpath and register the out path mapping for ...-foo.drv!out.

For coherence (and also because it's at least as simple to do things that way), nix copy nixpgs#foo has the same semantics, meaning that it will both copy the output path(s) and register the derivation output(s).

@thufschmitt
Copy link
Member Author

I've updated the PR to make the outputPath of derivation outputs a foreign key pointing to a ValidPaths entry.
This should make things cleaner as it removes the possibility of having a dangling reference (which would cause issues if this drv output gets rebuild and yields a different output path − bc of non-determinism).

@edolstra the DerivationOutputs table currently doesn't have a user field, but we'll probably want it at some point in the future. Should we add it right now (at the cost of keeping some possibly never used cruft) or defer it (at the cost of another db schema chang)

@Ericson2314
Copy link
Member

@edolstra the DerivationOutputs table currently doesn't have a user field, but we'll probably want it at some point in the future. Should we add it right now (at the cost of keeping some possibly never used cruft) or defer it (at the cost of another db schema chang)

I would defer it, as the migration is not so complex that I'm worried about doing it later, but the reverse migration is lossy.

Comment on lines +437 to +439
StringSink sink;
dumpString(s, sink);
auto source = StringSource { *sink.s };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a bugfix we should PR sooner?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could yes. Though I don't think this method was used anywhere before this PR so probably not an emergency

{ return static_cast<RawDrvInput>(*this); }
};

std::list<std::string> stringify_refs(const std::set<DrvInput> & refs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed in a2803d6

Comment on lines 621 to 623
// XXX: This ignores the references of the output because we can
// recompute them later from the drv and the references of the associated
// store path, but doing so is both inefficient and fragile.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't apply to this version, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it doesn't. Removed in 58ea93c

@Ericson2314
Copy link
Member

I think I would feel most comfortable if the first minimal version didn't store mappings for unresolved derivations to paths, because doing that while making sure we don't have accidental mapping collisions is very hard.

@thufschmitt
Copy link
Member Author

I think I would feel most comfortable if the first minimal version didn't store mappings for unresolved derivations to paths, because doing that while making sure we don't have accidental mapping collisions is very hard.

I fail to see which collision you think of here. It's possible that the drv would resolve to something else (and possibly transitively produce a different output path) if you were to rebuild it, but it's not fundamentally different from having the resolved derivation producing a different output path.

@Ericson2314
Copy link
Member

Ericson2314 commented Nov 11, 2020

I fail to see which collision you think of here.

The only where the collected build-time-only deps could be built again different.

but it's not fundamentally different from having the resolved derivation producing a different output path.

Sure, but if we just start with resolved derivation mappings, we are sure to be correct: rather than stuff being dubiously cached, it will just fail to be cached if some resolved derivation along the way is different.

Because that is is a "definitely correct" starting point, I think it's where I'd like to begin.


Another idea i was thinking was leaving the current DerivationOutputs as-is, and having a new table for CA derv mappings. Queries can union the two tables together, and this means we can continue having the foreign key and simple cleanup for input-addressed derivations, and we separate the "normative" CA mappings from the purely-for-caching IA ones. Thoughts?

@thufschmitt thufschmitt added the ca-derivations Derivations with content addressed outputs label Nov 16, 2020
@thufschmitt thufschmitt force-pushed the minimal-drv-outputs-copy branch 3 times, most recently from dc9e34a to 8dfbbe3 Compare November 18, 2020 08:15
@Ericson2314
Copy link
Member

So picking up where I left off with #4225 (comment)

I think the most minimal thing to do is:

  1. Just to "atomic" resolved dependency mappings for now:
  2. Allow them to be individually copied between stores with a separate command
  3. Don't worry about making it easy to copy all the resolved derivation mappings needed to re-resolve and unresolved derivation

Yes, we don't want users to have to actually do this tedious stuff in practice, but I think there are good building blocks to have in the end too, and by starting with that, we avoid anything with multiple competing designs.

@thufschmitt
Copy link
Member Author

@Ericson2314

I think the most minimal thing to do is:

1. Just to "atomic" resolved dependency mappings for now:

What do you mean exactly by “atomic”? The way I understand it is “without bothering with the dependencies”, but that's already what this PR does.

2. Allow them to be individually copied between stores with a separate command

That's an interesting point. I overloaded nix-copy for that because

  1. It's not a lot more complicated than having a different command
  2. It's actually quite easy for the tests
  3. It's quite likely that this is the (low-level) interface we'll want eventually, so it makes sense to do it right now rather than adding a new command that we'll have to deprecate (or remove, after all it's just experimental) in a few months.

But there's arguments to be made in the other directions, so…

3. Don't worry about making it easy to copy all the resolved derivation mappings needed to re-resolve and unresolved derivation

I feel like it would make things noticeably more complex. Even if just for the tests, it's quite nice to be able to expect nix copy --to $REMOTE ./foo.nix && clearStore && nix copy --from $REMOTE ./foo.nix to work, that gives a nice and easy to check property (I'm secretly dreaming of adding property-based testing to Nix)

Another idea i was thinking was leaving the current DerivationOutputs as-is, and having a new table for CA derv mappings. Queries can union the two tables together, and this means we can continue having the foreign key and simple cleanup for input-addressed derivations, and we separate the "normative" CA mappings from the purely-for-caching IA ones. Thoughts?

As a matter of fact I've been discussing with Eelco yesterday and we reached the same conclusion (though we might not need to do any union as it's totally fine for the two tables to overlap). The nice side of this is that it makes it easier to iterate on the ca-specific schema without touching the rest

@thufschmitt
Copy link
Member Author

thufschmitt commented Nov 27, 2020

I've had a chat with @edolstra yesterday, which led to a few suggestions for this:

  1. We could (as @Ericson2314 suggested above) add a separate table in the db to track the output mappings of the CA derivations (to make it easier to evolve that schema independently of the rest). That way we can even have a second version scheme for the ca-specific db aspects, allowing people to easily go back-and-forth between ca and mainline Nix
  2. queryDrvOutputInfo and friends should be
    1. Async (like queryPathInfo)
    2. Cached in memory (and possibly on-disk, though that's less urgent)

@thufschmitt thufschmitt force-pushed the minimal-drv-outputs-copy branch 3 times, most recently from 7a0ba80 to 1acb0cb Compare November 27, 2020 15:41
@Ericson2314
Copy link
Member

Glad we're all on the same page for multiple tables!

What do you mean exactly by “atomic”? The way I understand it is “without bothering with the dependencies”, but that's already what this PR does.

Right, but my big schick is that that it's sketchy to have mappings of unresolved derivations without dependencies.

What do you mean exactly by “atomic”? The way I understand it is “without bothering with the dependencies”, but that's already what this PR does.

Well the main thing is if we don't have dependencies and restrict ourselves to resolved derivations, we would have a hard time overloading it.

(I also think it is good to allow just copying outputs, derivations, and mappings independent, but this is less important; I'm not disagreeing that copping the data and mappings is a good default once we can support it.)

Even if just for the tests

It does make testing harder, because it's such a cumberson low-level interface. I think that's just the (IMO worth it) price of making a minimal thing with as uncontroversial design as we can.


Basically taking a step back, because I'm very keen on keeping the full provenance of resolved derivation mappings around, I think it's just going to be more work before we get to good usable interfaces. But I'm OK with that.

This requires adding `nix` to its own closure which is a bit unfortunate,
but as it is optional (the test will be disabled if `OUTER_NIX` is unset) it
shouldn't be too much of an issue.

(Ideally this should go in another derivation so that we can build Nix and run
the test independently, but as the tests are running in the same derivation
as the build it's a bit complicated to do so).
For each “known” derivation output, store:
- its output id
- its output path

This comes with a set of needed changes:

- New `drv-output-info` module declaring the types needed for describing
  these mappings
- New `Store::registerDrvOutput` method registering all the needed informations
  about a derivation output (also replaces `LocalStore::linkDeriverToPath`)
- new `Store::queryDrvOutputInfo` method to retrieve the informations for a
  derivations

This introcudes some redundancy on the remote-store side between
`wopQueryDerivationOutputMap` and `wopQueryDrvOutputInfo`.
However we might need to keep both (regardless of backwards compat)
because we sometimes need to get some infos for all the outputs of a
derivation (where `wopQueryDerivationOutputMap` is handy), but all the
stores can't implement it − because listing all the outputs of a
derivation isn't really possible for binary caches where the server
doesn't allow to list a directory.
Copying a derivation output is equivalent to copy the output path,
except that it will also link the output to the path (with
`linkDeriverToPath` on the remote store)
Commands that accept store paths or installables as input can now take a
derivation path with outputs. Like when the input is a nix expr, the
derivation will be realised if needed.
Generalises `StorePathsCommand` to allow manipulating `Buildables`
rather than just store paths.
Mostly dummy atm as it just computes the closure of the output path, but
might be extended later
Add a new table for tracking the derivation output mappings.

We used to hijack the `DerivationOutputs` table for that, but (despite its
name), it isn't a really good fit:

- Its entries depend on the drv being a valid path, making it play badly with
  garbage collection and preventing us to copy a drv output without copying
  the whole drv closure too;
- It dosen't guaranty that the output path exists;

By using a different table, we can experiment with a different schema better
suited for tracking the output mappings of CA derivations.
(incidentally, this also fixes NixOS#4138)
That way we can `nix copy --from` a store that doesn't have the derivation but
just its outputs
That way we don't need to resolve it again to know its output paths.
In particular, this means that we don't need to keep its whole build-time
closure
`buildPaths` can be called even for stores where it's not defined in case it's
bound to be a no-op.
The “no-op detection” mechanism was only detecting the case wher `buildPaths`
was called on a set of (non-drv) paths that were already present on the store.

This commit extends this mechanism to also detect the case where `buildPaths`
is called on a set of derivation outputs which are already built on the store.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ca-derivations Derivations with content addressed outputs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants