RFC for Public/Private Dependencies #1977

Merged
merged 13 commits into from
Sep 17, 2017
199 changes: 199 additions & 0 deletions text/0000-public-private-dependencies.md
@@ -0,0 +1,199 @@
- Feature Name: `public_private_dependencies`
- Start Date: 2017-04-03
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Introduce a public/private distinction to crate dependencies.

# Motivation
[motivation]: #motivation

The crates ecosystem has greatly expanded since Rust 1.0 and with that a few patterns for
dependencies have evolved that challenge the currently existing dependency declaration
system in cargo and rust. The most common problem is that a crate `A` depends on
another crate `B` but some of the types from crate `B` are exposed through the API in
crate `A`. This causes problems in practice if the user's own code also depends on `B`:
users often end up in situations where either their code refuses to compile because
different versions of `B` are requested, or the compiler messages are less than clear.

The introduction of an explicit distinction between public and private dependencies can
Contributor

Are there concrete use cases where these annotations would help? Those would be worth mentioning here (primarily to pitch this RFC to unconvinced readers).

At least when I started writing Rust, I upgraded dependencies in a PATCH release without thinking about the implications for reexported APIs. This change could make crate authors think about this from the start. The last bullet point in "Unresolved questions" also hints at this.

solve some of these issues and also let us lift some restrictions that should make some
code compile that previously was prevented from compiling by restrictions in cargo.

**Q: What is a public dependency?**<br>
Contributor

@Ericson2314 Ericson2314 Apr 20, 2017

I don't really like this language. If you depend on items from a dependency (e.g. types) even if you don't reexport them, you still need a public dep. Also, and this is crucial, if you don't reexport anything, it's not a breaking change to change your public deps if your own interface doesn't change.

Consider a situation where a depends (privately or publicly) on b which publicly depends on c. To use any items from b with interfaces relying on c, a should also need to depend on c (and constrain that dep to be unified with b's). But now the extra dep of a on c forces the "effective interface" of b with given c to not break. Conversely, if a doesn't do that, then we know b's use of c is irrelevant so the relevant parts of b also won't break.

This is an instance of the general principle that the version of a crate is like a fallback / catchall that describes the remnants of the interface not already accounted for in the crate metadata, and constrainable with a dependency. Public deps are accounted for and can be constrained downstream, and thus need not cause interface breakage in and of themselves.

This may sound like needlessly fancy reasoning, but it is really important in practice. Breaking changes are already a difficult growing pain in large ecosystems, and would become impossibly so if, on every upstream breaking change, every publicly depending downstream crate were forced to issue its own breaking change even if all it did was bump a dependency. Conversely, tiny breaking changes are far easier to deal with if the publicly depending downstream libraries that aren't affected (i.e. most) just need to make a new release with a relaxed upper bound on their public dep.

Contributor Author

> If you depend on items from a dependency (e.g. types) even if you don't reexport them, you still need a public dep.

Can you clarify what you mean by that? If you do not re-export them there is no need for a dependency to be public.

Contributor

@Ericson2314 Ericson2314 Apr 21, 2017

Perhaps we are using "re-export" differently? To me, reexport means you include the *definition* in your interface, i.e. with a pub use, as opposed to merely using the dependency's items. Reexporting is a bad thing to do because then the public dep really does become part of your API in ways downstream may not control: merely bumping a version bound over a breaking change may indeed be a breaking change.

Contributor Author

Correct. If you accept a type from a crate as parameter it means it's re-exported in your API.

Contributor

@Ericson2314 Ericson2314 Apr 21, 2017

Uh I am not sure which you are agreeing with?

Contributor Author

Maybe I'm just too dumb to understand the comment but I do not quite follow what the difference here would be. Is the idea that if you use a subset of b that does not use c you do not want a public dependency on c? That should be covered anyways.

Contributor

@mitsuhiko With the terminology change, the only remaining issue is the sentence: "Effectively the idea is that if your own library bumps a public dependency it means that it's a breaking change of your own crate."

I realize now this sentence doesn't actually affect what this RFC specifies, so as far as the detailed design goes we're all good. Feel free to just cut that sentence and ignore the rest of this post :).

But for the record, changing a dep should never be a breaking change. In the private dep case, downstream cannot tell at all, so we're good. In the public dep case, the solver will simply not use the new crate if the public dep cannot be unified with other public deps.

I wrote before

> This is an instance of the general principle that the version of a crate is like a fallback / catchall that describes the remnants of the interface not already accounted for in the crate metadata, and constrainable with a dependency.

I think I have a better way of describing things. The general maxim for compat is "if I publish this crate, in all solutions where this crate could be substituted for another, things must work in both cases". If there is a solution where the substitution breaks, we could try to fix that case, or we could try to rule it out.

Since public deps may be exposed but must also exist only once, any bounds adjustment is OK: version unification rules out laxer bounds switching to a version that the other crates in the plan cannot cope with.

Mathematically, one can view a crate wrt compatibility as `Map<PubDeps, Map<Name, Item>>` (maps are partial functions). Just as adding definitions to a crate preserves compatibility, so does relaxing bounds: both just extend the domains of the partial functions. And whereas removing definitions does not preserve compatibility, tightening bounds is fine, because solver-time failures are fine (the solver just moves on) whereas build-time errors aren't.

Contributor

@Ericson2314 Ericson2314 Apr 21, 2017

Hmm @alexcrichton in #1977 (comment) you wrote

> When libc reaches 1.0 then many crates will need to bump their major version as libc is a public dependency.

But per this thread it would only be a minor bump. Using this side-thread as it's not really a core part of the RFC but do want to get this clarified.

It's possible we'd need to tweak some things like method resolution for what I said to actually be true. (e.g. do the inherent methods of a reexported type "vagabound" with the type, and are they usable via the export?)

Contributor

I believe Alex meant all the crates with a public dependency (direct or transitive) on libc would need a major version bump when libc goes 1.0, but all the crates with a private dependency (direct or transitive) on libc would not. That's why the next sentence of that comment is:

> This RFC will allow us to precisely identify what set of crates need to be bumped, transitively!

Where I believe "need to be bumped" refers only to major version bumps.

Contributor

Right, I'm trying to argue that, absent reexports (as opposed to mere exposing of departure), changing dependencies in any way is not a breaking change, because the solver will disallow every plan in which the build could go wrong anyway.

A: A dependency is public if some of its types or traits are themselves
exported through the main crate. The most common places where this happens are
return values and function parameters, but the same applies to trait implementations
and many other things. Because it can be tricky for a user to determine what is public,
this RFC proposes to extend the compiler infrastructure to detect the concept of "public
dependency". This will help users understand the concept and avoid making mistakes in
the `Cargo.toml`.
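
As a rough illustration of the distinction (not taken from the RFC): in the sketch below the module `url_dep` is a hypothetical stand-in for an external crate, and the function names are made up for the example. The first function names a type from the dependency in its signature, which is what would make the dependency public; the second only uses the dependency internally.

```rust
// Stand-in for an external crate. In a real project this would be a separate
// dependency declared in Cargo.toml; the module and its contents are hypothetical.
mod url_dep {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Url {
        pub host: String,
    }

    pub fn parse(input: &str) -> Option<Url> {
        let rest = input.strip_prefix("https://")?;
        let host = rest.split('/').next()?.to_string();
        if host.is_empty() {
            None
        } else {
            Some(Url { host })
        }
    }
}

/// `url_dep::Url` appears in the return type, so callers have to deal with a
/// type from the dependency: this is the kind of use that makes it public.
pub fn parse_endpoint(input: &str) -> Option<url_dep::Url> {
    url_dep::parse(input)
}

/// Only std types cross the API boundary here, so the dependency stays an
/// implementation detail and could remain private.
pub fn endpoint_host(input: &str) -> Option<String> {
    url_dep::parse(input).map(|u| u.host)
}

fn main() {
    println!("{:?}", parse_endpoint("https://example.com/a"));
    println!("{:?}", endpoint_host("https://example.com/a"));
}
```

If a crate only had functions shaped like `endpoint_host`, swapping or duplicating the dependency would be invisible to its users.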

Effectively the idea is that if your own library bumps a public dependency it means that
Member

Wouldn't this only be true for bumping the major version of a public dependency?

For example, say crate A version 1.0.0 depends on crate B version 2.0.0. Can A upgrade to B version 2.0.1 in a patch release (e.g. A version 1.0.1)?

Presuming A's dependency on B is public, that depends on the version requirements that all packages which depend on A place on B, and the version requirements that all packages place on A.

As there are non-public crates (iow: ones that are not visible in the crates.io registry), we're limited in the assumptions that can be made about what version requirements are made.

it's a breaking change of your *own* crate.

**Q: What is a private dependency?**<br>
A: On the other hand a private dependency is contained within your crate and effectively
invisible for users of your crate. As a result private dependencies can be freely
duplicated. This distinction will also make it possible to relax some restrictions that
currently exist in Cargo which sometimes prevent crates from compiling.

**Q: Can public become private later?**<br>
A: Public dependencies are public within a reachable subgraph but can become private if a
crate stops exposing a public dependency. For instance it is very possible to have a
family of crates that all depend on a utility crate that provides common types which is
a public dependency for all of them. However, if your own crate only ends up being a user
of this utility crate and none of its types or traits become part of your own API, then
this utility crate dependency is marked private.

**Q: Where is public / private defined?**<br>
A: Dependencies are private by default and are made public through a `public` flag in the
dependency in the `Cargo.toml` file. This also means that crates created before the
implementation of this RFC will have all their dependencies private.

**Q: How is backwards compatibility handled?**<br>
A: It will continue to be permissible to "leak" dependencies and there are even some
use cases of this, however the compiler or cargo will emit warnings if private
dependencies become part of the public API. Later it might even become invalid to
publish new crates without explicitly silencing these warnings or marking the
dependencies as public.

**Q: Can I export a type from a private dependency as my own?**<br>
A: For now it will not be strictly permissible to privately depend on a crate and export
a type from there as your own. The reason for this is that at the moment it is not
possible to force this type to be distinct. This means that users of the crate might
accidentally start depending on that type to be compatible if the user starts to depend
on the crate that actually implements that type. The limitations from the previous
answer apply (eg: you can currently overrule the restrictions).

**Q: How does semver and depenencies interact?**<br>
Contributor

s/depenencies/dependencies/

A: It is already the case that changing your own dependencies would require a semver
bumb for your own library because your API contract to the outside world changes. This
Contributor

s/bumb/bump/

RFC however makes it possible to only have this requirement for public dependencies and
would permit cargo to prevent new crate releases with semver violations.

# Detailed design
[design]: #detailed-design

A few areas need to change for this RFC:

* The compiler needs to be extended to understand when crate dependencies are
considered a public dependency
* The `Cargo.toml` manifest needs to be extended to support declaring public
dependencies
* The cargo publish process needs to be changed to warn (or prevent) the publishing
of crates that have undeclared public dependencies
* crates.io should show public dependencies more prominently than private ones.

## Compiler Changes

The main change to the compiler will be to accept a new parameter that cargo
supplies which is a list of public dependencies. The compiler then emits
warnings if it encounters private dependencies leaking to the public API of a
crate. `cargo publish` might change this warning into an error in its lint
step.

Additionally later on the warning can turn into a hard error in general.
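
To make the check described above a little more concrete, here is a minimal sketch of its shape. It is not rustc code: `ExposedItem`, `leaked_private_deps`, and the example crate names are all assumptions made for illustration, and a real implementation would work on the compiler's own type information rather than strings.

```rust
use std::collections::BTreeSet;

/// One item in a crate's public API, plus the crate that defines a type it
/// exposes. Purely illustrative; not a rustc or cargo data structure.
struct ExposedItem<'a> {
    api_path: &'a str,
    defining_crate: &'a str,
}

/// Returns one warning per exposed item whose defining crate is neither the
/// current crate nor in the list of public dependencies supplied by cargo.
fn leaked_private_deps(
    current_crate: &str,
    public_deps: &BTreeSet<&str>,
    exposed: &[ExposedItem<'_>],
) -> Vec<String> {
    exposed
        .iter()
        .filter(|item| {
            item.defining_crate != current_crate
                && !public_deps.contains(item.defining_crate)
        })
        .map(|item| {
            format!(
                "warning: `{}` exposes a type from private dependency `{}`",
                item.api_path, item.defining_crate
            )
        })
        .collect()
}

fn main() {
    // Hypothetical input: `url` is declared public, `hyper` is not.
    let public_deps: BTreeSet<&str> = ["url"].iter().copied().collect();
    let exposed = [
        ExposedItem { api_path: "mycrate::fetch", defining_crate: "url" },
        ExposedItem { api_path: "mycrate::Error", defining_crate: "hyper" },
    ];
    for warning in leaked_private_deps("mycrate", &public_deps, &exposed) {
        println!("{}", warning);
    }
}
```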

In some situations it can be necessary to allow private dependencies to become
Contributor Author

I'd rather not go into too much detail about this as this is a big unknown. I think it's largely irrelevant for this RFC anyways because we initially start out with just compiler warnings. Where to go from there will be seen by how much damage this does.

part of the public API. In that case one can permit this with
`#[allow(external_private_dependency)]`. This is particularly useful when
paired with `#[doc(hidden)]` and other already existing hacks.

This most likely will also be necessary for the more complex relationship of
`libcore` and `libstd` in Rust itself.

## Changes to `Cargo.toml`

The `Cargo.toml` file will be amended to support the new `public` parameter on
dependencies. Old cargo versions will emit a warning when this key is encountered
but otherwise continue. Since dependencies are private by default, only public ones
need to be tagged, which should be the minority.

Example dependency:

```toml
[dependencies]
url = { version = "1.4.0", public = true }
Contributor

We also need to be able to express a private dependency which is locked with another dep's public dependency, see my running a -> c; a ->b; b -> c example from above.

Contributor Author

@Ericson2314 is this really necessary? Why can you not just have a pin on the major version? eg: url = { version = "1", public = true }.

Contributor

@mitsuhiko actually it might not be, but we should make note of this in the semantics: just as your public deps must be unique version per name, so your private dependencies and their public dependencies must be unique version per name.

I think this is one place where @bbatha's concern (link-only-once) diverges from what's described by (and very useful for) this RFC, and the two probably need to be handled distinctly.

In particular, in the absence of link-only-once dependencies, a private dependency hides all transitive dependencies behind it - and this is hugely useful in some cases.

However, link-only-once crates must not be hidden in this manner.

As a result, I think these might be better served by orthogonal mechanisms.

Contributor Author

Yeah. I am not entirely sure how to best deal with link only dependencies. They are definitely iffy and out of the scope as far as I'm concerned. I will add a section about those to the RFC.

Contributor

Agree that it's out of scope. Simply requiring that all link-once deps be reachable through chains of public dependencies is sound, but way overly restrictive.

How about cratename = { from_dependencies = true } (or similar)? You're relying on deps to not break API with their deps being bumped, but that's true anyways already.

Contributor

I don't understand why from_dependencies is a better name than public, or how the version could be inferred. Could you give an example?

It'd be a replacement for version = "…" and an indication that cargo should figure out the version based on public dependencies of dependencies instead of trying to give a constraint directly. Basically "I'm fine with whatever my dependencies need because that's how I use this crate".

Contributor

I see, as far as I understand this could be a different RFC independent from this one though.

Contributor Author

I think people should just pin less strictly instead. It's fine to just pin for 1 for instance.

But if I use a crate only because dependencies use it, why not have a way to say "I use it only because dependencies make me use it; let them tell me what version I need"?

In any case, probably a topic for a separate RFC.

Contributor Author

> But if I use a crate only because dependencies use it

If you only use it because dependencies use it and it's not part of their APIs the dependency is invisible to you. If it's a visible (public) dependency then the version of the dependency is relevant to you.

```

## Changes to Cargo Publishing

When a new crate version is published Cargo will warn about types and traits that
the compiler determined to be public but did not come from a public dependency. For
now it should be possible to publish anyway, but at some point in the future it will
be necessary to explicitly mark all public dependencies as such or explicitly
mark them with `#[allow(external_private_dependency)]`.
@eternaleye eternaleye Apr 20, 2017

I think you're missing something - specifically, I think this may require changes to the index format in order to allow Cargo to make dependency-resolution decisions based on whether dependencies are public.

Without changes to the index, either Cargo will still need to perform conservative "everything is public" resolution (and the situation does not actually improve), or it may perform "optimistic" resolution (as if everything is private) and decide on a resolution that it cannot tell is invalid until after it's downloaded the crate archives (which is wasteful, and offers no clean recovery path).

Contributor Author

@eternaleye why would cargo have to assume a public dependency?

Contributor

@Ericson2314 Ericson2314 Apr 21, 2017

@mitsuhiko this is about whether the solver guesses solutions which it then checks against dependency privacy, or whether it has access to privacy while solving. The latter is probably far more efficient, as partial solutions can be ruled out earlier.

Contributor Author

@Ericson2314 cargo knows what dependencies are public from the definition in the Cargo.toml. The only thing the #[allow(...)] does is silence the warning (later error). I don't believe it needs to impact the dependency resolution algorithm but I might be missing something here.

@eternaleye eternaleye Apr 21, 2017

@mitsuhiko: The problem here is with the privacy/publicity of transitive dependencies. Imagine the following dependency graph:

  • A
    • B
      • D
      • E
    • C
      • E

All dependencies are public, and furthermore let us presume that these "E" dependencies are ones that, if one of them were private, would fall into resolutions that "previously [were] prevented from compiling by restrictions in cargo."

I clone A, and run cargo build. Cargo's dependency resolution does not have access to the Cargo.toml for anything other than A. For everything else, it accesses the index. As a result, in order to know whether B and C depend on E privately or publicly, the index must carry this information.

If it does not, Cargo has a few options:

  1. Presume they are all public, and reject the resolution. No improvement over today.
  2. Presume they are all public, and if the resolution would be rejected, fetch all potentially eligible crates in order to re-perform resolution using their Cargo.toml files. Incredibly network-intensive, may fail anyway, significant new logic.
  3. Presume they are all private, begin fetching the selected crates (which may have version skews that are not permissible!), and then discover that the resolution is irresolvable. Network-intensive, offers no usable error handling path where some versions of B (or C) make E private and others make E public.
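
For illustration only, here is one possible reading of what a resolver could check if the index did carry a per-dependency `public` flag. Everything in the sketch is an assumption rather than cargo's actual data model or algorithm: package identity is reduced to a name plus a major version, `Dep`, `Index`, `visible_from`, and `conflicts` are made-up names, and the two `E` entries are given different major versions to model the version skew discussed above.

```rust
use std::collections::{BTreeMap, BTreeSet};

// (crate name, major version): a deliberately simplified package identity.
type Pkg = (&'static str, u32);

// One dependency edge, as a hypothetical index entry might record it.
struct Dep {
    target: Pkg,
    public: bool,
}

type Index = BTreeMap<Pkg, Vec<Dep>>;

/// Packages whose types `root` may observe: its direct dependencies, plus
/// everything reachable from those through public edges only.
fn visible_from(index: &Index, root: Pkg) -> BTreeSet<Pkg> {
    let mut seen = BTreeSet::new();
    let mut stack: Vec<Pkg> = index
        .get(&root)
        .map(|deps| deps.iter().map(|d| d.target).collect::<Vec<Pkg>>())
        .unwrap_or_default();
    while let Some(pkg) = stack.pop() {
        if !seen.insert(pkg) {
            continue; // already visited
        }
        for dep in index.get(&pkg).into_iter().flatten() {
            if dep.public {
                stack.push(dep.target);
            }
        }
    }
    seen
}

/// Crate names that are visible from `root` in more than one major version.
fn conflicts(index: &Index, root: Pkg) -> Vec<&'static str> {
    let mut versions: BTreeMap<&'static str, BTreeSet<u32>> = BTreeMap::new();
    for (name, major) in visible_from(index, root) {
        versions.entry(name).or_default().insert(major);
    }
    versions
        .into_iter()
        .filter(|(_, majors)| majors.len() > 1)
        .map(|(name, _)| name)
        .collect()
}

/// The A/B/C/D/E graph from the comment above; only C's edge to E varies.
fn example_index(c_exposes_e: bool) -> Index {
    let mut index = Index::new();
    index.insert(("A", 1), vec![
        Dep { target: ("B", 1), public: true },
        Dep { target: ("C", 1), public: true },
    ]);
    index.insert(("B", 1), vec![
        Dep { target: ("D", 1), public: true },
        Dep { target: ("E", 1), public: true },
    ]);
    index.insert(("C", 1), vec![Dep { target: ("E", 2), public: c_exposes_e }]);
    index
}

fn main() {
    println!("all public: {:?}", conflicts(&example_index(true), ("A", 1)));
    println!("C hides E:  {:?}", conflicts(&example_index(false), ("A", 1)));
}
```

With the flag available up front, the skewed `E` versions are rejected only when both are actually exposed to `A`, and no crate archive has to be downloaded to find that out.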

Contributor Author

@eternaleye in case you are referring to the crates.io index, yes that information (public true/false) would be contained in the "deps" section.

Contributor Author

I for some reason thought you meant that #[allow(external_private_dependency)] needs to be reflected in the index.


# How We Teach This
[how-we-teach-this]: #how-we-teach-this

From the user's perspective the initial scope of the RFC will be quite transparent
but it will definitely show up for users as a question of what the new restrictions
mean. In particular, error handling is a common way for crates to leak types out of
their APIs: quite frequently users wrap errors from other libraries in their own types.
It might make sense to identify common cases where type leakage happens and provide
hints in the lint about how to deal with it.

Cases that I anticipate should be explained separately:

* type leakage through errors. This should be easy to spot for a lint because the
wrapper type will implement `std::error::Error`. The recommendation should most
likely be to encourage containing the internal error.
* traits from other crates. In particular serde and some other common crates will
show up frequently and it might make sense to separately explain types and traits.
* type leakage through derive. Users might not be aware they have a dependency on
a type when they derive a trait (think `serde_derive`). The lint might want to
call this out separately.
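
The error-handling case could be explained with something like the following sketch. It assumes a made-up dependency, modelled here as the module `http_dep`, and hypothetical names `LeakyError`, `ContainedError`, and `fetch`; it shows one way of "containing" an internal error, not a recommendation from the RFC itself.

```rust
use std::error::Error;
use std::fmt;

// Stand-in for a dependency's error type (hypothetical).
mod http_dep {
    use std::fmt;

    #[derive(Debug)]
    pub struct HttpError(pub u16);

    impl fmt::Display for HttpError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "HTTP status {}", self.0)
        }
    }

    impl std::error::Error for HttpError {}
}

/// Leaky: the variant payload names the dependency's type, so the dependency
/// becomes part of this crate's public API.
#[derive(Debug)]
pub enum LeakyError {
    Http(http_dep::HttpError),
}

/// Contained: the dependency's error sits behind a private, type-erased field
/// and is only reachable through `Error::source`.
#[derive(Debug)]
pub struct ContainedError {
    inner: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for ContainedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request failed")
    }
}

impl Error for ContainedError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let inner: &(dyn Error + 'static) = &*self.inner;
        Some(inner)
    }
}

/// Toy operation that fails for status codes of 400 and above.
pub fn fetch(status: u16) -> Result<(), ContainedError> {
    if status < 400 {
        Ok(())
    } else {
        Err(ContainedError { inner: Box::new(http_dep::HttpError(status)) })
    }
}

fn main() {
    if let Err(err) = fetch(500) {
        println!("{} (caused by: {:?})", err, err.source().map(|s| s.to_string()));
    }
}
```

Note that a public `From<http_dep::HttpError>` impl on `ContainedError` would leak the dependency again through a trait implementation, which is why the sketch constructs the error internally.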

The feature will be called `public_private_dependencies` and it comes with one
lint flag called `external_private_dependency`. For all intents and purposes this
should be the extent of the new terms introduced in the beginning. This RFC however
lays the groundwork for later providing aliasing so that a private dependency could
be forcefully re-exported as own types. As such it might make sense to already
consider what this will be referred to.

It is assumed that this feature will eventually become quite popular due to patterns
that already exist in the crate ecosystem but it's likely that it will evoke some
negative opinions initially. As such it would be a good idea to make a run with
cargobomb/crater to see what the actual impact of the new linter warnings is and
how far we are from making them errors.

crates.io should most likely be updated to render public and private dependencies
separately.

# Drawbacks
[drawbacks]: #drawbacks

I believe that there are no drawbacks if implemented well (this assumes good
linters and error messages).

# Alternatives
[alternatives]: #alternatives

For me the biggest alternative to this RFC would be a variation of it where type
and trait aliasing becomes immediately part of it. This would mean that a crate
can have a private dependency and re-export it as its own type, hiding where it
came from originally. This would most likely be easier to teach users and can get
rid of a few "cul-de-sac" situations that users can end up in and where their only way
out is to introduce a public dependency for now. The assumption is that if trait
and type aliasing is available the `external_private_dependency` lint would not need to
exist.

# Unresolved questions
[unresolved]: #unresolved-questions

There are a few open questions about how to best hook into the compiler and cargo
infrastructure:

* is passing in the list of public dependencies the correct way to go about it?
If yes, what should the parameter be called?
* what is the impact of this change going to be? This can most likely be answered by
running cargobomb/crater.
* since changing public dependency pins/ranges requires a change in semver it might
be worth exploring if cargo could prevent the user from pushing up new crate
versions that violate that constraint.
Contributor

@untitaker untitaker Apr 18, 2017

Not a review, but random thought: You might have a public dependency that changes its major version but doesn't change the part of the API you reexport, or perhaps it just bumps its major version too aggressively.

Contributor

This is still a breaking change in your library.

Suppose you and another crate foo both depend on the same version of bar, and both re-export bar::Bar. I depend on both you and foo, and pass a Bar I got from you to foo in my code. This works fine, because your Bar is the same type as foo's Bar.

bar releases a breaking change, which does nothing to the definition of Bar. You update to that major version, but don't do a breaking change because you believe nothing in your code has changed.

However, now it's impossible for cargo to unify your dependency on bar with foo's dependency on bar, because they're different major versions. This means that you::Bar and foo::Bar are no longer the same type. If I upgrade my dependency on you, I will get a breakage, so you have made a breaking change.
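
The scenario can be imitated inside a single file. This is only a sketch under stated assumptions: the two major versions of `bar` are modelled as modules `bar_v1` and `bar_v2` (hypothetical names), whereas in reality they would be two copies of the crate in the dependency graph. The point it illustrates is that textually identical definitions still produce distinct types.

```rust
use std::any::TypeId;

// Two major versions of the hypothetical crate `bar`, modelled as modules.
// The definition of `Bar` is textually identical in both.
mod bar_v1 {
    pub struct Bar(pub u32);
}
mod bar_v2 {
    pub struct Bar(pub u32);
}

// `foo` still depends on bar 1.x and accepts its `Bar`.
mod foo {
    pub use super::bar_v1::Bar;

    pub fn consume(bar: Bar) -> u32 {
        bar.0
    }
}

// `you` after bumping its public dependency to bar 2.x.
mod you {
    pub use super::bar_v2::Bar;

    pub fn make() -> Bar {
        Bar(7)
    }
}

/// Reports whether the two re-exported `Bar`s are the same type.
fn same_bar_type() -> bool {
    TypeId::of::<you::Bar>() == TypeId::of::<foo::Bar>()
}

fn main() {
    // The downstream code from the comment above would be:
    //     foo::consume(you::make());
    // It is left commented out because it fails with a "mismatched types" error.
    let bar = you::make();
    println!("you::make() returned Bar({})", bar.0);
    println!("same type: {}", same_bar_type());
    println!("foo accepts its own Bar: {}", foo::consume(foo::Bar(1)));
}
```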

Contributor

@untitaker untitaker Apr 19, 2017

Good point, I completely overlooked that aspect. I have a few other concerns/questions about this feature (that don't really affect this PR), but let's cut this discussion short to not clutter this PR.

@mathstuf mathstuf Apr 19, 2017

What if you relax the range on bar to include the new version rather than just bumping to only using the latest version. What kind of bump does that mandate?

ISTR seeing that it would be a major change due to cargo always choosing at the high end of the version constraint range. Would it be worth an RFC to cargo to have it see dependencies of >= 1.1, < 3 and ^1 in different crates to, instead of deciding on two copies, one of version 2.x and another of, say, 1.5, instead see that 1.5 satisfies both and ignore the 2.x version?
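
A toy version of that selection rule, to show what is being asked for. This is not how cargo resolves versions: versions are reduced to `(major, minor)` pairs, ranges to half-open intervals, and `Range` and `highest_common` are names invented for the sketch.

```rust
// (major, minor); tuples compare lexicographically, which is enough here.
type Version = (u32, u32);

/// Half-open version range: `min <= v < max_exclusive`.
#[derive(Clone, Copy)]
struct Range {
    min: Version,
    max_exclusive: Version,
}

impl Range {
    fn contains(&self, v: Version) -> bool {
        self.min <= v && v < self.max_exclusive
    }
}

/// Highest available version that satisfies every range, if there is one.
fn highest_common(available: &[Version], ranges: &[Range]) -> Option<Version> {
    available
        .iter()
        .copied()
        .filter(|v| ranges.iter().all(|r| r.contains(*v)))
        .max()
}

fn main() {
    let ranges = [
        Range { min: (1, 1), max_exclusive: (3, 0) }, // ">= 1.1, < 3"
        Range { min: (1, 0), max_exclusive: (2, 0) }, // "^1"
    ];
    let available = [(1, 0), (1, 5), (2, 3)];
    println!("{:?}", highest_common(&available, &ranges));
}
```

Picking the highest version per crate independently would instead select 2.3 for the first range and 1.5 for the second, which is the two-copies outcome described above.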

Contributor

@mathstuf yeah this is what I thought about as well, but I consider it off-topic for this RFC