
Lockfiles: check if context's requirements are compatible with the lockfile's #12610

Closed
Eric-Arellano opened this issue Aug 19, 2021 · 15 comments · Fixed by #12782
@Eric-Arellano
Contributor

Currently, we expect that the consumer of a lockfile has identical input requirements to what was used to generate the lockfile. We do this by hashing the input requirements and saving that hash in the lockfile.

This works great for tool lockfiles, where the context's requirements should always be the same as what the tool lockfile was generated with. But it does not work well with user requirements, where often the context is a subset of a bigger universe of requirements. For example, a test only uses requests, which is 1 of 20 requirements in the lockfile. We should not error, so long as the version of requests is compatible with what was used to generate the lockfile.

We sort of do this right now with constraints files:

unconstrained_projects = name_req_projects - constraint_file_projects
if unconstrained_projects:
    logger.warning(
        f"The constraints file {python_setup.requirement_constraints} does not contain "
        f"entries for the following requirements: {', '.join(unconstrained_projects)}"
    )

But this compatibility check is not robust enough. It's only checking that the project_name is contained in the lockfile, whereas we need to validate that the entire requirement string is compatible. In a world of multiple user lockfiles, it will be valid to have one lockfile with Django==2 and another lockfile with Django==3. We need to make sure the whole requirement is compatible.
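To make the gap concrete, here is a small stdlib-only sketch contrasting the two checks. The `project_name` parser and the requirement strings are purely illustrative, not Pants' actual code:

```python
import re

def project_name(req: str) -> str:
    # Naive, illustration-only parse: the project name is everything before
    # the first operator, extra, or marker character.
    return re.split(r"[<>=!~\[ ;]", req.strip(), maxsplit=1)[0]

lockfile_reqs = {"Django==2.2.24", "requests==2.26.0"}  # hypothetical lockfile inputs
context_req = "Django==3.2.7"

# Name-only check (what the constraints-file warning does): passes, wrongly.
name_compatible = project_name(context_req) in {project_name(r) for r in lockfile_reqs}

# Whole-requirement-string check: correctly flags the incompatibility.
string_compatible = context_req in lockfile_reqs
```

Here `name_compatible` is `True` even though the context wants Django 3 and the lockfile was built with Django 2; only the whole-string check catches that.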

--

To implement, we should probably start preserving the original input requirement strings in the Pants metadata header at lockfile generation time. We can then check that the context's set of Requirement objects is a subset (`set.issubset()`) of the lockfile's Requirement objects.
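A minimal sketch of that idea follows. The metadata field name and the helper functions are hypothetical, not Pants' actual implementation:

```python
def make_metadata(input_requirements: list[str]) -> dict:
    # Sketch: store the raw requirement strings in the lockfile header
    # instead of only a hash of them. Sorted for deterministic output.
    return {"generated_with_requirements": sorted(input_requirements)}

def is_valid_for(metadata: dict, context_requirements: set[str]) -> bool:
    # The context may legitimately use only a subset of the lockfile's
    # universe of requirements, so a subset check suffices.
    return context_requirements.issubset(metadata["generated_with_requirements"])
```

With this, a test that only uses `requests` validates against a 20-requirement lockfile, while a changed requirement string fails the check.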

Eric-Arellano added a commit that referenced this issue Aug 20, 2021
…2611)

Prework for #12610. Using a class gives better namespacing and easier access to the `LockfileMetadata` fields via `self`.

[ci skip-rust]
[ci skip-build-wheels]
@Eric-Arellano
Contributor Author

Counterpoint to this idea: sometimes removing a dependency does change the behavior of the build. For example, Flake8 plugins. When a user removes a Flake8 plugin from [flake8].extra_requirements, their lockfile will need to be regenerated for the plugin to actually be removed and for Flake8's behavior to change. There, we do not want to check that the context's lockfile is compatible: we want an exact match.

For user requirements, I do think we need the less precise "is compatible" check. Even though it has the same risk of mishandling removed dependencies, it's necessary for performance so that we don't have to compute the entire resolve's requirement strings. And there's a workaround: users can manually regenerate the lockfile; we just won't automatically tell them to.

So I think the takeaway is: for tool lockfiles, stick to exact matching. For user requirements, switch to compatibility checking.

@jsirois
Contributor

jsirois commented Aug 21, 2021

Fwiw, if tool resolves also subsetted their locks, everything would work the same and be correct in the dep remove case.

@Eric-Arellano
Contributor Author

Ah, true. Thanks John!

Eric-Arellano added a commit that referenced this issue Aug 23, 2021
For now, we should expect that the lockfile metadata is well-formed and present. This allows us to simplify some of the code.

Prework for #12610.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Aug 23, 2021
This ensures that callers can't mess up the determinism of the lockfile's invalidation digest. It will also make #12610 more presentable for users, when we store the requirement strings in the lockfile rather than just a hash.

[ci skip-rust]
[ci skip-build-wheels]
@chrisjrn chrisjrn self-assigned this Sep 1, 2021
@chrisjrn
Contributor

chrisjrn commented Sep 1, 2021

I am going to start approaching this work today. My current interpretation is that, for now, "compatible" means the context's requirement strings are a subset of the requirement strings specified in the lockfile.

In the future, we may be able to parse out version strings and do the requisite set math on requirements strings (which will also help simplify our interpreter constraints work), but right now that is out of scope.
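That out-of-scope future direction could look something like the following sketch, which checks a lockfile's pinned versions against ranged context requirements. It assumes the third-party `packaging` library; the helper name and data are hypothetical:

```python
from packaging.requirements import Requirement

def pin_satisfies(context_req: str, locked_pins: dict[str, str]) -> bool:
    """Sketch: does the lockfile's pinned version satisfy the context's
    (possibly ranged) requirement string?"""
    req = Requirement(context_req)
    pinned = locked_pins.get(req.name)
    # Missing project, or a pin outside the requested specifier, is incompatible.
    return pinned is not None and req.specifier.contains(pinned)
```

This is the "requisite set math" in its simplest form; full specifier-set subset comparison is considerably harder and stays out of scope.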

I currently do not have plans to verify that the header is unmodified, but that seems like a reasonable fix too.

@Eric-Arellano
Contributor Author

Sounds good! I agree with exact matches, ideally impervious to whitespace differences, etc.

As discussed in DM, tool lockfiles should continue using a hash and exact matches.
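A hash-based exact match for tool lockfiles could be sketched like this; the normalization rules and function name are assumptions for illustration:

```python
import hashlib

def requirements_digest(reqs: list[str]) -> str:
    # Order-insensitive, whitespace-trimmed digest of the input requirements,
    # so cosmetic differences don't invalidate the lockfile.
    normalized = sorted(r.strip() for r in reqs)
    return hashlib.sha256("\n".join(normalized).encode()).hexdigest()
```

The consumer recomputes the digest from the context's requirements and requires it to equal the one stored in the lockfile header.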

@chrisjrn
Contributor

chrisjrn commented Sep 1, 2021

Here's what I intend to do then:

  • Add a new field to the metadata header called tags. tags is a dict[str,str].
  • tags is reasonably free-form, but for now will include keys purpose and version. purpose can be either tool or user, and version should be castable to int.
  • We retain our right to deprecate specific (purpose, version) pairs
  • Lockfiles without a tags field will be assumed to be version 0, purpose tool lockfiles, and will be deprecated at our earliest convenience, per Lockfiles: confirm deprecation plan for lockfile header #12683

@Eric-Arellano
Contributor Author

What's the motivation for embedding tool vs user, along with version?

I suspect an even simpler modeling is something like this:

{
    "requirements": ["foo==1.2", "bar>=1"],
    "requirements_hash": null,
    "interpreter_constraints": ["==3.6.*"],
    "platforms": null,
}

I know you also proposed some nesting:

{
    "requirements": ["foo==1.2", "bar>=1"],
    "requirements_hash": null,
    "env": {
        "interpreter_constraints": ["==3.6.*"],
        "platforms": null,
     },
}

@chrisjrn
Contributor

chrisjrn commented Sep 1, 2021

The motivation for embedding tool vs user was to give us an expectation of what fields need to be present to validate the lockfile, which would be important if we add a verification field (to check that the rest of the data in the header is unmodified)

@Eric-Arellano
Contributor Author

was to give us an expectation of what fields need to be present to validate the lockfile

That expectation comes from our code itself. If it's a tool lockfile, we need `requirements_hash`; else, `requirements`. Our code knows what the lockfile refers to, so there's no need AFAICT for that to be embedded in the header.

@Eric-Arellano
Contributor Author

Eric-Arellano commented Sep 1, 2021

if we add a verification field

What do you mean? Like a schema number? As discussed in #12683, I think we possibly would benefit from that, but it's probably not necessary because of the deprecation plan to not support schemas older than the prior release's. We can likely keep it simple and leave that off, rather than needing to add and test a new mechanism.

@chrisjrn
Contributor

chrisjrn commented Sep 1, 2021

I meant hashing the data in the header

@Eric-Arellano
Contributor Author

(to check that the rest of the data in the header is unmodified)

Hm, I'm not sure that is important to address right now. We already have a big warning to not change the metadata. If you do, that's at your own peril. And it's easy to fix by regenerating the lockfile.

I don't think we need an extra mechanism to validate they didn't tamper with the header.

@chrisjrn
Contributor

chrisjrn commented Sep 1, 2021

Seems reasonable

@chrisjrn chrisjrn closed this as completed Sep 7, 2021
@Eric-Arellano
Contributor Author

Did you mean to close this @chrisjrn? This hasn't been merged to main.

@chrisjrn
Contributor

chrisjrn commented Sep 7, 2021

No, I must have accidentally clicked a button

@chrisjrn chrisjrn reopened this Sep 7, 2021
Eric-Arellano pushed a commit that referenced this issue Sep 14, 2021
…ally specified requirements (#12782)

This factors out versioning capabilities into `LockfileMetadata` so that it's possible to easily change the set of validation requirements for a lockfile. V1 represents the original lockfile version (where constraints and an invalidation digest are set). V2 allows for the old behaviour, but also allows specifying the input requirements for a lockfile, and verifying that the user requirements are ~a non-strict subset of~ identical to the input requirements.

We decided to replace the requirements hex digest with requirements strings to allow us to test whether the lockfile produces a _compatible_ environment rather than an _identical_ environment, which will be useful for user lockfile support when we eventually enable that. In the meantime, tool lockfiles still test for an identical environment, but the extra data in the lockfile will allow for more fine-grained error messages in a future version.

The implementations of `_from_json_dict` and `is_valid_for` are a bit repetitive; I could factor out the common behaviour with a bit of work, but it doesn't seem worth it given we expect to delete the V1 implementation before too long.

Currently this _does not_ add the `platforms` capability to the header, but now it's going to be easy enough to bump the version number if we want to add more fields.

Closes #12610
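The versioned-metadata structure described in #12782 could be sketched roughly as follows. The class and field names here are illustrative, not the exact Pants code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LockfileMetadata:
    """Sketch: each metadata version knows how to validate itself, and
    adding fields means bumping the version number."""
    version: int

@dataclass(frozen=True)
class LockfileMetadataV1(LockfileMetadata):
    requirements_invalidation_digest: str

    def is_valid_for(self, expected_digest: str) -> bool:
        # V1: exact hash match only.
        return self.requirements_invalidation_digest == expected_digest

@dataclass(frozen=True)
class LockfileMetadataV2(LockfileMetadata):
    requirements: frozenset[str]

    def is_valid_for(self, context_requirements: set[str]) -> bool:
        # V2: identical today; could later relax to a subset check
        # for user lockfiles.
        return set(self.requirements) == context_requirements
```

Storing the requirement strings in V2 (rather than only a digest) is what enables the more fine-grained error messages mentioned above.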