Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new version of LockfileMetadata to support checking for identically specified requirements #12782

Conversation

chrisjrn
Copy link
Contributor

@chrisjrn chrisjrn commented Sep 8, 2021

This factors out versioning capabilities into LockfileMetadata so that it's possible to easily change the set of validation requirements for a lockfile. V1 represents the original lockfile version (where constraints and an invalidation digest are set). V2 allows for the old behaviour, but also allows specifying the input requirements for a lockfile, and verifying that the user requirements are a non-strict subset of identical to the input requirements.

We decided to replace the requirements hex digest with requirements strings to allow us to test whether the lockfile produces a compatible environment rather than an identical environment, which will be useful for user lockfile support when we eventually enable that. In the meantime, tool lockfiles still test for an identical environment, but the extra data in the lockfile will allow for more fine-grained error messages in a future version.

The implementations of _from_json_dict and is_valid_for are a bit repetitive; I can factor out the common behaviour with a bit of work, but given we expect to delete the V1 implementation before too long.

Currently this does not add the platforms capability to the header, but now it's going to be easy enough to bump the version number if we want to add more fields.

Closes #12610

metadata = LockfileMetadata.new(
req.requirements_hex_digest,
req.interpreter_constraints,
{Requirement.parse(i) for i in req.requirements},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not absolutely convinced that this is the best place to put this parsing logic. It may be worth moving it over to new at some point, but for now, this works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can fail and lead to confusing error messages. We probably want to catch those and enrich the error, but not clear to me the best way to do that. Mind adding a TODO(#12314) please? (No need to fix now)

Copy link
Contributor Author

@chrisjrn chrisjrn Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an error a TODO in here. As mentioned on the other location where I'm using .parse, the exception that gets raised doesn't play nice with mypy, and that's going to get in the way of improving this. I'm going to return to this in the future.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Chris.

As discussed in #12683, I originally thought we might not want to put version numbers in the lockfile metadata, even it makes total sense to do internally. My main thought is it might suggest we support more versions than the current and latest - but I realize now that's not a good reason. It's valid for Pants 2.11 to only support lockfile versions 3 and 4, for example. And I think that is more clear. So, I think I now agree with your decision to surface version.

If we do add version, I think we want to update 2.7 to set version = 1 before the stable release goes out. We still have time to make that breaking change. Could you please split out this PR into that prework—i.e. not do the "compatible" requirements change yet—so we can cherry-pick?

Christopher Neugebauer added 3 commits September 8, 2021 12:57
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Christopher Neugebauer added 7 commits September 8, 2021 13:50
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
This permits creating new versions of `LockfileMetadata` with different data requirements with minimal changes to the external API. Generally speaking, end-users should interact with `LockfileMetadata` and not the concrete subclasses.

[ci skip-build-wheels]

[ci skip-rust]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn chrisjrn force-pushed the chrisjrn/12610-compatible-requirements-only branch from 93bbf5b to 1ea583a Compare September 8, 2021 22:54
@chrisjrn
Copy link
Contributor Author

chrisjrn commented Sep 8, 2021

Updated description to refer to the fact that the original version is now called "V1" rather than "V0", and the former V1 is now V2.

Christopher Neugebauer added 5 commits September 8, 2021 16:55
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…ible-requirements-only

[ci skip-rust]

[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@@ -612,6 +612,7 @@ def _validate_metadata(
requirements.lockfile_hex_digest,
request.interpreter_constraints,
python_setup.interpreter_universe,
{}, # TODO(chrisjrn): include requirements strings here
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment I'm not passing the actual requirements strings in, because we're only validating for tool lockfiles. This will change once we start using user lockfiles (which themselves send down requirements strings)

@Eric-Arellano
Copy link
Contributor

Thanks Chris! Before I look more closely at the code, I wanted to revisit the metadata format discussion. We've talked in the past about how the digest is only useful for tool lockfiles, whereas the list of requirements is only necessary for user lockfiles (but maybe still insightful for tool lockfiles?)

Now that you've landed the version argument, we also have a lot more flexibility. Previously, I suggested that we use the metadata headers as a proxy for the metadata version. That's no longer necessary.

This means that we could do a scheme like:

// Tool lockfile
{
  "invalidation_hash": "abc",
  "interpreter_constraints": ["==3.9.*"],
  "version": 2
}

// User lockfile
{
  "requirements": ["foo"],
  "interpreter_constraints": ["==3.9.*"],
  "version": 2
}

Or, to be more verbose:

// Tool lockfile
{
  "invalidation_hash": "abc",
  "requirements": null,
  "interpreter_constraints": ["==3.9.*"],
  "version": 2
}

// User lockfile
{
  "invalidation_hash": null,
  "requirements": ["foo"],
  "interpreter_constraints": ["==3.9.*"],
  "version": 2
}

Or, your format where you set all the metadata even if it's not applicable.

With this discussion, it's worth considering how we'll handle platforms, which is mutually exclusive with interpreter constraints.

What are your thoughts on the format? I personally think I prefer the simplest: only put metadata for what the caller actually needs. Keep our Pants boilerplate to a minimum.

…eing more verbose

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn
Copy link
Contributor Author

chrisjrn commented Sep 9, 2021

My previous suggestion on the original ticket was to include a purpose field as well as a version field, which IMO will make it easier for us to document what the correct set of fields are, once we have a docs page for users about how to generate their lockfiles. I might be wrong about that though.

I think there are two pertinent questions here:

  1. Do we always want to store generate-time requirements, even if we're using digests? I think the answer here is yes, because if we have access to the requirements strings for a given target, then we can give a more meaningful error message (by stating what the diverging requirements are).
  2. What do we do with platforms? If we're certain that these are mutually exclusive with interpreter constraints then it makes sense to have only one or the other. Is there a situation where we want to accept a platform and then exclude specific interpreter constraints? I could think of having e.g. platform macosx_11_0_arm64-cp-39-cp39 but needing to exclude a specific version (e.g. CPython==3.9.4) because there's a severe bug in it. Wouldn't making platforms and ICs mutually exclusive prevent expressing that description?

@Eric-Arellano
Copy link
Contributor

I think the answer here is yes, because if we have access to the requirements strings for a given target, then we can give a more meaningful error message (by stating what the diverging requirements are).

Makes sense. At that point, I suggest getting rid of the digest. It's no longer necessary - you can do exact string comparisons for tool lockfiles, whereas compatible check for user lockfiles. The digest maybe improves perf, but feels like a micro-optimization and we should go with a simpler lockfile format. Wdyt?

@chrisjrn
Copy link
Contributor Author

chrisjrn commented Sep 9, 2021

That's a fair point. If we do go with that approach, then we're going to need to add a new field to specify whether we want an exact requirements match, or whether we want to allow only compatible requirements. I considered whether purpose might be enough for us here, but it's not clear that all tools are going to want exact requirements into the future (e.g. flake8 may like extra user requirements set).

Having either the digest or the requirements strings makes it clear which behaviour is going to be applied; having only strings is going to require that behaviour flag on the side.

Also, did you have any thoughts about platforms vs ICs?

Christopher Neugebauer added 3 commits September 9, 2021 10:38
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…ents strings instead

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn
Copy link
Contributor Author

For paper trail purposes -- we've decided to remove the requirements_invalidation_digest, and we're just going to with a list of requirement string literals. Eric was satisfied that we could determine the matching behaviour by the context in which the lockfile is consumed, either way, we don't need to support user lockfiles until V3 any way.

Christopher Neugebauer added 2 commits September 10, 2021 08:28
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…ible-requirements-only

# Conflicts:
#	3rdparty/python/lockfiles/user_reqs.txt
@@ -160,6 +161,7 @@ def pex_requirements(
importlib.resources.read_binary(*self.default_lockfile_resource),
),
lockfile_hex_digest=hex_digest,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b. We still calculate the hex digest while V1 lockfiles are still supported.

@chrisjrn chrisjrn marked this pull request as draft September 10, 2021 15:51
Christopher Neugebauer added 2 commits September 10, 2021 08:52
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…r requirements mismatches

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn chrisjrn marked this pull request as ready for review September 10, 2021 16:38
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn chrisjrn changed the title Add new version of LockfileMetadata to support checking for compatible requirements Add new version of LockfileMetadata to support checking for ~compatible~ identically specified requirements Sep 10, 2021
@chrisjrn chrisjrn changed the title Add new version of LockfileMetadata to support checking for ~compatible~ identically specified requirements Add new version of LockfileMetadata to support checking for identically specified requirements Sep 10, 2021
@chrisjrn
Copy link
Contributor Author

@Eric-Arellano This implements the lockfile metadata per our discussion yesterday and is probably ready for review

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Could you please update the PR description to explain how we reached this decision about removing the hash and using a list of requirements?

@@ -0,0 +1,142 @@
# Copyright 2021 Pants project contributors (see CONTRIBUTORS.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait I'm confused still - what is this being used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could swear I deleted this. FIXING.

src/python/pants/backend/python/util_rules/pex.py Outdated Show resolved Hide resolved
@@ -527,10 +530,17 @@ def _validate_metadata(
python_setup: PythonSetup,
) -> None:

req_strings = (
{Requirement.parse(i) for i in requirements.req_strings}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto that this can fail, and please adding a TODO to improve the error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error that comes up here is pkg_resources.extern.packaging.requirements.InvalidRequirement, which mypy balks at:

src/python/pants/backend/python/util_rules/pex.py:19:1: error: Skipping
analyzing "pkg_resources.extern.packaging.requirements": found module but no
type hints or library stubs  [import]

I'll add the TODO, but it may end up taking some work to actually improve this error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we handle it like this:

def parse_requirements_file(content: str, *, rel_path: str) -> Iterator[Requirement]:
"""Parse all `Requirement` objects from a requirements.txt-style file.
This will safely ignore any options starting with `--` and will ignore comments. Any pip-style
VCS requirements will fail, with a helpful error message describing how to use PEP 440.
"""
for i, line in enumerate(content.splitlines()):
line = line.strip()
if not line or line.startswith("#") or line.startswith("-"):
continue
try:
yield Requirement.parse(line)
except Exception as e:
raise ValueError(
format_invalid_requirement_string_error(
line, e, description_of_origin=f"{rel_path} at line {i + 1}"
)
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so we're just handling the generic exception? That makes sense.

This static method should be used in place of the `LockfileMetadata` constructor.
"""

return LockfileMetadataV2(valid_for_interpreter_constraints, requirements)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on this part. Why is it hardcoded to V2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new should produce the current version of the metadata header, but this means that we only have to update which constructor is called once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: This keeps the API to places that need to create a new LockfileMetadata object reasonably consistent. We don't want to support creating older versions, but we do want to support deserialising them until the deprecation process is complete. Having the new method in place means the factory method has a consistent name, and consumers will get directed to new parameters that need to be added by the type checker. I suspect it's a lower support burden than trying to keep track of which version of the class is the current one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Okay that last paragraph makes way more sense to me. Thanks for explaining. Could you please add that explanation to the docstring: when writing lockfiles, we always use the latest metadata version, even if we support reading older versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess you do already say "the most recent version". I'm trying to figure out why this was still confusing me. Maybeee something like this:

Create an instance the most up-to-date subclass of LockfileMetadata.

Note that we always use the most recent lockfile version when writing lockfiles, even if we support reading older versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    This static method should be used in place of the `LockfileMetadata` constructor. This 
    gives calling sites a predictable method to call to construct a new `LockfileMetadata` for
    writing, while still allowing us to support _reading_ older, deprecated metadata versions.

Christopher Neugebauer added 3 commits September 13, 2021 08:40
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…quirements`

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…ible-requirements-only

# Conflicts:
#	3rdparty/python/lockfiles/user_reqs.txt

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Christopher Neugebauer added 2 commits September 13, 2021 09:12
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@@ -4,10 +4,14 @@
#
# --- BEGIN PANTS LOCKFILE METADATA: DO NOT EDIT OR REMOVE ---
# {
# "version": 1,
# "requirements_invalidation_digest": "815457a1baf6226c993e5468ccdf64c69fe7214d3d9237911c762733e0130526",
# "version": 2,
# "valid_for_interpreter_constraints": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name has grown on me btw :) especially when paired with generated_with_requirements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason why naming things is a hard problem in computer science is because people insist on being concise 😆

This static method should be used in place of the `LockfileMetadata` constructor.
"""

return LockfileMetadataV2(valid_for_interpreter_constraints, requirements)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess you do already say "the most recent version". I'm trying to figure out why this was still confusing me. Maybeee something like this:

Create an instance the most up-to-date subclass of LockfileMetadata.

Note that we always use the most recent lockfile version when writing lockfiles, even if we support reading older versions.

) -> _T:
"""Construct a `LockfileMetadata` subclass from the supplied JSON dict.

*** Not implemented. Subclasses should override. ***
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, imo better is to set @abstractmethod and subclass ABC? MyPy doesn't like the interaction of ABC and dataclasses - if it complains, use # type: ignore[error-code].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's complaining, and it breaks the type checking in subclasses too. That's a bit too much complaining for me to be comfortable with suppressing.

def _header_dict(self) -> dict[Any, Any]:
"""Produce a dictionary to be serialized into the lockfile header.

Subclasses should call `super` and update the resulting dictionary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Subclasses should call `super` and update the resulting dictionary
Subclasses should call `super` and update the resulting dictionary.

@@ -0,0 +1,142 @@
# Copyright 2021 Pants project contributors (see CONTRIBUTORS.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait I'm confused still - what is this being used for?

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano merged commit e1dd981 into pantsbuild:main Sep 14, 2021
@Eric-Arellano
Copy link
Contributor

Thanks! Note that we will not cherry-pick this to Pants 2.7 @chrisjrn

@chrisjrn chrisjrn deleted the chrisjrn/12610-compatible-requirements-only branch September 14, 2021 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lockfiles: check if context's requirements are compatible with the lockfile's
2 participants