Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace include_special_cased_deps flag with should_traverse_deps_predicate #19272

Merged
merged 20 commits into from
Jun 13, 2023

Conversation

cognifloyd
Copy link
Member

@cognifloyd cognifloyd commented Jun 8, 2023

This PR is extracted from #19155 which provides a way to stop dependency traversal through package targets. Several other improvements will probably benefit from this predicate as well (including proposals around multiple dependencies fields and/or tagged dependencies).

Passing a predicate on Request types

This PR replaces the include_special_cased_deps flag (on DependenciesRequest, TransitiveTargetsRequest, and CoarsenedTargetsRequest) with the should_traverse_deps_predicate predicate:

class DependenciesRequest(EngineAwareParameter):
field: Dependencies
should_traverse_deps_predicate: ShouldTraverseDepsPredicate = TraverseIfDependenciesField()

class TransitiveTargetsRequest:
"""A request to get the transitive dependencies of the input roots.
Resolve the transitive targets with `await Get(TransitiveTargets,
TransitiveTargetsRequest([addr1, addr2])`.
"""
roots: Tuple[Address, ...]
should_traverse_deps_predicate: ShouldTraverseDepsPredicate

class CoarsenedTargetsRequest:
"""A request to get CoarsenedTargets for input roots."""
roots: Tuple[Address, ...]
expanded_targets: bool
should_traverse_deps_predicate: ShouldTraverseDepsPredicate

Here is the ShouldTraverseDepsPredicate base class. This is a dataclass that serves as the "closure" around the __call__ predicate function; if it needs any data, the subclass should add fields for that data:

@dataclass(frozen=True)
class ShouldTraverseDepsPredicate(metaclass=ABCMeta):
"""This callable determines whether to traverse through deps of a given Target + Field.
This is a callable dataclass instead of a function to avoid issues with hashing closures.
Only the id is used when hashing a function; any closure vars are NOT accounted for.
NB: When subclassing a dataclass (like this one), generated methods on a dataclass are NOT
inherited because the @dataclass decorator only inspects the current class before generating
a function--the decorator does NOT inspect __mro__.
"""
# This _callable field ensures that __call__ is included in the __hash__ method generated by @dataclass.
_callable: Callable[
[Any, Target, Dependencies | SpecialCasedDependencies], bool
] = dataclasses.field(init=False, repr=False)
def __post_init__(self):
object.__setattr__(self, "_callable", type(self).__call__)
@abstractmethod
def __call__(self, target: Target, field: Dependencies | SpecialCasedDependencies) -> bool:
"""If this predicate returns false, then the target's field's deps will be ignored."""

Calling the requested predicate

The resolve_dependencies rule is responsible for calling this predicate in two places.

  1. To bail out early and avoid dep hydration and inference if unneeded:

    tgt = wrapped_tgt.target
    # This predicate allows the dep graph to ignore dependencies of selected targets
    # including any explicit deps and any inferred deps.
    # For example, to avoid traversing the deps of package targets.
    if not request.should_traverse_deps_predicate(tgt, request.field):
    return Addresses([])

  2. To replace the include_special_cased_deps flag on a per-field basis. If there are no SpecialCasedDependencies fields, then the predicate won't be called.

    special_cased_fields = tuple(
    field
    for field in tgt.field_values.values()
    if isinstance(field, SpecialCasedDependencies)
    and request.should_traverse_deps_predicate(tgt, field)
    )

Replacing the include_special_cased_deps with the predicate

This predicate replaces include_special_cased_deps to minimize the API around dep calculation as @stuhood recommended in #19155 (comment):

IMO, a true benchmark for this filtering API would be that we can replace the include_special_cased_deps flag with it.

I am very much in favor of adding an API to allow for filtering like this, but I think that from a complexity-budget perspective, killing/replacing "special-cased-deps" would be a really, really good thing to accomplish here.

The default predicate, TraverseIfDependenciesField uses isinstance(field, Dependencies) which is False for any SpecialCasedDependencies subclasses:

class TraverseIfDependenciesField(ShouldTraverseDepsPredicate):
"""This is the default ShouldTraverseDepsPredicate implementation.
This skips resolving dependencies for fields (like SpecialCasedDependencies) that are not
subclasses of Dependencies.
"""
def __call__(self, target: Target, field: Dependencies | SpecialCasedDependencies) -> bool:
return isinstance(field, Dependencies)

Any thing that used include_special_cased_deps=True (the default was False) now passes in the AlwaysTraverseDeps which always returns True.

class AlwaysTraverseDeps(ShouldTraverseDepsPredicate):
"""A predicate to use when a request needs all deps.
This includes deps from fields like SpecialCasedDependencies which are ignored in most cases.
"""
def __call__(self, target: Target, field: Dependencies | SpecialCasedDependencies) -> bool:
return True

These are examples of passing in that predicate:

transitive_targets = await Get(
TransitiveTargets,
TransitiveTargetsRequest(
[root.address], should_traverse_deps_predicate=AlwaysTraverseDeps()
),
)
adjacent_targets_per_target = await MultiGet(
Get(
Targets,
DependenciesRequest(
tgt.get(Dependencies), should_traverse_deps_predicate=AlwaysTraverseDeps()
),
)

Comment on lines +1210 to +1222
async def convert_dependencies_request_to_explicitly_provided_dependencies_request(
request: DependenciesRequest,
) -> ExplicitlyProvidedDependenciesRequest:
"""This rule discards any deps predicate from DependenciesRequest.

Calculating ExplicitlyProvidedDependencies does not use any deps traversal predicates as it is
meant to list all explicit deps from the given field. By stripping the predicate from the
request, we ensure that the cache key for ExplicitlyProvidedDependencies calculation does not
include the predicate increasing the cache-hit rate.
"""
# TODO: Maybe require Get(ExplicitlyProvidedDependencies, ExplicitlyProvidedDependenciesRequest)
# and deprecate Get(ExplicitlyProvidedDependencies, DependenciesRequest) via this rule.
return ExplicitlyProvidedDependenciesRequest(request.field)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #19155, I just had a comment on the determine_explicitly_provided_dependencies rule that said: NB: This rule ignores request.should_traverse_deps_predicate.

@thejcannon recommended I use an actual type instead of just a comment in #19155 (comment)

We might consider a different input type that doesn't have the ignored attribute?

So, I added this ExplicitlyProvidedDependenciesRequest with a rule to minimize API churn in this PR.

Do we want to deprecate Get(ExplicitlyProvidedDependencies, DependenciesRequest)? If/when we do that, we can probably put a deprecation warning in this rule.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea, although feel free to carry this over into a future PR so we can merge this one.

Also up to you here 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can deprecate once we have updated all uses within the pants repo. So, it fits best in a future PR.

src/python/pants/engine/target.py Outdated Show resolved Hide resolved
Copy link
Member

@thejcannon thejcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! Likely the only round of comments then 🚀

src/python/pants/backend/project_info/dependencies.py Outdated Show resolved Hide resolved
Comment on lines +1210 to +1222
async def convert_dependencies_request_to_explicitly_provided_dependencies_request(
request: DependenciesRequest,
) -> ExplicitlyProvidedDependenciesRequest:
"""This rule discards any deps predicate from DependenciesRequest.

Calculating ExplicitlyProvidedDependencies does not use any deps traversal predicates as it is
meant to list all explicit deps from the given field. By stripping the predicate from the
request, we ensure that the cache key for ExplicitlyProvidedDependencies calculation does not
include the predicate increasing the cache-hit rate.
"""
# TODO: Maybe require Get(ExplicitlyProvidedDependencies, ExplicitlyProvidedDependenciesRequest)
# and deprecate Get(ExplicitlyProvidedDependencies, DependenciesRequest) via this rule.
return ExplicitlyProvidedDependenciesRequest(request.field)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea, although feel free to carry this over into a future PR so we can merge this one.

Also up to you here 😄

src/python/pants/engine/target.py Outdated Show resolved Hide resolved
src/python/pants/engine/target.py Outdated Show resolved Hide resolved
kaos
kaos previously approved these changes Jun 8, 2023
Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! 🚀

Comment on lines 717 to 742
class DepsTraversalPredicates:
"""All methods on this class are ShouldTraverseDepsPredicate implementations."""

# NB: ShouldTraverseDepsPredicate is defined below after SpecialCasedDependencies.
# Implementations are defined here, however, before they are using them as defaults
# in CoarsenedTargetsRequest, TransitiveTargetsRequest, and DependenciesRequest.

@staticmethod
def if_dependencies_field(
target: Target, field: Dependencies | SpecialCasedDependencies
) -> bool:
"""This is the default ShouldTraverseDepsPredicate implementation.

This skips resolving dependencies for fields (like SpecialCasedDependencies) that are not
subclasses of Dependencies.
"""
return isinstance(field, Dependencies)

@staticmethod
def always(target: Target, field: Dependencies | SpecialCasedDependencies) -> bool:
"""A predicate to use when a request needs all deps.

This includes deps from fields like SpecialCasedDependencies which are ignored in most
cases.
"""
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think this is a big improvement actually :)

@kaos kaos self-requested a review June 12, 2023 19:52
@kaos kaos dismissed their stale review June 12, 2023 19:53

I think we need to address the point raised by @jriddy

closure functions have a useless hash. The hash of a function does not
account for any closure vars, including unhashable mutable vars,
so that hash is not reliable.
Plus, every instance of a closure will have a different id, so there cannot be
any reuse of anything cached based on the function's hash.

Using a frozen dataclass bypasses these issues. We include the class's __call__ function
as a field in the parent dataclass so that it is part of
the __hash__ function generated by the
@DataClass decorator. Using the class's __call__ is more stable than using an instance's function.
"""If this predicate returns false, then the target's field's deps will be ignored."""


@dataclass(frozen=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do the subclasses actually need the dataclass decorator at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it, aren't they unfrozen? I'll try dropping the decorator on these subclasses since they don't add any fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It works just fine without decorating the subclasses. I will have to add the decorator on the predicate that stops at package targets because that will add a field. These, however, don't need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we should still decorate:

  • It'll work correctly when someone adds a field later
  • When people copy+paste this code elsewhere they'll have it
    My 2c

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted one of the tests to make sure it works without the decorator. Doing that helped me resolve a bug with the type hint on _callable since the test made mypy inspect it more carefully.

I can re-add the decorator, I think I like it with the decorator a bit better as it is more consistent. But I've documented it as is for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll come down to preference then. I'm for not having them until needed, but won't object/block if you prefer to keep them.

This tests 2 things:
- that subclasses without an @DataClass decorator are still frozen
- that mypy says the types of _callable and __call__ are compatible

For the mypy issue, I tried using Self instead of Any, but Self resolved to ShouldTraverseDepsPredicate which did not match SkipDepsTagOrTraverse for some reason.
@cognifloyd cognifloyd requested a review from jriddy June 13, 2023 18:34
Copy link
Member

@thejcannon thejcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful

src/python/pants/engine/target.py Outdated Show resolved Hide resolved
Co-authored-by: Joshua Cannon <joshdcannon@gmail.com>
Copy link
Sponsor Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM FWIW

huonw added a commit that referenced this pull request Jun 14, 2023
This updates `changelog.py` to write the (non-internal) changes to the
relevant release notes file directly, rather than requiring a human to
copy-paste it in. For now, the file is just mutated, without committing.
The internal changes are still printed for the human to deal with.

For instance, `pants run src/python/pants_release/changelog.py --
--prior 2.18.0.dev1 --new 2.18.0.dev2` adds a new section to
`src/python/pants/notes/2.18.x.md`:

```diff
diff --git a/src/python/pants/notes/2.18.x.md b/src/python/pants/notes/2.18.x.md
index e648a4525c..d6668a24b1 100644
--- a/src/python/pants/notes/2.18.x.md
+++ b/src/python/pants/notes/2.18.x.md
@@ -1,5 +1,35 @@
 # 2.18.x Release Series
 
+## 2.18.0.dev2 (Jun 14, 2023)
+
+### New Features
+
+* Include complete platforms for FaaS environments for more reliable building ([#19253](#19253))
+
+* Add experimental support for Rustfmt ([#18842](#18842))
+
+* Helm deployment chart field ([#19234](#19234))
+
+### Plugin API Changes
+
+* Replace `include_special_cased_deps` flag with `should_traverse_deps_predicate` ([#19272](#19272))
+
+### Bug Fixes
+
+* Raise an error if isort can't read a config file ([#19294](#19294))
+
+* Improve handling of additional files in Helm unit tests ([#19263](#19263))
+
+* Add taplo to the release ([#19258](#19258))
+
+* Handle `from foo import *` wildcard imports in Rust dep inference parser ([#19249](#19249))
+
+* Support usage of `scala_artifact` addresses in `scalac_plugin` targets ([#19205](#19205))
+
+### Performance
+
+* `scandir` returns `Stat`s relative to its directory. ([#19246](#19246))
+
 ## 2.18.0.dev1 (Jun 02, 2023)
 
 ### New Features
```

This also moves it into the new `pants_release` root, adds some basic
tests, and has it fetch the relevant branch directly, rather than
prompting the user to do so. It also pulls out a `git.py` support module
with helpers for exec-ing `git` as an external process.

The commits are individually reviewable.

This is a step towards automating more of the "start release" process,
per #19279. After this
core refactoring of the functionality, I think the next step is to
combine it with the CONTRIBUTORS update and version bumping, and then
after that string it all together on CI so that starting a release is
fully automated.
jriddy pushed a commit that referenced this pull request Jun 17, 2023
…plugins (#19306)

This builds on #19272, adding another `should_traverse_deps_predicate`
that stops dependency traversal at any package targets.

This is mostly extracted from #19155.

`TraverseIfNotPackageTarget` will be useful whenever a
`TransitiveTargetsRequest`, `CoarsenedTargetsRequest`, or
`DependenciesRequest` could benefit from treating package targets as
leaves. This PR does not change any `TransitiveTargetsRequest`s because
that is probably a change in user-facing behavior (even if it counts as
a bugfix) and needs to be documented as such. This PR merely adds the
feature, which, on its own, does not impact anything else.

Related:
- #18254
- #17368
- #15855
- #15082

---------

Co-authored-by: Andreas Stenius <andreas.stenius@imanage.com>
cognifloyd added a commit that referenced this pull request Jul 15, 2023
…19155)

This builds on:
- #19272
- #19306
- #19387

This only updates the `pex_binary` rule to use the new
`TraverseIfNotPackageTarget` predicate when requesting the source files
that should be included in this pex. Wheels, other pex_binaries, and
other package targets are not handled via this code path since those are
not python sources that get included in the pex based on the
`include_sources` flag.

Fixes #15855

This fixes #15855 because anything in a `python_distribution` does not
need to be included as a source file in the `pex_binary`. wheels get
included via the `LocalDists` rules. In some cases, we might still get
more sources in the pex than intended. This might happen if a dependency
is inferred between sources, bypassing the logic that looks for wheels
that own the relevant things. In any case, this is PR provides an
improvement and resolves the sample error in #15855.

Related:
- #18254
- #17368
- #15082

---------

Co-authored-by: Joshua Cannon <joshdcannon@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants