-
-
Notifications
You must be signed in to change notification settings - Fork 626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal: sort more tuples to potentially improve cache hit rates. #20767
Conversation
src/python/pants/backend/python/dependency_inference/module_mapper.py
Outdated
Show resolved
Hide resolved
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about using a helper for this, so you can comment why were doing this?
Otherwise it isn't obvious why these things should be sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, increasing cache hit rates is good!
def __lt__(self, other: Any) -> bool: | ||
if not isinstance(other, Target): | ||
return NotImplemented | ||
return self.address < other.address |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I note that I think pair of targets a
and b
can theoretically have all of these false: a < b
, a == b
, a > b
, if the addresses are equal but at least one of the other values used in __eq
__ (__class__
, residence_dir
or field_values
) are not. That is, targets are not "totally ordered".
Does that matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that, but no I don't think that matters, as in our case, if the address is equal, it should be the same target, i.e. have the same field values too. So in this case, it's an optimization to not also compare the fields for the lt/gt methods used for sorting.
@@ -98,7 +98,9 @@ async def isolate_local_dist_wheels( | |||
) | |||
provided_files = set(wheels_listing_result.stdout.decode().splitlines()) | |||
|
|||
return LocalDistWheels(tuple(wheels), wheels_snapshot.digest, frozenset(provided_files)) | |||
return LocalDistWheels( | |||
tuple(sorted(wheels)), wheels_snapshot.digest, frozenset(sorted(provided_files)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor (and no ned to change), but... I think a frozenset
is already order-independent, so there's no need to sort the input.
This suggests another potential solution: use a frozenset
(or FrozenOrderedSet
) instead of tuple
if order truly doesn't matter for all uses. This may be more resilient for future refactorings, because caller can never forget to sort. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I failed to observe that. It could certainly be valuable to go over and see where order is of no consequence and save a few cycles by not sorting.
Saving that as an exercise for the reader/or future me. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets keep an eye on performance, in case it seems to have degraded for largish repos, it could be from the additional sorting going into this.. which suggests using a version of frozen set may be the better route.
Is this deterministic enough a problem that we could add a ruff/flake linter for it? |
I think I'd prefer to land this first, to see if it has any noticeable impact at all and take it from there (with helpers/linters etc). |
👍🏽 If it is meaningful, let me know - I don't think I've written a linter before, and I'm currently on a multi-language AST spree, so wouldn't mind adding another one under my belt :) |
add unit tests internal: sort more tuples to potentially improve cache hit rates. (pantsbuild#20767) I looked through most `tuple` uses in the Python backend, and added `sorted(..)` where I think the order wasn't already stable and the result ends up as a rule output. update docs and test fixture
I looked through most
tuple
uses in the Python backend, and addedsorted(..)
where I think the order wasn't already stable and the result ends up as a rule output.