Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distribution-filename: speed up is_compatible #367

Merged
merged 5 commits into from
Nov 9, 2023

Conversation

BurntSushi
Copy link
Member

This PR tweaks the representation of Tags in order to offer a
faster implementation of WheelFilename::is_compatible. We now use a
nested map of tags that lets us avoid looping over every supported
platform tag. As the code comments suggest, that is the essential gain.
We still do not mind looping over the tags in each wheel name since they
tend to be quite small. And pushing our thumb on that side of things can
make things worse overall since it would likely slow down WheelFilename
construction itself.

For micro-benchmarks, we improve considerably for compatibility
checking:

$ critcmp base test3
group                                                   base                                    test3
-----                                                   ----                                    -----
build_platform_tags/burntsushi-archlinux                1.00     46.2±0.28µs        ? ?/sec     2.48    114.8±0.45µs        ? ?/sec
wheelname_parsing/flyte-long-compatible                 1.00    624.8±3.31ns   174.0 MB/sec     1.01    629.4±4.30ns   172.7 MB/sec
wheelname_parsing/flyte-long-incompatible               1.00    743.6±4.23ns   165.4 MB/sec     1.00    746.9±4.62ns   164.7 MB/sec
wheelname_parsing/flyte-short-compatible                1.00    526.7±4.76ns    54.3 MB/sec     1.01    530.2±5.81ns    54.0 MB/sec
wheelname_parsing/flyte-short-incompatible              1.00    540.4±4.93ns    60.0 MB/sec     1.01    545.7±5.31ns    59.4 MB/sec
wheelname_parsing_failure/flyte-long-extension          1.00     13.6±0.13ns     3.2 GB/sec     1.01     13.7±0.14ns     3.2 GB/sec
wheelname_parsing_failure/flyte-short-extension         1.00     14.0±0.20ns  1160.4 MB/sec     1.01     14.1±0.14ns  1146.5 MB/sec
wheelname_tag_compatibility/flyte-long-compatible       11.33   159.8±2.79ns   680.5 MB/sec     1.00     14.1±0.23ns     7.5 GB/sec
wheelname_tag_compatibility/flyte-long-incompatible     237.60 1671.8±37.99ns    73.6 MB/sec    1.00      7.0±0.08ns    17.1 GB/sec
wheelname_tag_compatibility/flyte-short-compatible      16.07   223.5±8.60ns   128.0 MB/sec     1.00     13.9±0.30ns     2.0 GB/sec
wheelname_tag_compatibility/flyte-short-incompatible    149.83   628.3±2.13ns    51.6 MB/sec    1.00      4.2±0.10ns     7.6 GB/sec

We do regress slightly on the time it takes for Tags::new to run, but
this is somewhat expected. And in absolute terms, 114us is perfectly
acceptable given that it's only executed ~once for each puffin
invocation.

Ad hoc benchmarks indicate an overall 25% perf improvement in puffin pip-compile times. This roughly corresponds with how much time
is_compatible was taking. Indeed, profiling confirms that it has
virtually disappeared from the profile.

Fixes #157

(It might be easiest to review this PR commit-by-commit.)

I added this comment because I was trying to figure out where and
how the build tags were used before modifying the code. But it looks
like they aren't. I also checked and, at least the `flyte.in`
benchmark does have some wheel names that contain a build number.
There is perhaps a latent bug here, but I'm not certain.
@BurntSushi
Copy link
Member Author

Fun fact: the last commit in this PR represents my second attempt at optimization. My first attempt ended up being bunk. I had attempted to build one regex that included all platform tags that could be matched against the wheel filename itself directly. It worked and is_compatible was implemented simply via re.is_match(..). While it did indeed fix is_compatible, it made Tags::new slow enough that it took about ~10% of resolution time. The regex could likely be shrunk due to the amount of redundancy in the tags, but at that point, I figured it got too complex. The place I ended up was much simpler.

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work!

crates/platform-tags/src/lib.rs Show resolved Hide resolved
crates/bench/benches/distribution_filename.rs Outdated Show resolved Hide resolved
crates/bench/benches/distribution_filename.rs Show resolved Hide resolved
crates/bench/benches/distribution_filename.rs Outdated Show resolved Hide resolved
Comment on lines +35 to +39
// 2023-11-08(burntsushi): It looks like the code below actually drops
// the build tag if one is found. According to PEP 0427, the build tag
// is used to break ties. This might mean that we generate identical
// `WheelName` values for multiple distinct wheels, but it's not clear
// if this is a problem in practice.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped it for the initial version to add it later when a case comes up but so far i didn't hit any case where there the build tag mattered (i haven't actually seen any build tag in the wild yet)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots in the flyte.in benchmark. For example, Pillow-9.4.0-2-pp39-pypy39_pp73-macosx_10_10_x86_64 has a build tag of 2.

crates/distribution-filename/src/wheel.rs Outdated Show resolved Hide resolved
crates/distribution-filename/src/wheel.rs Show resolved Hide resolved
The goal here is to optimize WheelFilename::is_compatible. In so doing,
we want to not only track our progress to ensure a real improvement has
been made, but also that we don't regress other things. (Or, if we do,
that such regressions are okay.) For example, speeding up
WheelFilename::is_compatible will likely require making Tags::new
slower. Which is probably fine so long as it remains "reasonable" since
Tags::new is only executed ~once for the lifetime of a `puffin` process.

The benchmarking structure here was mostly copied from ruff_benchmark.
Before tweaking this to improve the speed of tag compatibility checks,
we add a few tests to check basic parsing behavior. We try to cover all
error conditions.

In the process of doing this, I changed one error condition to an unwrap
since it's impossible for str::split to return an empty iterator for any
inputs.

This also adds Insta as a dependency of distribution-filename for the
tests.
Previously, the wheel filename parser would accept names with more than
6 components. It would behave as if any component after the sixth did
not exist.

PEP 0427 seems to imply that there really can be no more than 6
components. In particular, if more were allowed or if dashes were
allowed in the platform tag, then it would be ambiguous in some
cases whether a wheel filename had a build number or not.

The reason for adding this error case is because we want to depend
on the fact that none of the tags contain dashes when optimizing.
Technically we could do so without this error case, but without it,
the code doesn't quite make it clear whether anything extra is left
out by mistake or not. By adding an explicit error case, we make it
a little harder to accidentally break assumptions made by other
parts of the code.
This commit tweaks the representation of `Tags` in order to offer a
faster implementation of `WheelFilename::is_compatible`. We now use a
nested map of tags that lets us avoid looping over every supported
platform tag. As the code comments suggest, that is the essential gain.
We still do not mind looping over the tags in each wheel name since they
tend to be quite small. And pushing our thumb on that side of things can
make things worse overall since it would likely slow down WheelFilename
construction itself.

For micro-benchmarks, we improve considerably for compatibility
checking:

    $ critcmp base test3
    group                                                   base                                    test3
    -----                                                   ----                                    -----
    build_platform_tags/burntsushi-archlinux                1.00     46.2±0.28µs        ? ?/sec     2.48    114.8±0.45µs        ? ?/sec
    wheelname_parsing/flyte-long-compatible                 1.00    624.8±3.31ns   174.0 MB/sec     1.01    629.4±4.30ns   172.7 MB/sec
    wheelname_parsing/flyte-long-incompatible               1.00    743.6±4.23ns   165.4 MB/sec     1.00    746.9±4.62ns   164.7 MB/sec
    wheelname_parsing/flyte-short-compatible                1.00    526.7±4.76ns    54.3 MB/sec     1.01    530.2±5.81ns    54.0 MB/sec
    wheelname_parsing/flyte-short-incompatible              1.00    540.4±4.93ns    60.0 MB/sec     1.01    545.7±5.31ns    59.4 MB/sec
    wheelname_parsing_failure/flyte-long-extension          1.00     13.6±0.13ns     3.2 GB/sec     1.01     13.7±0.14ns     3.2 GB/sec
    wheelname_parsing_failure/flyte-short-extension         1.00     14.0±0.20ns  1160.4 MB/sec     1.01     14.1±0.14ns  1146.5 MB/sec
    wheelname_tag_compatibility/flyte-long-compatible       11.33   159.8±2.79ns   680.5 MB/sec     1.00     14.1±0.23ns     7.5 GB/sec
    wheelname_tag_compatibility/flyte-long-incompatible     237.60 1671.8±37.99ns    73.6 MB/sec    1.00      7.0±0.08ns    17.1 GB/sec
    wheelname_tag_compatibility/flyte-short-compatible      16.07   223.5±8.60ns   128.0 MB/sec     1.00     13.9±0.30ns     2.0 GB/sec
    wheelname_tag_compatibility/flyte-short-incompatible    149.83   628.3±2.13ns    51.6 MB/sec    1.00      4.2±0.10ns     7.6 GB/sec

We do regress slightly on the time it takes for `Tags::new` to run, but
this is somewhat expected. And in absolute terms, 114us is perfectly
acceptable given that it's only executed ~once for each `puffin`
invocation.

Ad hoc benchmarks indicate an overall 25% perf improvement in `puffin
pip-compile` times. This roughly corresponds with how much time
`is_compatible` was taking. Indeed, a profile confirm that it has
virtually disappeared from the profile.

Fixes #157
@BurntSushi BurntSushi merged commit 33c0901 into main Nov 9, 2023
3 checks passed
@BurntSushi BurntSushi deleted the ag/improve-wheel-tag-matching branch November 9, 2023 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance of wheel tag matching
3 participants