-
-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python_dist
doesn't safeguard against pex by-version caching, may provide stale dists when used
#5449
Comments
python_dist
doesn't safeguard against pex by-version caching, may provide stale dists when used
I started an initial scaffold project for the setuptools plugin last week and will bang on it some more this week time permitting - but this ticket should really be concerned with the for the plugin approach I suggested, the API I'm thinking of would be something along the lines of:
where the value of one other important aspect to point out here is that for this to work end to end, the setuptools plugin requirement would also need to be implicitly injectable into the running |
the other alternative here of course would be to improve the pex resolver cache for this case if you want to look into that. that could potentially be simpler and a better net payoff - but I know we have other uses for the setuptools plugin in e.g. pants release land, so don't mind hacking up a plugin either way. |
How would the pex resolver cache be changed to solve this problem? Seems like that route would require a special ttl /not caching at all based on specific criteria? Regarding the API above, that looks sensible to me, however that approach would require modifying the Another easy solution to integration with py_dist build task is to write the plugin such that it looks for an environment variable to append and if it is there, it does it, and if not, no-op. That way, we could simply set |
to be clear: the
for the resolver cache, there are two sides there: build-time and run-time resolve caching. it's not clear to me if the build time resolver cache is a factor here based on how dists are directly included in the pex vs resolved via for the runtime resolver cache, since the dists being cached are local (one is in ultimately, both appending a local version identifier to the dist or improving the cache key for the pex resolver cache attack the same fundamental problem of keying into distinct cache paths - just from two different angles. I'm mostly convinced that the pex resolver path would be a better net win overall (and definitely more maintainable/less hacky). before targeting any one solution tho, a good first step here would be to characterize the current bug |
I am clear on the shell env variable, I was saying that to do that approach, I think it would require changing the This is not theoretical; I bumped into this a few times testing py_dist where a bad pex was built and on [run], pants errors out saying no compatible dists could be found (“missing the following dependencies for target: blah==1.0 blah==2.0”), and when the same task re-runs with working dists, it still errors out, leading me to believe that this is a build time caching issue. It only crops up when two dists have the exact same name and version, like in our basic test cases. Furthermore, the cached wheel sits in ~/.cache/pants/..../my.whl, leading me to believe that this is likely pants caching and not pex resolver...maybe. I’m unsure because I would think that if it is the pex resolver, it would be in the ~/.pex directory. In any case, I will get a repro together to expose the bug. |
you would only prepend the env var to the commandline if you were calling it from the shell interactively as a user - the snippit I sent is just a way of documenting the CLI with a shell usage example. in the code, you'd just set the env var in a contextmanager before calling
if it's happening under |
…we don't use the first cached one (#6022) ### Problem I made #5479 because the dist resolved for a `python_dist()` would not change after changing any of the source files, and this persisted even after a `clean-all`. This issue is more thoroughly explained in #5449 (which I literally didn't see before making this PR, but describes almost exactly what this PR does, it seems -- great minds think alike?). We were building the `python_dist()` targets each time they were invalidated, but caching the package locally in `~/.cache/pants/` after we built it the first time and using that instead, because the version string was the same for both the first and subsequent builds of the `python_dist()` target. ### Solution - Move argv generation for setup.py invocations into `BuildLocalPythonDistributions`. - Append the `python_dist()` target's fingerprint to the version string using `--tag-build`. This conforms to the "local" version specifier format as per [PEP 440](https://www.python.org/dev/peps/pep-0440/#local-version-identifiers). This tagged version is then used in the synthetic `PythonRequirement` stitched into the build graph for the local dist. - Add an integration test `test_invalidation()` using `mock_buildroot()` (which fails on master, but not in this PR) where we run a binary depending on a `python_dist()`, rewrite the contents of a source file to change the output, then run pants again to verify that the new dist is resolved. *Separately:* - Made `PythonDistribution` subclass `PythonTarget` to avoid duplicated logic. It's not clear what the original reason for not subclassing it was at first, but it seems to be not be in effect anymore. - Changed the `dir` field name for the `Manager` namedtuple in `mock_buildroot()` to be `new_buildroot` so it doesn't get highlighted as a builtin in certain text editors. ### Result When we resolve local dists in any python requirements task, we get the dist corresponding to the target we just built. Resolves #5449.
As of #5141, Pants has a new target that allows users to define setup.py-based distributions. A new Pants task will run the user-provided setup.py and produce a wheel for consumption by other Python targets.
Currently, the pex resolver handles resolving these dists into the output pex, and caches them using the name of the wheel in the pex resolver cache. This can be problematic if two
python_dist
targets in the same project have the same name and version info in their respective setup.py's, leading to undefined behavior in the event that there is a collision in the pex resolver cache.One solution is to write a
setuptools
plugin to hijack setup.py execution and inject a local version identifier into the wheel name within the context of Pants task execution, allowing the resolver to safely cache wheels because wheel names will contain their correspondingpython_dist
target's unique sha.cc @kwlzn
The text was updated successfully, but these errors were encountered: