Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix python_dist() caching #5479

Conversation

cosmicexplorer
Copy link
Contributor

Problem

python_dist() doesn't update the binary created when sources change. On master, you can run:

> ./pants run testprojects/src/python/python_distribution/fasthello_with_install_requires:main_with_no_conflict
...
Super hello
United States
...

Then, change either super_greet.c or hello.py to modify the output (I added an exclamation point to "Super hello!" in super_greet.c). Then run:

> ./pants run testprojects/src/python/python_distribution/fasthello_with_install_requires:main_with_no_conflict
...
Super hello
United States
...

The same output appears. This also occurs after a clean-all -- I'm not exactly sure why.

Solution

Add the built dists to the pex builder in ResolveRequirementsTaskBase#_build_requirements_pex(). I haven't investigated enough into pex or python dists in pants enough to know exactly why this works yet.

Result

Binaries built with python_dist() are now updated when source files change. I have added a test_invalidation() test to test_python_distribution_integration.py, but I'm not sure whether this tests everything it should be, as the change that fixes it isn't in binary creation, but in resolve_requirements_task_base.py. I would love if someone with more experience with the python backend could give me more insight on that (but will totally investigate myself before this is merged). I would also like to know why the original error would occur even after a clean-all, which seems indicative of some greater issue, but might be nothing.

@jsirois
Copy link
Contributor

jsirois commented Feb 16, 2018

I haven't dug into the diff yet, but inspection of the pre-existing code/mechanism brings to light what looks like a flaw: a distribution is built via setup.py, and then consumed via standard requirement resolution. Without ensuring the version of the distribution is changed to reflect underlying source file changes, resolution will hit caches; ie: If the underlying setup.py specifies a static version, ie version='1.0.0', and this is not munged by the BuildLocalPythonDistributions task (it isn't), then changes in underlying source will lead to new wheels being built, but all with the same version. Resolution then of the wheel is likely to stop at a cache, grabbing an old wheel with the same version. Just checking this is also your understanding of the underlying problem here - I think this is the root at any rate.

@jsirois
Copy link
Contributor

jsirois commented Feb 16, 2018

To underscore the above, this artifact: /home/jsirois/.cache/pants/python_cache/requirements/CPython-2.7.12/fasthello_test-1.0.0-cp27-cp27mu-linux_x86_64.whl
That should be stably resolved for any request of fasthello_test==1.0.0, bypassing resolution of a new wheel from, e.g.: .pants.d/pyprep/build-local-dists/current/testprojects.src.python.python_distribution.fasthello_with_install_requires.fasthello/current/dist/fasthello_test-1.0.0-cp27-cp27mu-linux_x86_64.whl.

@cosmicexplorer
Copy link
Contributor Author

Some discussion on the slack:

  1. The persistence after a clean-all is likely due to the cache resolution issue described above.
  2. In the debug log you can see that we hit pypi for fasthello_test -- this is another example of incorrect resolution, although I don't know if it would have ended up using a pypi fasthello_test if it existed (@jsirois mentioned that this is also a reason for version munging).

@jsirois
Copy link
Contributor

jsirois commented Feb 16, 2018

To confirm, the resolver will use the 1st fasthello_test==1.0.0 it finds - if on pypi, then that wins! The summary of the version-munging approach is to allow the code to use its standard resolution pipeline while ensuring the only version of the dist that exists in the universe is local and unique to the hash of the sources making it up. All this said, it may not be desirable to "allow the code to use its standard resolution pipeline". There are likely better approaches.

@CMLivingston
Copy link
Contributor

CMLivingston commented Feb 16, 2018

@kwlzn and I have had some discussion around this and one potential solution is to write a setuptools plugin that injects a local version identifier into the wheel name during setup.py execution so as to properly use the new artifact. Another solution could be to modify the caching strategy to give local py_dist wheels special treatment to avoid this.

@jsirois So you are saying that the resolver will use pypi if that exists over a local find-links repo? The current implementation creates a synthetic req lib for the local wheel and gives it the .pants.d path as its PythonRequirement repository attribute, resolving from there. Based on my testing, it looks to me like the resolver uses the find-links repo over pypi (for the case where I name fasthello_test==1.0.0 in setup.py to be a real package on pypi like flask=0.12).

Given that rebuilding on source code changes has worked before when we did not create a synthetic lib, I believe that this stale usage is a byproduct of creating the synthetic req lib and passing it to the pex resolver with a find-links repo.

@jsirois
Copy link
Contributor

jsirois commented Feb 16, 2018

And yet another solution is to not resolve the local dists at all since we already know their location. We only need resolve their requirements, not them themselves.

So you are saying that the resolver will use pypi if that exists over a local find-links repo?

Ack, and I think I was just wrong on this one. I did not review the code to see if the find-links were mixed in to the resolver set or used individually. So yay!

@CMLivingston
Copy link
Contributor

CMLivingston commented Feb 17, 2018

Yep! This intention of this was to get all of the install_requires requirements into the resolver, so there are a few avenues we can take to improve here.

@CMLivingston
Copy link
Contributor

You may want to wait on landing this. I'd be interested in @kwlzn 's take on these changes, we have been discussing alternatives to this approach in this issue here: #5449.

@cosmicexplorer
Copy link
Contributor Author

Yeah, this can lie dormant for now, or I/someone can close it if it's going to be covered by other PRs.

@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Feb 17, 2018

Just closed because I think this could/should be moved to #5449, let's just make sure to mention #5480 in that thread too so we can get all the context.

cosmicexplorer added a commit that referenced this pull request Jul 17, 2018
…we don't use the first cached one (#6022)

### Problem

I made #5479 because the dist resolved for a `python_dist()` would not change after changing any of the source files, and this persisted even after a `clean-all`. This issue is more thoroughly explained in #5449 (which I literally didn't see before making this PR, but describes almost exactly what this PR does, it seems -- great minds think alike?). We were building the `python_dist()` targets each time they were invalidated, but caching the package locally in `~/.cache/pants/` after we built it the first time and using that instead, because the version string was the same for both the first and subsequent builds of the `python_dist()` target.

### Solution

- Move argv generation for setup.py invocations into `BuildLocalPythonDistributions`.
- Append the `python_dist()` target's fingerprint to the version string using `--tag-build`. This conforms to the "local" version specifier format as per [PEP 440](https://www.python.org/dev/peps/pep-0440/#local-version-identifiers). This tagged version is then used in the synthetic `PythonRequirement` stitched into the build graph for the local dist.
- Add an integration test `test_invalidation()` using `mock_buildroot()` (which fails on master, but not in this PR) where we run a binary depending on a `python_dist()`, rewrite the contents of a source file to change the output, then run pants again to verify that the new dist is resolved.

*Separately:*

- Made `PythonDistribution` subclass `PythonTarget` to avoid duplicated logic. It's not clear what the original reason for not subclassing it was at first, but it seems to be not be in effect anymore.
- Changed the `dir` field name for the `Manager` namedtuple in `mock_buildroot()` to be `new_buildroot` so it doesn't get highlighted as a builtin in certain text editors.

### Result

When we resolve local dists in any python requirements task, we get the dist corresponding to the target we just built. Resolves #5449.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants