Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PytestRun --fast multi-source-chroots do not account for namespace packages #5426

Closed
jsirois opened this issue Feb 2, 2018 · 7 comments
Closed
Assignees
Labels
Milestone

Comments

@jsirois
Copy link
Contributor

jsirois commented Feb 2, 2018

A classic case here is a repo layout of:

src/python/lib/item.py
tests/python/lib/test_item.py

Here we have tests with a parallel package structure to the code under test, but separated into two source roots. Under the covers, we construct an execution pex by PEX_PATH combining two independent source pexes, neither of which is declaring namespace packages for lib in this example - and this leads to an inability to find code.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 6, 2018

OK - the current state of the world, in layers from low to high:

  1. PEX injects ns packages where no __init__.py exist
  2. GatherSources does not clear out empty __init__.py to make way for mechanism one, or else convert to ns packages itself.
  3. Some cases of user code (Twitter has these) do have non-empty __init__.py non-ns packages in a source tree that are tested in a parallel test tree.

Off the top - the PEX mechanism generating ns packages in 1 should not be relied on - this is TODO'd dead code.

Implementing a proactive 2 that injected ns packages in all PEXs generated by GatherSources for both missing and empty __init__.py would solve both coverage and import problems for trees with no __init__.py with non-ns package contents. This would not solve 3, which requires the parallel test code to live in the same directory as the code it tests. In the old model with a single PEX for all gathered sources, this naturally was supported and only coverage was broken for setups with multiple (non-test) source roots.

An approach here then might be to:

  1. Identify all targets that need sources gathered.
  2. Separate out the test targets
  3. Gather the non-test targets into source root buckets, recording the packages contained within those buckets.
  4. Add the test targets whose packages match any indexed source root buckets into those buckets.
  5. Add the ramining test targets to their own source root bucket.

The assumption here is coverage data is never desired for the code contained in test targets.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 6, 2018

A few considerations that fall out of the above:

  1. GatherSources currently has no knowledge of tests, which seems desirable - how to decouple?
  2. If we find a case where a test package is parallel to a source package and both have non-empty non-ns __init__.py we should probably fail-fast.

Perhaps 1 & 2 can be addressed by learning GatherSources to care about matching packages across source roots only - regardless of target type owning the package. It could gather all these and implement the sanity check in 2 - either exactly 1 package with non-empty non-ns __init__.py or die, and then having verified this, place all code with like packages in the same source pex.

This still falls down though, since the logic would take proj1/src/python/util/fs.py and place it in the same pex with proj2/src/python/util/cache.py and then proj2/test/python/util/test_cache.py would fail to be able to get coverage if proj1/src/python/util/fs.py was the ~random winner of the coverage mapping ordering. So... more thought needed on rules. This may in fact require knowledge of the test target vs other targets distinction wrt coverage, which implies a custom GatherSources task or a new downstream task that takes the GatherSources PythonSources product and applies test logic to re-jigger pexes into a form useable by PytestRun.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 6, 2018

The whole other tack to take here would be to modify coverage with a contribution to support split source roots in the [paths] configuration. I was hoping to avoid this; ie to be robust to the underlying tooling vagaries, but it may be the only real option.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 6, 2018

A coverage plug-in may be a way to modify coverage reporting stably: https://coverage.readthedocs.io/en/latest/api_plugin.html#api-plugin

I'll give this a spin when I next pick this work back up. The main thrust being we could return to a single source pex and just have the coverage plugin wire up the correct report mappings back to the source root original sources.

@stuhood
Copy link
Sponsor Member

stuhood commented Feb 7, 2018

Perhaps 1 & 2 can be addressed by learning GatherSources to care about matching packages across source roots only - regardless of target type owning the package. It could gather all these and implement the sanity check in 2 - either exactly 1 package with non-empty non-ns init.py or die, and then having verified this, place all code with like packages in the same source pex.

This seems reasonable to me... we're happy to clean up code if the result is something that follows python conventions anyway.

This still falls down though, since the logic would take proj1/src/python/util/fs.py and place it in the same pex with proj2/src/python/util/cache.py and then proj2/test/python/util/test_cache.py would fail to be able to get coverage if proj1/src/python/util/fs.py was the ~random winner of the coverage mapping ordering.

I feel like this impact (specifically the impact on coverage) would be fine as long as it results in a stable winning target and a warning that only one target is being considered for coverage (with a little bit of intelligence as to how the target is selected)? When I think of the usecases in pantsbuild/pants for overlapping packages, the primary usecase is with nesting: so the contrib modules all collide, but only for nested/empty preceding packages. So the choice of which target to consider for coverage of a module could be stable as "has the most sources in that module", or something.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 27, 2018

OK - checking back in to note that the coverage plugin approach appears to work. A good bit more polish is needed as well as unit tests, but the mechanism looks good.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 28, 2018

OK, implementing a clean plugin is currently not possible due to https://bitbucket.org/ned/coveragepy/issues/646/modifying-coverage-reporting-for-python but the workaround demonstrated there suffices for Pants for now.

jsirois added a commit to jsirois/pants that referenced this issue Mar 1, 2018
Use a coverage plugin instead of `coverage combine` to open the door to
using a single PEX test source chroot even when testing against multiple
repo source roots at once.

Groundwork for pantsbuild#5426.
jsirois added a commit to jsirois/pants that referenced this issue Mar 1, 2018
Use a coverage plugin instead of `coverage combine` to open the door to
using a single PEX test source chroot even when testing against multiple
repo source roots at once.

Needed for pantsbuild#5426
Depends on pantsbuild#5420
jsirois added a commit to jsirois/pants that referenced this issue Mar 1, 2018
This change reverts the majority of pantsbuild#5400 (GatherSources) and adjusts
the new coverage plugin path mapping system to handle mapping a single
source chroot back to potentially multiple source roots. The existing
tests of this  capability introduced in pantsbuild#5400 still pass and as a result
the previously conflicting use cases represented by pantsbuild#5314 (multiple
library/binary source roots with interdependencies under test) and
pantsbuild#5426 (parallel test packages in a test source root) are now both
satisfied.

Fixes pantsbuild#5314
Fixes pantsbuild#5426
Depends on pantsbuild#5534
wisechengyi pushed a commit that referenced this issue Mar 1, 2018
Use a coverage plugin instead of `coverage combine` to open the door to
using a single PEX test source chroot even when testing against multiple
repo source roots at once.

Needed for #5426
Depends on #5420
wisechengyi pushed a commit that referenced this issue Mar 1, 2018
This change reverts the majority of #5400 (GatherSources) and adjusts
the new coverage plugin path mapping system to handle mapping a single
source chroot back to potentially multiple source roots. The existing
tests of this  capability introduced in #5400 still pass and as a result
the previously conflicting use cases represented by #5314 (multiple
library/binary source roots with interdependencies under test) and
#5426 (parallel test packages in a test source root) are now both
satisfied.

Fixes #5314
Fixes #5426
Depends on #5534
jsirois added a commit to jsirois/pants that referenced this issue Mar 1, 2018
Use a coverage plugin instead of `coverage combine` to open the door to
using a single PEX test source chroot even when testing against multiple
repo source roots at once.

Needed for pantsbuild#5426
Depends on pantsbuild#5420
jsirois added a commit that referenced this issue Mar 1, 2018
Use a coverage plugin instead of `coverage combine` to open the door to
using a single PEX test source chroot even when testing against multiple
repo source roots at once.

Needed for #5426
Depends on #5420
jsirois added a commit to jsirois/pants that referenced this issue Mar 1, 2018
This change reverts the majority of pantsbuild#5400 (GatherSources) and adjusts
the new coverage plugin path mapping system to handle mapping a single
source chroot back to potentially multiple source roots. The existing
tests of this  capability introduced in pantsbuild#5400 still pass and as a result
the previously conflicting use cases represented by pantsbuild#5314 (multiple
library/binary source roots with interdependencies under test) and
pantsbuild#5426 (parallel test packages in a test source root) are now both
satisfied.

Fixes pantsbuild#5314
Fixes pantsbuild#5426
Depends on pantsbuild#5534
jsirois added a commit that referenced this issue Mar 2, 2018
This change reverts the majority of #5400 (GatherSources) and adjusts
the new coverage plugin path mapping system to handle mapping a single
source chroot back to potentially multiple source roots. The existing
tests of this  capability introduced in #5400 still pass and as a result
the previously conflicting use cases represented by #5314 (multiple
library/binary source roots with interdependencies under test) and
#5426 (parallel test packages in a test source root) are now both
satisfied.

Fixes #5314
Fixes #5426
Depends on #5534
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants