Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal] Split shell targets into atom vs generator #12957

Merged
merged 6 commits into from
Sep 26, 2021

Conversation

Eric-Arellano
Copy link
Contributor

Demos the technique described in #12954. The end user sees no change, including ./pants filter --target-type=python_library and ./pants peek behaving the same way.

But our internal representation now uses the changes from the proposal. The major benefit is that now our rules don't need to worry about Address.is_file_target - instead, the Target API ensures that no matter what, the rules will only ever operate on files as atoms. Even with fixes like #12952, we'll do the right thing.

This change also frees us up to add an overrides field to the target generator safely.

[ci skip-rust]

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
all_expanded_targets = await Get(Targets, AddressSpecs([DescendantAddresses("")]))
shell_tgts = tuple(tgt for tgt in all_expanded_targets if tgt.has_field(ShellSources))
all_targets = await Get(Targets, AddressSpecs([DescendantAddresses("")]))
shell_tgts = tuple(tgt for tgt in all_targets if tgt.has_field(ShellSourceField))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file mean that dep inference only happens for the file-based atomic targets, i.e. ShellSourceTarget and ShellTestTarget.

Comment on lines -183 to +185
# Because a FieldSet corresponds to a file address, there should be exactly 1 file in the
# sources. This assumption allows us to simplify determining which shell to use via inspecting
# the shebang.
if len(field_set_digest_content) != 1:
raise AssertionError(
f"The file address {request.field_set.address} had sources != 1, which is unexpected: "
f"{field_set_sources.snapshot.files}. Please file a bug at "
"https://github.com/pantsbuild/pants/issues/new with this error message copied."
)
original_test_file_content = field_set_digest_content[0]
updated_test_file_content = add_source_shunit2(original_test_file_content)
# `ShellTestSourceField` validates that there's exactly one file.
test_file_content = field_set_digest_content[0]
updated_test_file_content = add_source_shunit2(test_file_content)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited about this cleanup in particular. Rule authors don't need to know about "file targets" anymore, or even generated targets! The rule behaves the same whether the file-based target was generated or explicitly declared.

class ShellSources(Sources):
class ShellSourceField(Sources):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in love with this name, particularly wondering if it should be Source vs. Sources.

I think the superclass will probably always be Sources, which we cannot rename to Source because there are non-file based atomic targets like go_package which need >1 source file. So, weird for subclass to be Source and parent to be Sources.

But ShellSourcesField also feels wrong - we want to express in the type system that Shell operates with files as its atoms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm thinking about how to model subclasses like this expressing with the type system that they are file-based, while still having Sources as a superclass. Something like either a class-decorator or subclassing SourceField (which itself subclasses Sources). It would rename the field alias to be source and expect a single str, rather than list[str].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShellSingleSourceField ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's better, even it the extra characters are annoying.

Also, I tried this weekend experimenting with subclasses of Sources abusing the class to be source: str instead of sources: list[str], and it was chaotic. Pretty sure it violates Liskov Substitution and is a bad idea. If we want a source vs. sources field, then I think we need two distinct classes and for all our supporting code (graph.py etc) to work with both. That's not worth doing for now, so I think we stick with sources and set expected_num_files = 1.

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ShellSourcesField is fine - the fact that we operate on single sources at a time is somewhat an implementation choice. So I don't think we want to encode that in the name. Imagine a language where the theoretical atom is a single file, but in practice we choose to act on a few files together for whatever reason. There is no reason that code consuming a ShellSourcesField should care that it happens to represent a single source? That decision has been made upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sgtm. Thanks Benjy!

Comment on lines +40 to +41
class ShellGeneratingSources(Sources):
uses_source_roots = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB that the Target API needs to view these sources as different than the file-based targets for all our rules to do the right thing. This split is what unlocks all the PR's benefits.

I anticipate that this will soon subclass something like GeneratingSources, rather than Sources. See #12953 for that idea.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit premature to do this until we know what https://docs.google.com/document/d/17XvhHlx8jBn4K9qFRZPe7qaKJQjTfiKMKYRubnzWxjA/edit will look like.

In particular: an unnamed target never actually needs to exist as a Target object if there is a shim for it during BUILD file parsing. And for a named Target generator, after BUILD file parsing, it shouldn't need a Sources field because it will be sources-less, with dependencies on its generated Targets instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the idea of GeneratingSources is that it's not normal Sources. It's not representing ownership of the files, only a relationship that those files trigger generation.

Either way, this PR improves the modeling by having the generator sources be a different subclass than the atom sources.

Comment on lines -94 to +95
class Shunit2TestsTimeout(IntField):
class Shunit2TestTimeoutField(IntField):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I could go back, I would have added Field to the end of most field names. A couple extra characters, but I think it brings clarity - it's often confusing imo what is a target's field vs. a rule's helper type, e.g. EntryPoint from pants.backend.target_types.

default = ("*_test.sh", "test_*.sh", "tests.sh")


class Shunit2TestsGeneratorTarget(Target):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is verbose, but I like it for making abundantly clear its purpose.

I thought about using Macro instead of GeneratorTarget, but that's a misnomer. An important part of https://docs.google.com/document/d/1HpJn2jTWf5sKob6zhe4SqHZ7KWBlX4a6OJOwnuw-IHo/edit#heading=h.7qrnzjjahn5v is realizing that target generators are targets themselves.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
class ShellSources(Sources):
class ShellSourceField(Sources):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShellSingleSourceField ?

@Eric-Arellano
Copy link
Contributor Author

Bump @benjyw @stuhood re the naming of things. I'd like to do this same type of PR for the rest of the codebase once this lands.

…rget

# Conflicts:
#	src/python/pants/backend/shell/lint/shell_fmt.py
#	src/python/pants/backend/shell/register.py
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano merged commit 906099f into pantsbuild:main Sep 26, 2021
@Eric-Arellano Eric-Arellano deleted the shell-test-target branch September 27, 2021 04:46
@wisechengyi wisechengyi mentioned this pull request Oct 2, 2021
stuhood pushed a commit that referenced this pull request Oct 2, 2021
* [internal] Run pyupgrade on src/python/pants/backend/python ([#13073](#13073))
* [internal] Re-enable some skipped JVM tests. ([#13074](#13074))
* [internal] Use `DownloadedExternalModules` when analyzing external Go packages ([#13076](#13076))
* [internal] Use `DownloadedExternalModules` during Go target generation ([#13070](#13070))
* [internal] Replace deprecated use of `[pytest] junit_xml_dir` with `[test] xml_dir. ([#13069](#13069))
* [internal] Add `DownloadedExternalModules` for Go ([#13068](#13068))
* [internal] Always use jars on the user classpath, and generalize transitive classpath building ([#13061](#13061))
* Add failing tests for Go external modules ([#13065](#13065))
* [internal] java: fix version in test ([#13064](#13064))
* [internal] Skip additional inference tests ([#13062](#13062))
* [internal] java: enable cycles for file-level targets generated by `java_sources` ([#13058](#13058))
* [internal] Add a `@logging` decorator for tests. ([#13060](#13060))
* [internal] Improve compatibility of nailgun with append only caches, and use them for Coursier ([#13046](#13046))
* [internal] Stop using `go.sum` when generating `_go_external_package` targets ([#13052](#13052))
* [internal] Rename `go_module` target to `go_mod` ([#13053](#13053))
* [internal] Refactor `go/util_rules/external_module.py` ([#13051](#13051))
* [internal] go: add analyzer and rules for test sources ([#13041](#13041))
* [Internal] Refactoring how we integrate with dockerfile ([#13027](#13027))
* [internal] Simplify `go/package_binary.py` ([#13045](#13045))
* [internal] Refactor `OwningGoMod` ([#13042](#13042))
* [internal] Refactor `go_mod.py` ([#13039](#13039))
* [internal] Record metadata on engine-aware params ([#13040](#13040))
* [internal] Test discovery of `go` binary ([#13038](#13038))
* [internal] Extract directory setup for terraform linters / formatters into a separate rule. ([#13037](#13037))
* [internal] java: register dependency inference rules ([#13035](#13035))
* [internal] Add `strutil.bullet_list()` to DRY formatting ([#13031](#13031))
* Minor cleanups for the autoflake linter / formatter. ([#13032](#13032))
* Ensure XML results recorded for both pytest and junit ([#13025](#13025))
* [internal] go: refactor compilation into separate rule ([#13019](#13019))
* [internal] go: refactor link step into separate rule ([#13022](#13022))
* [internal] go: enable plugin in repo and cleanup test project ([#13018](#13018))
* [internal] go: use colon to separate binary name and version ([#13020](#13020))
* [internal] tweak formatting of help text for sourcefile-validation subsystem. ([#13016](#13016))
* [internal] Use system-installed Go rather than installing via Pants ([#13007](#13007))
* Move the `process-execution-local-cleanup` hint to a more specific location. ([#13013](#13013))
* [internal] Split shell targets into atom vs generator ([#12957](#12957))
* Install Go in CI ([#13011](#13011))
* Refresh maintainers list. ([#13004](#13004))
* [internal] Refactor setup of GOROOT and `import_analysis.py` ([#13000](#13000))
* Infer dependencies on COPY of pex binaries for `docker_image`s. ([#12920](#12920))
* Prepare 2.7.0. ([#12995](#12995))
* [internal] jvm: skip JDK tests unless env var set ([#12994](#12994))
* [internal] jvm: limit caching of JDK setup processes ([#12992](#12992))
* [internal] Async-ify `NailgunPool::connect` and `nailgun::CommandRunner`. ([#12990](#12990))
* [internal] Replace `java_library` with `java_source` and `java_sources`, and add `java_test` ([#12976](#12976))
* Prepare 2.7.0rc5. ([#12987](#12987))
* [internal] terraform: refactor parser script into its own file ([#12985](#12985))
* [internal] jvm/java: ensure JDK downloaded in one process ([#12972](#12972))
* add JDK to GitHub Actions CI ([#12977](#12977))
* [internal] Re-enable the `clippy::used_underscore_binding` check. ([#12983](#12983))
* [internal] Use target generation for `_go_external_package` ([#12929](#12929))
* [internal] Bump CI token expiration threshold. ([#12974](#12974))
* [internal] Re-enable the Java backend. ([#12971](#12971))
* [internal] Implement `@union`s via `Query`s ([#12966](#12966))
* Remove `Enriched*Result` classes in favor of `EngineAwareReturnType.cacheable` ([#12970](#12970))
* [internal] Remove spurious `python_tests` directive ([#12968](#12968))
* [internal] Python coverage report generation uses precomputed addresses. ([#12965](#12965))
* Add PackageRootedDependencyMap for mapping inferred Java dependencies ([#12964](#12964))
Eric-Arellano added a commit that referenced this pull request Oct 9, 2021
Similar to #12957, this implements the technique from #12954. `./pants filter --target-type` and `./pants peek` behave the same as before, but our internal implementation differentiates between the atom target vs. its generator.

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants