[internal] jvm/java: ensure JDK downloaded in one process #12972
[ci skip-rust] [ci skip-build-wheels]
0e989ce to 0300dab
coursier.coursier.exe,
"java",
"--system-jvm",  # TODO(#12293): use a fixed JDK version from a subsystem.
f"{jdk_setup.java_home}/bin/java",
A tricky bit about this in the context of remote execution is that the JDK selection process isn't guaranteed to run on the same machine as the compile.
So rather than having the `jdk_setup` expose `java_home` as a property, it should probably expose it as a command prefix, which we can hope (or know, locally) will hit the cached pre-selected JDK rather than re-fetching it.
> So rather than having the `jdk_setup` expose `java_home` as a property, it should probably expose it as a command prefix, which we can hope (or know, locally) will hit the cached pre-selected JDK rather than re-fetching it.
That would require using coursier to invoke the JDK every time?
Correct. But it appears to take a few milliseconds, because the coursier binary is native.
Also, that time is moot with nailgun: I'll probably get to that after hours this week.
Okay, implemented calling into coursier's binary each time.
Note: With remote execution, the user will likely have to choose the "system" JDK, since there is no guarantee that one remote executor will share the same JDK cache as an executor used for a subsequent remote execution request. We shouldn't attempt to solve the remote execution case right now.
Thanks!
There is still a failure when downloading even with this PR:
https://github.com/pantsbuild/pants/runs/3669182720?check_suite_focus=true#step:11:534
#12977 adds a JDK into the GitHub Actions CI so there is always a "system" JDK available.
java / javac directly],
input_digest=coursier.digest,
description="Invoke Coursier with system-jvm to fingerprint JVM version.",
cache_scope=ProcessCacheScope.PER_RESTART_SUCCESSFUL,
I noticed while rebasing #12982 that this line was lost: it's important when using the system JDK, because it isn't stable, and the fingerprint computation can't be cached. I doubt it's related to the current flakiness, but fyi.
As per #12972 (review), the `Process`es used to obtain information on the JDK should not be cached permanently, especially when the system JVM is in use. This caching limit was originally present in the code refactored by #12972 but was lost in a rebase.
[ci skip-rust] [ci skip-build-wheels]
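To illustrate why the fingerprinting process should not be cached permanently, here is a toy model in Python. It is not Pants' implementation: `CacheScope`, `ToyProcessCache`, and the `probe` callback are hypothetical names, and the sketch ignores the "successful" half of `PER_RESTART_SUCCESSFUL`. It only shows that a permanently cached result goes stale when the system JDK changes between restarts, while a per-restart result is recomputed.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class CacheScope(Enum):
    # Hypothetical scopes, loosely modeled on the name seen in the diff.
    ALWAYS = "always"            # result may be reused across restarts
    PER_RESTART = "per_restart"  # result is reused only until the next restart


class ToyProcessCache:
    """Toy cache keyed by argv; a stand-in for a real process cache."""

    def __init__(self) -> None:
        self._persistent: Dict[Tuple[str, ...], str] = {}
        self._session: Dict[Tuple[str, ...], str] = {}

    def run(
        self,
        argv: Tuple[str, ...],
        scope: CacheScope,
        execute: Callable[[], str],
    ) -> str:
        # Pick the store that matches the requested scope.
        store = self._persistent if scope is CacheScope.ALWAYS else self._session
        if argv not in store:
            store[argv] = execute()
        return store[argv]

    def restart(self) -> None:
        # A restart discards session-scoped results only.
        self._session.clear()


# The "system JDK" is mutable state outside the build tool's control.
system_jdk = {"version": "11"}


def probe() -> str:
    return system_jdk["version"]


cache = ToyProcessCache()
argv = ("coursier", "java", "--system-jvm", "-version")

first_always = cache.run(argv, CacheScope.ALWAYS, probe)
first_session = cache.run(argv, CacheScope.PER_RESTART, probe)

# The user upgrades the system JDK, then restarts the build tool.
system_jdk["version"] = "17"
cache.restart()

stale = cache.run(argv, CacheScope.ALWAYS, probe)
fresh = cache.run(argv, CacheScope.PER_RESTART, probe)
```

In this model `stale` still reports the old version while `fresh` picks up the new one, which is the behavior the restored `cache_scope` line is meant to guarantee for the system JVM.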
* [internal] Run pyupgrade on src/python/pants/backend/python ([#13073](#13073))
* [internal] Re-enable some skipped JVM tests. ([#13074](#13074))
* [internal] Use `DownloadedExternalModules` when analyzing external Go packages ([#13076](#13076))
* [internal] Use `DownloadedExternalModules` during Go target generation ([#13070](#13070))
* [internal] Replace deprecated use of `[pytest] junit_xml_dir` with `[test] xml_dir`. ([#13069](#13069))
* [internal] Add `DownloadedExternalModules` for Go ([#13068](#13068))
* [internal] Always use jars on the user classpath, and generalize transitive classpath building ([#13061](#13061))
* Add failing tests for Go external modules ([#13065](#13065))
* [internal] java: fix version in test ([#13064](#13064))
* [internal] Skip additional inference tests ([#13062](#13062))
* [internal] java: enable cycles for file-level targets generated by `java_sources` ([#13058](#13058))
* [internal] Add a `@logging` decorator for tests. ([#13060](#13060))
* [internal] Improve compatibility of nailgun with append only caches, and use them for Coursier ([#13046](#13046))
* [internal] Stop using `go.sum` when generating `_go_external_package` targets ([#13052](#13052))
* [internal] Rename `go_module` target to `go_mod` ([#13053](#13053))
* [internal] Refactor `go/util_rules/external_module.py` ([#13051](#13051))
* [internal] go: add analyzer and rules for test sources ([#13041](#13041))
* [Internal] Refactoring how we integrate with dockerfile ([#13027](#13027))
* [internal] Simplify `go/package_binary.py` ([#13045](#13045))
* [internal] Refactor `OwningGoMod` ([#13042](#13042))
* [internal] Refactor `go_mod.py` ([#13039](#13039))
* [internal] Record metadata on engine-aware params ([#13040](#13040))
* [internal] Test discovery of `go` binary ([#13038](#13038))
* [internal] Extract directory setup for terraform linters / formatters into a separate rule. ([#13037](#13037))
* [internal] java: register dependency inference rules ([#13035](#13035))
* [internal] Add `strutil.bullet_list()` to DRY formatting ([#13031](#13031))
* Minor cleanups for the autoflake linter / formatter. ([#13032](#13032))
* Ensure XML results recorded for both pytest and junit ([#13025](#13025))
* [internal] go: refactor compilation into separate rule ([#13019](#13019))
* [internal] go: refactor link step into separate rule ([#13022](#13022))
* [internal] go: enable plugin in repo and cleanup test project ([#13018](#13018))
* [internal] go: use colon to separate binary name and version ([#13020](#13020))
* [internal] tweak formatting of help text for sourcefile-validation subsystem. ([#13016](#13016))
* [internal] Use system-installed Go rather than installing via Pants ([#13007](#13007))
* Move the `process-execution-local-cleanup` hint to a more specific location. ([#13013](#13013))
* [internal] Split shell targets into atom vs generator ([#12957](#12957))
* Install Go in CI ([#13011](#13011))
* Refresh maintainers list. ([#13004](#13004))
* [internal] Refactor setup of GOROOT and `import_analysis.py` ([#13000](#13000))
* Infer dependencies on COPY of pex binaries for `docker_image`s. ([#12920](#12920))
* Prepare 2.7.0. ([#12995](#12995))
* [internal] jvm: skip JDK tests unless env var set ([#12994](#12994))
* [internal] jvm: limit caching of JDK setup processes ([#12992](#12992))
* [internal] Async-ify `NailgunPool::connect` and `nailgun::CommandRunner`. ([#12990](#12990))
* [internal] Replace `java_library` with `java_source` and `java_sources`, and add `java_test` ([#12976](#12976))
* Prepare 2.7.0rc5. ([#12987](#12987))
* [internal] terraform: refactor parser script into its own file ([#12985](#12985))
* [internal] jvm/java: ensure JDK downloaded in one process ([#12972](#12972))
* add JDK to GitHub Actions CI ([#12977](#12977))
* [internal] Re-enable the `clippy::used_underscore_binding` check. ([#12983](#12983))
* [internal] Use target generation for `_go_external_package` ([#12929](#12929))
* [internal] Bump CI token expiration threshold. ([#12974](#12974))
* [internal] Re-enable the Java backend. ([#12971](#12971))
* [internal] Implement `@union`s via `Query`s ([#12966](#12966))
* Remove `Enriched*Result` classes in favor of `EngineAwareReturnType.cacheable` ([#12970](#12970))
* [internal] Remove spurious `python_tests` directive ([#12968](#12968))
* [internal] Python coverage report generation uses precomputed addresses. ([#12965](#12965))
* Add PackageRootedDependencyMap for mapping inferred Java dependencies ([#12964](#12964))
Motivation
As described in #12293, multiple Coursier invocations were downloading the JDK and triggering a race condition in Coursier's locking that caused flakiness in tests.
Solution
This PR mitigates the issue by isolating the JDK download to a single `Process`. The new `JdkSetup` type provides rules with the command to obtain the location of the JDK so they may query Coursier for `JAVA_HOME`. This has the benefit of still downloading in remote execution, while providing some guarantee that there will be a single download.
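As a rough illustration of the "command rather than path" idea, the sketch below shows one way a rule could build an argv that resolves the JDK location at execution time. It is not the PR's code: `JdkSetupSketch`, its `java_home_cmd` field, and the `wrap` helper are hypothetical, and the `java-home` subcommand in the usage is an assumption about the Coursier CLI.

```python
import shlex
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class JdkSetupSketch:
    """Hypothetical stand-in for the PR's JdkSetup type (field name is an assumption)."""

    # Command that prints the JDK location when run, e.g. a Coursier invocation.
    java_home_cmd: Tuple[str, ...]

    def wrap(self, tool: str, *args: str) -> Tuple[str, ...]:
        """Build an argv that resolves JAVA_HOME when executed, then runs `tool`."""
        # The command substitution runs on whichever machine executes the process,
        # so no machine-specific path is baked into the argv.
        java_home = f"$({shlex.join(self.java_home_cmd)})"
        script = f'"{java_home}/bin/{tool}" ' + shlex.join(args)
        return ("/bin/sh", "-c", script)


# Assumed Coursier invocation; only `--system-jvm` appears in the diff itself.
setup = JdkSetupSketch(("./coursier", "java-home", "--system-jvm"))
javac_argv = setup.wrap("javac", "-version")
```

Because the path is looked up by the process itself, a remote executor with a different JDK cache would re-resolve (and, if needed, re-fetch) the JDK instead of using a stale `java_home` computed elsewhere.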