[internal] jvm/java: ensure JDK downloaded in one process #12972

Merged on Sep 22, 2021 (9 commits)

@tdyas tdyas commented Sep 21, 2021

Motivation

As described in #12293, multiple Coursier invocations were downloading the JDK and triggering a race condition in Coursier's locking that caused flakiness in tests.

Solution

This PR mitigates the issue by isolating the JDK download to a single Process. The new JdkSetup type provides rules with the command to obtain the location of the JDK, so they may query Coursier for JAVA_HOME. The JDK is still downloaded under remote execution, but there is now some guarantee that only a single download occurs.

Tom Dyas added 7 commits September 21, 2021 14:31
[ci skip-rust]

[ci skip-build-wheels]
fmt
fmt

coursier.coursier.exe,
"java",
"--system-jvm", # TODO(#12293): use a fixed JDK version from a subsystem.
f"{jdk_setup.java_home}/bin/java",
Sponsor Member

A tricky bit about this in the context of remote execution is that the JDK selection process isn't guaranteed to run on the same machine as the compile.

So rather than having the jdk_setup expose java_home as a property, it should probably expose it as a command prefix, which we can hope (or know, locally) will hit the cached pre-selected JDK rather than re-fetching it.

Contributor Author

> So rather than having the jdk_setup expose java_home as a property, it should probably expose it as a command prefix, which we can hope (or know, locally) will hit the cached pre-selected JDK rather than re-fetching it.

That would require using coursier to invoke the JDK every time?

Sponsor Member

Correct. But it appears to take only a few milliseconds, because the coursier binary is native.

Also, that time is moot with nailgun: I'll probably get to that after hours this week.

Contributor Author

Okay, implemented: the code now calls into coursier's binary each time.
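The "command prefix" idea from this thread can be sketched as follows. This is a minimal illustration, not the code merged in the PR: `JdkSetupSketch`, `system_jdk_setup`, and `java_argv` are hypothetical names, and the only Coursier tokens used are the `java` and `--system-jvm` arguments visible in the diff context above. How Coursier treats the trailing arguments is not verified here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class JdkSetupSketch:
    """Hypothetical stand-in for the PR's JdkSetup; not the real Pants type."""

    # Command prefix that launches Java through Coursier on every invocation.
    args: Tuple[str, ...]


def system_jdk_setup(coursier_exe: str) -> JdkSetupSketch:
    # "java" and "--system-jvm" are the tokens shown in the diff context.
    return JdkSetupSketch(args=(coursier_exe, "java", "--system-jvm"))


def java_argv(jdk: JdkSetupSketch, *jvm_args: str) -> Tuple[str, ...]:
    # Callers prepend the prefix instead of interpolating a JAVA_HOME path,
    # so no absolute path resolved on another machine is baked into the command.
    return (*jdk.args, *jvm_args)
```

A caller would build its command line with something like `java_argv(system_jdk_setup("./cs"), "-version")`. The trade-off discussed above is that every invocation goes through the Coursier binary, in exchange for not depending on a path that was resolved in a different process or on a different machine.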


tdyas commented Sep 21, 2021

Note: With remote execution, the user will likely have to choose the "system" JDK, since there is no guarantee that one remote executor shares a JDK cache with the executor that handles a subsequent remote execution request. We shouldn't attempt to solve the remote execution case right now.

@stuhood stuhood left a comment

Thanks!

src/python/pants/backend/java/test/junit.py (outdated, resolved)

tdyas commented Sep 21, 2021

There is still a failure when downloading even with this PR:

ValueError: Failed to determine JAVA_HOME for JDK system: Downloading https://github.com/shyiko/jabba/raw/master/index.json
E           Downloaded https://github.com/shyiko/jabba/raw/master/index.json
E           Downloading https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u292-b10/OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz
E           Still downloading:
E           https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u292-b10/OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz (26.83 %, 27640420 / 103026380)
E           
E           Downloaded https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u292-b10/OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz
E           Extracting
E             /home/runner/.cache/coursier/v1/https/github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u292-b10/OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz
E           in
E             /home/runner/.cache/coursier/jvm/adopt@1.8.0-292
E           Extraction failed: java.nio.file.FileSystemException: /home/runner/.cache/coursier/jvm/.adopt@1.8.0-292.part/jdk8u292-b10 -> /home/runner/.cache/coursier/jvm/adopt@1.8.0-292: Directory not empty
E           Exception in thread "main" java.nio.file.FileSystemException: /home/runner/.cache/coursier/jvm/.adopt@1.8.0-292.part/jdk8u292-b10 -> /home/runner/.cache/coursier/jvm/adopt@1.8.0-292: Directory not empty
E           	at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:417)
E           	at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
E           	at java.nio.file.Files.move(Files.java:1421)
E           	at coursier.jvm.JvmCache.$anonfun$tryExtract$1(JvmCache.scala:66)
E           	at coursier.jvm.JvmCache.$anonfun$withLockFor$1(JvmCache.scala:267)
E           	at coursier.cache.CacheLocks$.loop$1(CacheLocks.scala:72)
E           	at coursier.cache.CacheLocks$.withLockOr(CacheLocks.scala:98)
E           	at coursier.jvm.JvmCache.withLockFor(JvmCache.scala:267)
E           	at coursier.jvm.JvmCache.tryExtract(JvmCache.scala:48)
E           	at coursier.jvm.JvmCache.$anonfun$get$9(JvmCache.scala:140)
E           	at coursier.util.Task$.wrap(Task.scala:84)
E           	at coursier.util.Task$.$anonfun$delay$2(Task.scala:49)
E           	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
E           	at scala.util.Success.$anonfun$map$1(Try.scala:255)
E           	at scala.util.Success.map(Try.scala:213)
E           	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
E           	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
E           	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
E           	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
E           	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
E           	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
E           	at java.lang.Thread.run(Thread.java:834)
E           	at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:517)
E           	at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:193)

https://github.com/pantsbuild/pants/runs/3669182720?check_suite_focus=true#step:11:534
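The `Directory not empty` failure in the trace is what one would expect if two extractions each staged a JDK and then both tried to move it onto the same final directory. The sketch below reproduces that outcome with Python's `os.rename` as an analogy; it is not Coursier's code, the directory names are made up, and it runs the two "extractors" sequentially rather than concurrently.

```python
import os
import tempfile
from typing import Tuple


def publish(staging_dir: str, final_dir: str) -> bool:
    """Move a fully extracted tree into place; report whether the move worked."""
    try:
        os.rename(staging_dir, final_dir)
        return True
    except OSError:
        # On POSIX, renaming a directory onto a non-empty directory fails
        # (ENOTEMPTY or EEXIST), analogous to the error in the trace.
        return False


def simulate_two_extractors() -> Tuple[bool, ...]:
    results = []
    with tempfile.TemporaryDirectory() as root:
        final_dir = os.path.join(root, "jdk")
        for name in ("first.part", "second.part"):
            # Each extractor stages its own copy, then tries to publish it.
            staging = os.path.join(root, name)
            os.makedirs(os.path.join(staging, "bin"))
            results.append(publish(staging, final_dir))
    return tuple(results)
```

On Linux, `simulate_two_extractors()` is expected to return `(True, False)`: the first move succeeds and the second fails because the destination already holds an extracted tree.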

#12977 adds a JDK into the GitHub Actions CI so there is always a "system" JDK available.

tdyas changed the title from "[internal] jvm/java: ensure JDK downloaded and invoke java / javac directly" to "[internal] jvm/java: ensure JDK downloaded in one process" on Sep 22, 2021
tdyas merged commit f47dc6f into pantsbuild:main on Sep 22, 2021
tdyas deleted the jvm_process_helper branch on September 22, 2021 16:29
],
input_digest=coursier.digest,
description="Invoke Coursier with system-jvm to fingerprint JVM version.",
cache_scope=ProcessCacheScope.PER_RESTART_SUCCESSFUL,
Sponsor Member

I noticed while rebasing #12982 that this line was lost. It's important when using the system JDK: the system JDK isn't stable, so the fingerprint computation can't be cached permanently. I doubt it's related to the current flakiness, but FYI.

tdyas pushed a commit that referenced this pull request Sep 23, 2021
As per #12972 (review), the `Process`es used to obtain information on the JDK should not be cached permanently, especially when the system JVM is used. This was originally present in the code refactored by #12972 but was lost in a rebase.
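To illustrate why the cache scope matters for a system JDK, here is a toy model. It is a sketch under stated assumptions, not the engine's caching: `Scope` and `ProcessCacheSketch` are hypothetical, and `PER_RESTART` only approximates what the name `PER_RESTART_SUCCESSFUL` in the snippet above suggests (results kept until the daemon restarts).

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Scope(Enum):
    # Hypothetical names for this sketch; not Pants' ProcessCacheScope.
    PERMANENT = "permanent"
    PER_RESTART = "per_restart"


class ProcessCacheSketch:
    def __init__(self) -> None:
        self._permanent: Dict[Tuple[str, ...], str] = {}
        self._per_restart: Dict[Tuple[str, ...], str] = {}

    def restart(self) -> None:
        # Only per-restart entries are dropped when the daemon restarts.
        self._per_restart.clear()

    def run(self, argv: Tuple[str, ...], scope: Scope, execute: Callable[[], str]) -> str:
        store = self._permanent if scope is Scope.PERMANENT else self._per_restart
        if argv not in store:
            store[argv] = execute()
        return store[argv]


def demo() -> Tuple[str, str]:
    installed = {"version": "11"}  # stands in for whatever JDK is on the machine
    probe = lambda: installed["version"]
    argv = ("probe-system-jdk",)  # placeholder key, not a real command line
    cache = ProcessCacheSketch()
    cache.run(argv, Scope.PERMANENT, probe)
    cache.run(argv, Scope.PER_RESTART, probe)
    installed["version"] = "17"  # the system JDK changes outside the tool
    cache.restart()
    return (
        cache.run(argv, Scope.PERMANENT, probe),
        cache.run(argv, Scope.PER_RESTART, probe),
    )
```

In this model `demo()` returns `("11", "17")`: the permanently cached probe keeps reporting the old JDK, while the per-restart probe is re-run and sees the new one.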

@wisechengyi wisechengyi mentioned this pull request Oct 2, 2021
stuhood pushed a commit that referenced this pull request Oct 2, 2021
* [internal] Run pyupgrade on src/python/pants/backend/python ([#13073](#13073))
* [internal] Re-enable some skipped JVM tests. ([#13074](#13074))
* [internal] Use `DownloadedExternalModules` when analyzing external Go packages ([#13076](#13076))
* [internal] Use `DownloadedExternalModules` during Go target generation ([#13070](#13070))
* [internal] Replace deprecated use of `[pytest] junit_xml_dir` with `[test] xml_dir`. ([#13069](#13069))
* [internal] Add `DownloadedExternalModules` for Go ([#13068](#13068))
* [internal] Always use jars on the user classpath, and generalize transitive classpath building ([#13061](#13061))
* Add failing tests for Go external modules ([#13065](#13065))
* [internal] java: fix version in test ([#13064](#13064))
* [internal] Skip additional inference tests ([#13062](#13062))
* [internal] java: enable cycles for file-level targets generated by `java_sources` ([#13058](#13058))
* [internal] Add a `@logging` decorator for tests. ([#13060](#13060))
* [internal] Improve compatibility of nailgun with append only caches, and use them for Coursier ([#13046](#13046))
* [internal] Stop using `go.sum` when generating `_go_external_package` targets ([#13052](#13052))
* [internal] Rename `go_module` target to `go_mod` ([#13053](#13053))
* [internal] Refactor `go/util_rules/external_module.py` ([#13051](#13051))
* [internal] go: add analyzer and rules for test sources ([#13041](#13041))
* [Internal] Refactoring how we integrate with dockerfile ([#13027](#13027))
* [internal] Simplify `go/package_binary.py` ([#13045](#13045))
* [internal] Refactor `OwningGoMod` ([#13042](#13042))
* [internal] Refactor `go_mod.py` ([#13039](#13039))
* [internal] Record metadata on engine-aware params ([#13040](#13040))
* [internal] Test discovery of `go` binary ([#13038](#13038))
* [internal] Extract directory setup for terraform linters / formatters into a separate rule. ([#13037](#13037))
* [internal] java: register dependency inference rules ([#13035](#13035))
* [internal] Add `strutil.bullet_list()` to DRY formatting ([#13031](#13031))
* Minor cleanups for the autoflake linter / formatter. ([#13032](#13032))
* Ensure XML results recorded for both pytest and junit ([#13025](#13025))
* [internal] go: refactor compilation into separate rule ([#13019](#13019))
* [internal] go: refactor link step into separate rule ([#13022](#13022))
* [internal] go: enable plugin in repo and cleanup test project ([#13018](#13018))
* [internal] go: use colon to separate binary name and version ([#13020](#13020))
* [internal] tweak formatting of help text for sourcefile-validation subsystem. ([#13016](#13016))
* [internal] Use system-installed Go rather than installing via Pants ([#13007](#13007))
* Move the `process-execution-local-cleanup` hint to a more specific location. ([#13013](#13013))
* [internal] Split shell targets into atom vs generator ([#12957](#12957))
* Install Go in CI ([#13011](#13011))
* Refresh maintainers list. ([#13004](#13004))
* [internal] Refactor setup of GOROOT and `import_analysis.py` ([#13000](#13000))
* Infer dependencies on COPY of pex binaries for `docker_image`s. ([#12920](#12920))
* Prepare 2.7.0. ([#12995](#12995))
* [internal] jvm: skip JDK tests unless env var set ([#12994](#12994))
* [internal] jvm: limit caching of JDK setup processes ([#12992](#12992))
* [internal] Async-ify `NailgunPool::connect` and `nailgun::CommandRunner`. ([#12990](#12990))
* [internal] Replace `java_library` with `java_source` and `java_sources`, and add `java_test` ([#12976](#12976))
* Prepare 2.7.0rc5. ([#12987](#12987))
* [internal] terraform: refactor parser script into its own file ([#12985](#12985))
* [internal] jvm/java: ensure JDK downloaded in one process ([#12972](#12972))
* add JDK to GitHub Actions CI ([#12977](#12977))
* [internal] Re-enable the `clippy::used_underscore_binding` check. ([#12983](#12983))
* [internal] Use target generation for `_go_external_package` ([#12929](#12929))
* [internal] Bump CI token expiration threshold. ([#12974](#12974))
* [internal] Re-enable the Java backend. ([#12971](#12971))
* [internal] Implement `@union`s via `Query`s ([#12966](#12966))
* Remove `Enriched*Result` classes in favor of `EngineAwareReturnType.cacheable` ([#12970](#12970))
* [internal] Remove spurious `python_tests` directive ([#12968](#12968))
* [internal] Python coverage report generation uses precomputed addresses. ([#12965](#12965))
* Add PackageRootedDependencyMap for mapping inferred Java dependencies ([#12964](#12964))