Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use nailgun for javac and import parser #12982

Merged
merged 7 commits into from
Sep 24, 2021

Conversation

stuhood
Copy link
Sponsor Member

@stuhood stuhood commented Sep 22, 2021

Warm JVMs are necessary for reasonable performance in common tasks like compilation and import parsing. Although there are other options to achieve them (Bazel's gRPC-based worker protocol in particular), we have had good experiences using nailgun, and already have a client and nascent support.

This change refreshes the process_execution crate's support for nailgun, and then enables it by default. Individual processes opt-in to using nailgun by setting a second digest on Process which includes all inputs for the nailgun server.

  • Moves from a is_nailgunnable: bool to a use_nailgun: Digest field, which is used as the inputs-to and fingerprint-for the server. This hands the responsibility for computing the server-relevant subset of the entire input_digest to the Process author.
  • Moves the NailgunPool to a fingerprint-aware LRU, where we first attempt to find an idle server with a matching fingerprint, and then kill the least recently used idle server without a matching fingerprint. As mentioned in comments, the NailgunPool assumes it is used under a Semaphore (the BoundedCommandRunner), and will error if the pool is full and none are idle.
  • Adds a BorrowedNailgunProcess drop guard that ensures that the working directory of the server is cleaned up between runs, and kills the server on cancellation. Moves the workdir creation which used to be implemented by the CapturedWorkdir trait out into the local::CommandRunner, because the nailgun::CommandRunner must reuse the server's workdir, clearing it between runs.
  • Refactors JdkSetup to (mostly) encapsulate the selection of the JDK, and to add the nailgun classpath to args.

Fixes #12810, fixes #8481, and fixes #8527. Although nailgun servers will be torn down via Drop in most cases, ensuring that they die when pantsd is killed by SIGKILL is future work covered by #12996.

stuhood added a commit that referenced this pull request Sep 23, 2021
…er`. (#12990)

To prepare for #12982, swap the `process_execution::nailgun` module to async/await.

[ci skip-build-wheels]
…nner needs to use the nailgun directory.

[ci skip-build-wheels]
…dentify the JVM to use.

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
@stuhood
Copy link
Sponsor Member Author

stuhood commented Sep 24, 2021

Commits are overlapping, but somewhat useful to review independently.

Apologies for the larger review: the clearest way to provide test coverage for this code is to expose and use it in Python tests, so while landing the rust-only portion with only minimal testing would be an option, I thought that it made more sense to land the coverage here.

/// TODO: Currently this Digest must be a subset of the `input_digest`, but we should consider
/// making it disjoint, and then automatically merging it.
///
pub use_nailgun: Digest,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some trumping good ergonomic that outweighs the awkward ergonomic of having to pass EMPTY_DIGEST instead of None? An option seems more clear than the EMPTY_DIGEST ~magic value.

Copy link
Sponsor Member Author

@stuhood stuhood Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Python side this is a default argument, so users mostly don't need to worry about it. But we have in general used the "null object" pattern (with EMPTY_DIGEST as null) for digests, rather than making them optional on Process: see input_files, for example.

src/rust/engine/process_executor/src/main.rs Outdated Show resolved Hide resolved
if processes.len() >= self.size {
// Find the oldest idle non-matching process and remove it.
let idx = Self::find_lru_idle(&mut *processes)?.ok_or_else(|| {
// NB: See the method docs: the pool assumes that it is running under a semaphore, so this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any harm in in-fact using another semaphore that has size slots? I'm guessing its not a measurable perf hit but maybe it is? Even though this is not needed globally, having it locally would make the code make sense without comment.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really... my primary concern was really just that there are already two mutexes in here, and adding a third primitive would make things a little bit unwieldy. I think that if the consequence of forgetting to do this was a panic or silently incorrect behavior, then it would be necessary to include it here. But given that we can error instead, and we know how we are called in general, this seemed slightly easier.

[ci skip-build-wheels]
@stuhood stuhood enabled auto-merge (squash) September 24, 2021 18:19
@Eric-Arellano
Copy link
Contributor

enabled auto-merge (squash)

This is a pretty big PR - might be better to not auto-merge in case people don't realize it when hitting accept.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@stuhood
Copy link
Sponsor Member Author

stuhood commented Sep 24, 2021

enabled auto-merge (squash)

This is a pretty big PR - might be better to not auto-merge in case people don't realize it when hitting accept.

John already accepted, so this is just waiting on CI to go green.

version_result.stderr.decode("utf-8"),
version_result.stdout.decode("utf-8"),
]
/bin/ln -s "$({java_home_command})" "{JdkSetup.java_home}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this symlinking JAVA_HOME within the input root?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fingerprint_comment: str
nailgun_jar: str
jdk_preparation_script: ClassVar[str] = "__jdk.sh"
java_home: ClassVar[str] = "__java_home"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a constant?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClassVars are effectively constant unless we expect this to be subclassed. But the real answer is that I was just following the existing pattern in the JavacBinary class.

@@ -86,22 +90,23 @@ async def build_processors(bash: BashBinary, javac: JavacBinary) -> JavaParserCo
ProcessResult,
Process(
argv=[
bash.path,
javac.javac_wrapper_script,
*jdk_setup.args(bash, [f"{jdk_setup.java_home}/lib/tools.jar"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be invoking javac and not assuming existence of non-java. packages? I.e., what is someone uses a JVM that does not have com.sun.tools.javac.Main as an entrypoint?

Copy link
Sponsor Member Author

@stuhood stuhood Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nailgun is not aware of javac in particular, and requires the main class as the first argument to the server. And if this class isn't present, then it's likely because coursier has selected something that is only a JVM and not a JDK.

This class is a de-factor public API... I'm sure it will eventually be deprecated, but none of these packages have moved post-Oracle.

nailgun = await Get(
ResolvedClasspathEntry,
CoursierLockfileEntry(
coord=Coordinate.from_coord_str("com.martiansoftware:nailgun-server:0.9.1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the Coordinate constructor directly instead of from_coord_str?

@stuhood stuhood merged commit ee9a43b into pantsbuild:main Sep 24, 2021
@stuhood stuhood deleted the stuhood/nailgun-for-javac branch September 24, 2021 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants