Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

running multiple nailguns for different tools in parallel causes problems in v1 tasks #7089

Closed
baroquebobcat opened this issue Jan 16, 2019 · 2 comments

Comments

@baroquebobcat
Copy link
Contributor

baroquebobcat commented Jan 16, 2019

When a task uses threads to run multiple nailgunned tools, if two different tools have overlapping runs, pants will kill the existing process because the tool fingerprint doesn't match.

This produces behavior like the below.

A and B are nailgunned tools

T1, A is running
T2, task needs to run B, finds A and kills its process
T3, B starts

T1  T2 T3
A --X
       B->

At time T2, whatever job was running A dies

The current workaround is to ensure that only one nailgun tool is run at a time. It would be better to find a way to allow multiple to run at the same time though.

cosmicexplorer added a commit that referenced this issue Jan 18, 2019
…classpaths (#7092)

*Resolves #7089.*

### Problem

`RscCompile` is the one task in pants which invokes multiple JVM tools over the course of its run, as a consequence of [using outlining to generate semantic information with rsc which zinc can compile against](https://github.com/twitter/rsc/blob/master/docs/compiler.md#typechecking-in-rsc-mid-2018). This doesn't play nice with our `NailgunExecutor`, and causes error messages like the following when `--worker-count` > 1 (see #7089):

```
metacp(jdk) failed: Problem launching via NailgunClient(host=u'127.0.0.1', port=55511, workdir=u'/Users/dmcclanahan/tools/pants') command scala.meta.cli.Metacp --verbose --out .pants.d/tmp/tmpX8iJid.pants.d/compile/rsc/a04416cba788/--jdk--/index /Library/Java/JavaVirtualMachines/TwitterJDK/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/TwitterJDK/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/TwitterJDK/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/TwitterJDK/Contents/Home/lib/tools.jar: (u'Problem talking to nailgun server (address: 127.0.0.1:55511, remote_pid=<remote PID chunk not yet received!>, remote_pgrp=<remote PGRP chunk not yet received!>): TruncatedHeaderError(u"Failed to read nailgun chunk header (TruncatedRead(u\'Expected 5 bytes before socket shutdown, instead received 0\',)).",)', TruncatedHeaderError(u"Failed to read nailgun chunk header (TruncatedRead(u'Expected 5 bytes before socket shutdown, instead received 0',)).",))
                     E   	                   Traceback:
                     E   	                     File "/Users/dmcclanahan/tools/pants/src/python/pants/backend/jvm/tasks/jvm_compile/execution_graph.py", line 276, in worker
                     E   	                       work()
                     E   	                   
                     E   	                     File "/Users/dmcclanahan/tools/pants/src/python/pants/backend/jvm/tasks/jvm_compile/execution_graph.py", line 44, in __call__
                     E   	                       self.fn()
                     E   	                   
                     E   	                     File "/Users/dmcclanahan/tools/pants/src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py", line 339, in work_for_vts_rsc_jdk
                     E   	                       output_dir=rsc_index_dir)
                     E   	                   
                     E   	                     File "/Users/dmcclanahan/tools/pants/src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py", line 819, in _runtool
                     E   	                       dist=distribution
                     E   	                   
                     E   	                     File "/Users/dmcclanahan/tools/pants/src/python/pants/backend/jvm/tasks/nailgun_task.py", line 111, in runjava
                     E   	                       raise TaskError(e)
                     E   	                   [info] Compiling 1 Java source to /Users/dmcclanahan/tools/pants/.pants.d/tmp/tmpX8iJid.pants.d/compile/rsc/a04416cba788/examples.src.java.org.pantsbuild.example.hello.greet.greet/current/zinc/classes ...
                     E   	                       [info] Done compiling.
                     E   	                       [info] Compile success at Jan 16, 2019 6:36:30 PM [2.258s]
                     E   	                       
                     E   	FAILURE: Compilation failure: Failed jobs: metacp(jdk)
```

### Solution

- Introduce `JvmToolMixin.register_combined_jvm_tools()` as an API for accessing specific JVM tools as a combined classpath (a suggestion from @xeno-by) but different main classes, allowing the use of a single nailgun instance (resolving #7089).
- Introduce `NailgunTaskBase#do_for_execution_strategy_variant()` which allows specifying different actions to perform for different values of the `--execution-strategy` option in a structured way.
- Use the above to add a code path for a nailgun execution for `RscCompile`, and add testing for `--worker-count` > 1.
- Introduce `ZincCompile#get_zinc_compiler_classpath()` to allow `RscCompile` to override it with the combined classpath to persist the nailgun.
- Remove the mysterious `or self.cmd != self._distribution.java` in `nailgun_executor.py` -- this is necessary for this PR to work, but an option can be plumbed in if it breaks pantsd (this might motivate #6579).

### Result

`RscCompile` can be invoked with > 1 nailgun instance at a time by bundling all the tool jars into a single classpath, without affecting the performance for any of the other execution strategies.
@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Jan 18, 2019

I was a bit overeager in marking this resolved in #7092 -- the correct solution ideally works for any NailgunTask, and the right solution isn't just to bundle the jars together like we do there, sorry.

@Eric-Arellano
Copy link
Contributor

Closing as unlikely to fix given the active focus on V2, rather than V1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants