
Move from manual javac to java_common.compile #164

Closed
ittaiz opened this issue Mar 9, 2017 · 34 comments

Comments

@ittaiz
Member

ittaiz commented Mar 9, 2017

Hi,
I'm trying to tackle moving from a manual javac to java_common.compile as discussed in bazelbuild/bazel#970 (comment).
Should we just return two jars? If so, any preference on the names? Do you think anyone counts on it being one jar? We can unzip both and jar them back together, but it's messy.
Other than that it looks pretty straightforward to me: obliterate all the javac references, call java_common.compile at the end of `_compile`, and propagate the providers out.
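
A minimal sketch of what I mean, assuming the java_common.compile signature from the sandwich work; the helper name and the toolchain attributes are illustrative, not actual rules_scala code:

```python
# Hypothetical tail of _compile: delegate the .java sources to java_common
# instead of invoking javac ourselves.
def _compile_java(ctx, java_srcs, dep_providers):
    java_jar = ctx.new_file(ctx.label.name + "_java.jar")
    provider = java_common.compile(
        ctx,
        source_files = java_srcs,    # the .java sources we currently hand to javac
        output = java_jar,
        deps = dep_providers,        # java providers collected from ctx.attr.deps
        java_toolchain = ctx.attr._java_toolchain,
        host_javabase = ctx.attr._host_javabase,
    )
    # The open question above: return java_jar next to the scala jar,
    # or merge the two into one output.
    return provider, java_jar
```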

@johnynek
Member

johnynek commented Mar 9, 2017

I think we will have to re-architect the rules or have one output. We at least have to combine the jars to make the _deploy.jar. But if we emit two jars, then targets that consume this one will have to add both jars to their inputs (since the whole idea is to use this when there are circular dependencies).

We already have jar creation code; we can just make it work for jar concatenation without unjarring first. It's slightly more IO, but I bet it's not at all the slow part compared to scalac.

@ittaiz
Member Author

ittaiz commented Mar 9, 2017

The problem with joining the jars is that this currently happens in the worker, while the java_common.compile call will happen at the Skylark level.
AFAIU we have three options:

  1. Do the jar concatenation in bash at the Skylark level.
  2. Change the worker to accept two types of requests and add a concat-jars request type.
  3. Call JarCreator (or a variant) plainly after java_common, without a worker, starting a new JVM each time.

I think #1 sounds least expensive performance-wise, since IINM this problem isn't JVM related but just archive manipulation.
WDYT?
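
Whichever option wins, the Skylark side would look roughly like this sketch; the merger tool attribute and its flags are assumptions (e.g. Bazel's singlejar or a JarCreator variant), not verified:

```python
# Declare an action that merges the scalac jar and the java jar into one output.
def _merge_jars_action(ctx, scala_jar, java_jar, output_jar):
    ctx.action(
        inputs = [scala_jar, java_jar],
        outputs = [output_jar],
        executable = ctx.executable._jar_merger,  # hypothetical tool attribute
        arguments = [
            "--output", output_jar.path,
            "--sources", scala_jar.path, java_jar.path,
        ],
        mnemonic = "ScalaJarMerge",
        progress_message = "Merging scala and java jars for %s" % str(ctx.label),
    )
```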

@johnynek
Member

johnynek commented Mar 9, 2017 via email

@ittaiz
Member Author

ittaiz commented Mar 9, 2017 via email

@johnynek
Member

johnynek commented Mar 9, 2017

I think that could be fine, since it is still related to compiling scala.

So, we could add some optional args to the scalac worker to just act as a jar concatenator.

@ittaiz
Member Author

ittaiz commented Mar 9, 2017 via email

@johnynek
Member

johnynek commented Mar 9, 2017 via email

@sdtwigg
Contributor

sdtwigg commented Mar 27, 2017

How far along are you with this approach?

Basically, I really need the java-scala sandwich, so I was contemplating splitting this work in half. One step just uses java_common.create_provider on the current scalac (invoking javac) output and emits a java provider as well as/instead of the current scala provider. It is this step that I am tempted to work on (basically immediately, unless you are well-progressed on this part).

The other step is moving the javac use inside scalac to be java_common.compile with the merge step proposed here.
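
For the first step, the core would be something like this sketch, assuming create_provider's two-argument form (the jar variables are whatever the rule already produces):

```python
# Wrap the jars scalac already produces in a java provider. create_provider
# only takes compile-time and runtime jars, so exports and source jars have
# no slot here yet.
def _as_java_provider(output_jar, output_ijar, transitive_runtime_jars):
    return java_common.create_provider(
        compile_time_jars = [output_ijar],
        runtime_jars = [output_jar] + list(transitive_runtime_jars),
    )
```

The rule would then return this next to (or instead of) the existing scala provider via the providers field of its return struct.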

@johnynek
Member

I don't think anyone has started, but I think everything in Bazel 0.4.5 is ready to do what we need. Calling javac isn't the main issue, if you ask me; the current situation there is not bad, in my view. The bigger issue is making scala libraries that java libraries can depend on, which, again, 0.4.5 should support.

@ittaiz
Member Author

ittaiz commented Mar 28, 2017 via email

@sdtwigg
Contributor

sdtwigg commented Mar 28, 2017

What do you all think about scala_library emitting both a scala provider and a java provider (and then preferring the scala provider when available), versus just emitting a java provider?

Main thought is that the latter is simpler overall; however, java_common.create_provider only has two inputs: compile and runtime jars. But the java provider can present transitive_deps, transitive_exports, transitive_runtime_deps, and transitive_source_jars, so we need to trace how it builds all that information.

As a first pass, my plan was to do the compile against the ijars from all dependencies. Then the provider is created with compile jars = generated ijar (or real jar for macros) + transitive_exports only, plus runtime jars = real jar + runtime jars of all deps.
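
In (illustrative) code, the plan is roughly this; the field names on deps/exports are assumptions about our own provider shape, not final names:

```python
def _first_pass_jars(output_jar, output_ijar, is_macro, deps, exports):
    self_compile_jar = output_jar if is_macro else output_ijar  # macros need real classes
    compile_jars = depset([self_compile_jar])
    runtime_jars = depset([output_jar])
    for e in exports:
        compile_jars += e.compile_jars             # exports stay on the compile path
    for d in deps + exports:
        runtime_jars += d.transitive_runtime_jars  # full runtime closure
    return compile_jars, runtime_jars
```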

PS: I may also dig through the Skylark/Bazel sources to get details on where it is actually storing this information from java_library. A cursory look indicates that some experimentation is likely needed. For example, transitive_exports looks like it is actually tracing up through rules that define "exports" as an attr. (See JavaExportsProvider.) Thus, it may be possible to skip adding transitive_exports to compile jars....

@johnynek
Member

What do we need in a scala provider that is not in the java provider? It seems to me that if we can just use the java provider, we are dogfooding more, since all builds will exercise the same code path, not just java -> scala dependencies.

I assumed that for the provider we pass only our direct compile and runtime jars, and internally it builds up the transitive deps. The exports are an issue though, it sounds like. We really need exports support. source_jars don't make quite as much sense, since clearly we are not providing java sources, but we could create a scala source jar (though I don't know why one needs those).

Anyway, as a first pass, separate scala and java providers can be fine, I think, if we can't support exports. If we can't do what we want to do, we should probably file issues/send PRs to fix things sooner rather than later.

@ittaiz
Member Author

ittaiz commented Mar 29, 2017 via email

@johnynek
Member

I don't think so. The provider is one item in the return struct. See here:

https://bazel.build/blog/2017/03/07/java-sandwich.html

@sdtwigg
Contributor

sdtwigg commented Mar 30, 2017

We will need to retain the scala provider for now, because java_common.create_provider (and accessing target[java_common.provider]) returns a JavaProvider, which is essentially a black box that only internal rules and actions can make full use of. To custom Skylark rules, it only exposes some default fields and transitive_runtime_deps.
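
To illustrate from a custom Skylark rule's point of view (a sketch; the helper is hypothetical):

```python
def _inspect(dep):
    # Declared provider: opaque, mainly useful to hand back to java_common
    # or to native java rules as a dependency.
    declared = dep[java_common.provider]
    runtime_jars = declared.transitive_runtime_deps  # about all it exposes to us
    # Legacy provider: rich fields, but only native rules construct it.
    legacy = getattr(dep, "java", None)
    return declared, runtime_jars, legacy
```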

For reference, .java actually returns a JavaSkylarkApiProvider, which can only be constructed by native rules (and they usually do so in parallel with a JavaProvider).

I think JavaProvider is still under some churn, since it contains a set of providers that is a subset of the providers in JavaSkylarkApiProvider. For example, exports are in JavaSkylarkApiProvider but not in JavaProvider, so we will need to do the previously discussed faking of them. It would be nice if JavaProvider took a superset of the data JavaSkylarkApiProvider takes, and then any rule (built-in or Skylark) that has a JavaProvider declared provider would automatically get a java=JavaSkylarkApiProvider added as well.

PS: Amusingly, it looks like java.transitive_deps actually already includes java.transitive_exports even though java_library claims an export isn't necessarily included as a dep for that library's compilation.

PPS: Sorry this took longer than projected. Beyond standard work 'distractions', I spent a bit of time on the jdk issue and on doing a post-mortem. I also wanted to spend some time looking sort-of-deeply into the Bazel internals to see what certain fields were actually doing.

@sdtwigg
Contributor

sdtwigg commented Mar 30, 2017

Upon deeper analysis of the java rules implementation in Bazel (as well as its tests), we will be able to 'sandwich' via the deps and runtime_deps attributes of java_library, etc., but not via exports. This is because deps only rejects a dependency if its rule type is not in a whitelist and it also fails to provide a JavaProvider, whereas exports simply rejects any dependency not in the whitelist.

Also, the current use of transitive_deps for java in collect_jars is technically too permissive, since it captures that target (good), that target's transitive exports (good), and that target's transitive dependencies (bad).

Unfortunately, transitive_exports just returns a set of labels (which are slightly fancy strings). It might be possible to get the transitive exports as artifacts by taking transitive_deps and filtering out all artifacts whose owner (generating label) is not in transitive_exports, as sketched below. However, it is supposedly possible for owner to be 'blank', though maybe it is only blank when the artifact is just a file. Fortunately (as an amusing interaction of restrictions), exports is further restricted to ONLY be rule labels (whereas deps could be files, like a deploy jar), so one can assume that anything in transitive_deps coming from an export must have a generating rule!
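
A sketch of that filtering trick (a hypothetical helper over a native java dep):

```python
# Recover export jars as artifacts: keep only the entries of
# java.transitive_deps whose generating label appears in java.transitive_exports.
def _export_jars(dep):
    export_labels = list(dep.java.transitive_exports)  # labels, not artifacts
    jars = []
    for jar in dep.java.transitive_deps:
        # owner can reportedly be blank for plain files, but exports must come
        # from rules, so skipping blank owners should be safe here.
        if jar.owner and jar.owner in export_labels:
            jars.append(jar)
    return jars
```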

PS: Unsure where else to have this discussion but it is somewhat off topic. I am mostly trying to point out relevant issues for you all to consider/debate while I experiment with this.... Also, note that this is all subject to change as they refine the proposal in the sandwich blog. create_provider isn't documented yet so it is possible there will still be some iteration. (Just looking at the internals indicates there is a lot of potential for simplification/removal of redundancies....)

@ittaiz
Member Author

ittaiz commented Mar 30, 2017 via email

@johnynek
Member

johnynek commented Mar 30, 2017 via email

@ittaiz
Member Author

ittaiz commented Mar 30, 2017 via email

@sdtwigg
Contributor

sdtwigg commented Mar 30, 2017

On the plus side, after typing that out and doing the analysis both here and there, I have a much stronger idea for how to execute the scala build rules using JavaProvider....

@ittaiz
Member Author

ittaiz commented Mar 30, 2017 via email

@sdtwigg
Contributor

sdtwigg commented Mar 30, 2017

That sounds about right. The listing of items was targeted mainly at 3, although you will not yet be able to say a scala_library is an export of a java_library.

@johnynek
Member

johnynek commented Mar 30, 2017 via email

@sdtwigg
Contributor

sdtwigg commented Apr 1, 2017

This is going well. As part of it, I have been trying to reforge the scala provider to look more like the java provider (partly as a projection towards the future, and partly so we don't have to maintain a mental distinction). Basically, the attribute has three fields (sketched after the list):

  • outputs = the struct of all the files the target outputs (actually not used anymore)
  • compile_jars = the set of all jars exported by the rule for use in dependent compiles (ijar + exports). This ends up being naturally recursive on exports, which is perfect.
  • transitive_runtime_jars = the set of all jars needed on the runtime classpath when running this jar (real jar + exports + runtime_deps + deps).
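
Construction looks roughly like this (a sketch; the jar variables and the dep-side aggregates are illustrative):

```python
# All arguments are aggregates the rule implementation already computes.
def _make_scala_provider(output_jar, output_ijar, exports_compile_jars,
                         deps_runtime_jars, exports_runtime_jars,
                         runtime_deps_runtime_jars):
    return struct(
        # Legacy field, kept around but effectively unused.
        outputs = struct(class_jar = output_jar, ijar = output_ijar),
        # Our ijar plus whatever the exports put on the compile path;
        # naturally recursive on exports.
        compile_jars = depset([output_ijar]) + exports_compile_jars,
        # Real jar plus the runtime closure of deps, exports, and runtime_deps.
        transitive_runtime_jars = depset([output_jar]) + deps_runtime_jars +
                                  exports_runtime_jars + runtime_deps_runtime_jars,
    )
```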

I am going to assert (without proof, though it can be supplied if needed) that this structure very closely mimics how native.java_library uses its JavaProvider. (The fields in JavaSkylarkApiProvider are a red herring, made available to Skylark but not used by native rules.) A key takeaway is that java_library does not distinguish between jars from itself and jars from exports when sending stuff to the next rule.

The transition has been fine except in twitter_scrooge.bzl: the gen_scrooge_srcjar rule would provide transitive_compile_exports as the union of the transitive_compile_exports of its deps. However, those deps (if coming from a scala_library) would only have transitive_compile_exports from their own exports, not including their own ijars.

Since the new scala provider (and eventually JavaProvider) just has compile_jars, it doesn't differentiate between the ijar and the jars added to the compile path from exports. Thus, the compile_jars from a gen_scrooge_srcjar would now include the ijars of its direct dependencies. Is this OK? (I am not totally familiar with what twitter_scrooge is doing, and thus unclear on what makes sense to expose to its dependents.) Note that runtime jar propagation will not change at all and still behaves perfectly fine.

There are a couple of workarounds if the old behavior is needed (up to and including adding a separate field to the attr just for this case). That said, the old behavior did seem odd, in that it was both exporting its dependencies at all and then only exporting parts of them (the dependencies' exports).

@sdtwigg
Contributor

sdtwigg commented Apr 1, 2017

As a quick amendment, I did some digging and found that scrooge_scala_srcjar is immediately used in the macro scrooge_scala_library. The scrooge_scala_library was then relying on the original deps as well as the result of scrooge_scala_srcjar. In fact, the changes would allow us to drop the dependency on deps in the scala_library, since scrooge_scala_srcjar would already include them.

Regardless, after further thinking (before the scrooge_scala_library discovery), there are many ways to restore the old behavior if desired. The fastest is to put exports into the scala provider: it would just be the union of all compile_jars from exports (and not be used by any other rules). A 'better' solution might be to exploit aspects to 'cheat' and allow scrooge_scala_srcjar to grab the exports from its direct dependencies, as sketched below. This would avoid needing a one-off field in the scala provider (and would allow a java_library to work/provide exports when in deps as well).
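
A sketch of the aspect approach (all names illustrative):

```python
# Collect the compile_jars contributed by the `exports` attribute of each
# direct dep, so the scala provider itself needs no one-off field.
def _exports_aspect_impl(target, ctx):
    jars = depset()
    for exp in getattr(ctx.rule.attr, "exports", []):
        if hasattr(exp, "scala"):
            jars += exp.scala.compile_jars
    return struct(exported_compile_jars = jars)

exports_aspect = aspect(
    implementation = _exports_aspect_impl,
    attr_aspects = [],  # only the direct deps need inspecting
)
```

gen_scrooge_srcjar would then request it on its deps with attr.label_list(aspects = [exports_aspect]).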

@johnynek
Member

johnynek commented Apr 4, 2017

Scrooge is really important (critical, currently) for us at Stripe. That said, I am very happy to make it more idiomatic with the java (and scala) rules in bazel if we can. Many of the current behaviors are probably just implementation details.

As long as the tests pass, we are probably fine. Our use case is generally to make a thrift_library which is just a bundle of thrifts with some meta-data and then use the scrooge_scala_library to generate the compiled code.

In fact, srcjar support was added to the scala rules explicitly for scrooge.

@sdtwigg
Contributor

sdtwigg commented Apr 5, 2017

@johnynek as long as you are using scrooge_scala_library and not relying on peculiar behaviors of gen_scrooge_scala_srcjar specifically, you should see no difference, since the scala_library inside scrooge_scala_library was already masking all the weird things gen_scrooge_scala_srcjar was doing with the scala provider.

@sdtwigg
Contributor

sdtwigg commented Apr 5, 2017

Please see sdtwigg@9c8f720 as a WIP commit for the proposal. The major structure is done and mostly just needs minor polish here and there.

That commit works with `bazel test -- ... -test_expect_failure/..` and `bash test_run.sh`.
However, I'm not quite sending it out as a PR yet, because I want to ensure the current tests actually cover all the cases.

That said, if any of you want to test-drive that commit on any projects to see if it breaks anything, go right ahead (as that may help accelerate my debugging). Keep in mind that with that commit a scala_library can be a dependency in deps and runtime_deps of a java_library, but I have https://bazel-review.googlesource.com/#/c/9671/ out to get it to work everywhere else.

@ittaiz
Member Author

ittaiz commented Apr 5, 2017

Awesome!
Is this gerrit change the last one pending? I saw you had several CLs active.

@sdtwigg
Contributor

sdtwigg commented Apr 5, 2017 via email

@johnynek
Member

johnynek commented Apr 6, 2017

Thanks for the update @sdtwigg.

@ittaiz
Member Author

ittaiz commented Apr 30, 2017

Just to add another motivation to this issue, apart from rule hygiene: I just spent a long time trying to debug a "problem" with rules_scala, only to find that my simple java code didn't compile (it was missing an import). Since intellij doesn't support scala, I use sublime for my rules_scala development, so I can miss trivial things like a missing import. The current error messages are really not helpful and just say that an error occurred in javac.
I'm guessing java_common.compile will handle this much better, so that's my added motivation.

@ittaiz
Member Author

ittaiz commented Aug 9, 2017

OK, just hit an actual blocking motivation: strict_scala_deps is unusable for us, since javac fails with the trimmed classpath.
java_common.compile supports strict java deps.
I've started work on this, but it's unclear whether I have access to the jar outputs (regular and ijar) so that I can merge them.
I've posted a question on bazel-discuss.
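
For reference, the call I have in mind is roughly this sketch, assuming the 0.4.x-era java_common.compile parameters; whether the returned provider exposes the plain jar and the ijar for merging is exactly the open question:

```python
def _compile_java_strict(ctx, java_srcs, java_jar, dep_providers):
    return java_common.compile(
        ctx,
        source_files = java_srcs,
        output = java_jar,
        deps = dep_providers,
        strict_deps = "ERROR",  # let javac enforce direct deps
        java_toolchain = ctx.attr._java_toolchain,
        host_javabase = ctx.attr._host_javabase,
    )
```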

@johnynek
Member

johnynek commented Sep 2, 2017

Closed by #269; reopen if not. Nice work @ittaiz!

@johnynek johnynek closed this as completed Sep 2, 2017