Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize PathMapper construction #18098

Closed
wants to merge 7 commits into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Apr 14, 2023

The new PathMappers helper class is introduced to be the canonical place for actions to create PathMapper instances from. As a result, the particular kind of PathMapper can now be switched out without having to change individual actions.

This requires a larger refactoring of how PathMapper instances are created and passed around, which results in a number of additional improvements:

  • PathMapper instances are now no longer retained in Action, which improves the memory footprint of path mapping. Instances are created only when an Action is converted into a Spawn for execution, with one exception: For testing purposes, aquery causes the creation of a PathMapper that is applied to the command lines it reports.
  • By exposing the active PathMapper via Spawn#getPathMapper, participating executors automatically use the same instance that was used to map paths in the action's command lines, which ensures consistency.
  • Actions now have a simple way to add a fingerprint for the given PathMapper instance, thus ensuring incremental correctness when the value of the --experimental_output_path_mode flag or the implementation of a particular PathMapper changes.
  • PathMappers are no longer passed into builders for structured command lines, but are supplied when expanding command lines. In addition to not retaining them, this also reduces the number of places in which participating actions have to pass in PathMappers.

Apart from completely pure refactorings, two more interesting changes were necessary to prevent PathMapper instances from being constructed during analysis:

  • Native actions are now also opted in by their mnemonic, rather than declaring their support in code. This makes it easier to experiment with path mapping and opens up the possibility to maintain the list of supported mnemonics in a flag, which can be done in a follow-up PR.
  • Instead of hard-coding stripping for execution-time arguments that use location expansion for exec paths (in this case, --javacopts), a new function on PathMapper now allows mapping arbitray string-valued vector args. This is only relevant for native rules as Starlark rules can perform location expansion manually and emit a structured command line. A helper function for this may be provided in the future.

@fmeum fmeum marked this pull request as ready for review April 14, 2023 13:16
@fmeum fmeum requested review from a team, lberki and oquenchil as code owners April 14, 2023 13:16
@fmeum fmeum requested review from aranguyen and removed request for a team April 14, 2023 13:16
@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Configurability platforms, toolchains, cquery, select(), config transitions team-Performance Issues for Performance teams team-Rules-Java Issues for Java rules labels Apr 14, 2023
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 14, 2023

@gregestren @aranguyen This is the promised refactoring. I couldn't find a way to split this into more manageable parts, but at least it removes quite a bit of code. :-)

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 14, 2023

Test failures are caused by bazelbuild/continuous-integration#1582 and not this PR.

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 14, 2023

I pushed an additional commit that makes the jdeps rewriting independent of the actual PathMapper implementation (i.e., not specific to stripping anymore). Otherwise the generalization to an arbitrary PathMapper isn't really accurate. Let me know if I should also update the method comment.

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 18, 2023

@justinhorvitz I noticed your recent work on removing fields from SpawnAction. This commit would replace a boolean with an Enum, which I assume is making the situation worse. I could get rid of the field entirely by exposing the value of a flag on ActionExecutionContext and ActionKeyContext. Would you prefer me to do that?

@justinhorvitz
Copy link
Contributor

Thanks for drawing the connection to my recent work. The boolean stripOutputPaths field is on my radar. I was considering pushing it down to subclasses but haven't decided yet. Switching it to an enum won't immediately result in any change in memory footprint since SpawnAction currently has 3 bytes free (the 1-byte boolean is padded to a 4-byte threshold). If it does need to be a tri-state going forward, you can proceed with the change as-is and I will consider how to best optimize with it from a memory prospective.

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 18, 2023

Thanks for drawing the connection to my recent work. The boolean stripOutputPaths field is on my radar. I was considering pushing it down to subclasses but haven't decided yet. Switching it to an enum won't immediately result in any change in memory footprint since SpawnAction currently has 3 bytes free (the 1-byte boolean is padded to a 4-byte threshold). If it does need to be a tri-state going forward, you can proceed with the change as-is and I will consider how to best optimize with it from a memory prospective.

Sounds good, then I will not make any further changes to this PR for now. As we probably want to reference another flag value in the future, at that point I would look into passing in the values rather than storing them in the action (the new value will be a list, so subclassing probably wouldn't help).

@fmeum fmeum force-pushed the extract-path-mapper branch 2 times, most recently from 02c4d87 to 9ec5aaf Compare April 20, 2023 18:10
@oquenchil oquenchil removed their request for review April 24, 2023 10:04
@gregestren gregestren removed team-Performance Issues for Performance teams team-Rules-Java Issues for Java rules labels May 1, 2023
Copy link
Contributor

@gregestren gregestren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big and interesting change.

I really like the idea that you're able to late-instantiate PathMapper now. I'm impressed you got that to work so cleanly.

The main thing I'm wary of is locking further down to a centralized registration fro which actions trigger this. I think we ultimately have to allow actions to declare their participation, and do so in their own code.

I don't mind an experimental flag that can optionally opt in actions. But I'd still like the pattern of a generic --output_paths=strip or --output_paths=map flag, where whatever actions are known to be compatible with that choose to opt themselves in (or maybe one day be the few that opt themselves out).

Can we still preserve that for the native actions?

ImmutableList.Builder<String> arguments = ImmutableList.builder();
for (CommandLineAndParamFileInfo pair : getCommandLines()) {
arguments.addAll(pair.commandLine.arguments(/*artifactExpander=*/ null, stripPaths));
arguments.addAll(pair.commandLine.arguments(/*artifactExpander=*/ null, PathMapper.NOOP));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no longer any path where allArguments() or its callers apply path stripping? What's the implication of this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overloads that do not accept a non-trivial ArtifactExpander are only used during analysis, not during execution (they couldn't possibly handle tree artifacts correctly). I think that path mapping should only apply to execution, otherwise it could break Starlark analysis tests or cause aquery output to vary based on the path stripping setting. That is why I removed the PathStripper arguments, which as an added benefit gets rid of a number of overloads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some spelunking shows me that alArguments(PathMapper) was first added to add aquery support, to support shell-based integration testing: 9ad0a73

That's the non-open sourced test we'e talked about. It relies on aquery output to reveal command lines and check them against path stripping. There's nothing secret in that test so I'm happy to share the contents (recall the only reason it's internal is because it runs on the internal executor).

It's quite possible we don't need to rely on aquery for this kind of test. Can you think of a good alternative, especially with all this change? (again, I know it's hard to evaluate without seeing the test itself).

If we don't have a quick alternative, I'd probably want to keep in the allArguments() overloads so we don't break the existing test coverage.

Or we could fast-track collaborating on better testing.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@fmeum fmeum May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to aquery could be to run a bazel clean in the beginning of the test and then use --subcommands. What do you think about that? Mid-term aquery won't work as PathMappers could use the hashes of input files to form paths, but aquery doesn't have access to this information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely recall --subcommands having problems seeing the same paths as the actual action. Maybe that's not an issue with your latest refactorings? I think this is another case of trying to follow which callers call which variations of arguments expansion.

We could also consider a Java integration test that intercepts the logic just before it goes to an actual executor. That might require a mock executor to work generically. But it could offer universal testing regardless of executor. Then use shell tests for black-box execution coverage, i.e "expect build B works and don't care how the logic is implemented".

What would the story for full Bazel test support look like? We've been leaning toward excluding local execution from path mapping because it's not fundamentally compatible with the feature. But some variations of local execution are compatible, and I believe your intention is that your code naturally works with those implementations? Or would it be easier to support in-test RBE calls than I imagine? Or should we make a generic mock executor? I worry a mock would be complicated since part of "build success" involves knowing Java, etc. compilation runs correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from details there's also process:

I'm not thrilled about it, but we could drop existing testing for the purpose of landing your refactoring. Then build new testing as a hopefully quick followup. Most of @aranguyen 's current testing is running standard CI with a custom Bazel that uses path mapping. She can still do that even if direct tests for this feature are disabled.

OutputPathsMode effectiveOutputPathsMode = getEffectiveOutputPathsMode(outputPathsMode,
mnemonic);
if (effectiveOutputPathsMode == OutputPathsMode.STRIP) {
fingerprint.addString(StrippingPathMapper.GUID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this GUID specifically? Isn't the PathMappers interface more general than just StrippingPathMapper?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more general, but in order to map output path modes to actual instances, we still need to reference them here. I could see adding another layer of indirection once we get more modes, but felt that this would have been too much at this time given that we only really have one.

@@ -112,8 +110,8 @@ public class SpawnAction extends AbstractAction implements CommandAction {

private final ResourceSetOrBuilder resourceSetOrBuilder;
private final ImmutableMap<String, String> executionInfo;
private final Consumer<Pair<ActionExecutionContext, List<SpawnResult>>> resultConsumer;
private final boolean stripOutputPaths;
private final ResultConsumer resultConsumer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this refactoring for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consumer now has to accept a third argument, the PathMapper. There is no ready-made Triple interface, so I crafted my own @FunctionalInterface - that's all there is to this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will go away with the resolution of merge conflicts as @justinhorvitz has gone rid of it.

* <p>Since we don't currently have a proper Starlark API for this, we hard-code support for
* specific actions we're interested in.
*
* <p>The difference between this and {@link PathMapper#mapCustomStarlarkArgs} is this triggers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still relevant or no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially yes. Now that the opt-in logic has moved to PathMapper, it is in the same place as the logic that maps custom Starlark args, which may make this comment less necessary.

args.accept(CommandLineItem.expandToCommandLine(object));
}
};
final boolean isJavaAction =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for the purpose of handling --javacopts, right?

I realize there are some specific action references in this file. Over time I think we want to aim for eliminating such references in preference for embedding them in the action definition itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think that this one will remain the only one that we need to handle specially: Starlarkified native rules already use custom location expansion logic that in large parts doesn't go through ctx.expand_locations. I am working on a pure Starlark alternative that would emit an Args objects rather than strings, which would allow the regular structured command line path mapping to apply here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment to clarify that?

@gregestren
Copy link
Contributor

Oh, this is big enough where I'm unsure of the testing story.

We have some integration tests in Google, which are Google-internal because they're hooked up to the Google executor. When we're comfortable with this we should run this PR by them to check for bugs.

We don't have any testing for just the command-line rewriting (independent of executor), do we? This could be a good opportunity to add some. Would you be interested? I can work with you on that too.

@fmeum
Copy link
Collaborator Author

fmeum commented May 6, 2023

I am interested in adding tests. My follow-up PR includes integration tests for path stripping in Bazel, but I agree that more fine-grained tests would be helpful too. Do you already have a concept for such a test in mind?

@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Oct 3, 2023
@fmeum fmeum deleted the extract-path-mapper branch October 3, 2023 21:20
copybara-service bot pushed a commit that referenced this pull request Oct 3, 2023
- Put path stripping assertion logic into a helper library.
- Open-source Java and in-memory jdeps tests
- Don't yet open-source Android tests (takes more setup to work in Bazel).

All Bazel tests disabled until #18098 (and followup) is merged. That's because we need a compatible executor for path mapping to work.

Partially fixes #19525.

PiperOrigin-RevId: 570518512
Change-Id: Ic8ee9718b93150df3cb071d9e5e977ca6244f382
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 4, 2023
This provides a simple and backwards compatible way for actions to
experiment with path mapping:
* rules authors can add `supports-path-mapping` to the
  `execution_requirements` of a Starlark action
* end users can use tags and `--modify_execution_info` to opt individual
  targets or actions in and out of path mapping.

Also includes StrippingPathMapperTest, which wasn't included in the
merge of bazelbuild#18098, with an additional test for a Starlark rule that opts
in via the mechanisms described above.
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 4, 2023
This provides a simple and backwards compatible way for actions to
experiment with path mapping:
* rules authors can add `supports-path-mapping` to the
  `execution_requirements` of a Starlark action
* end users can use tags and `--modify_execution_info` to opt individual
  targets or actions in and out of path mapping.

Also includes StrippingPathMapperTest, which wasn't included in the
merge of bazelbuild#18098, with an additional test for a Starlark rule that opts
in via the mechanisms described above.
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 4, 2023
This provides a simple and backwards compatible way for actions to
experiment with path mapping:
* rules authors can add `supports-path-mapping` to the
  `execution_requirements` of a Starlark action
* end users can use tags and `--modify_execution_info` to opt individual
  targets or actions in and out of path mapping.

Also includes StrippingPathMapperTest, which wasn't included in the
merge of bazelbuild#18098, with an additional test for a Starlark rule that opts
in via the mechanisms described above.
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 4, 2023
This provides a simple and backwards compatible way for actions to
experiment with path mapping:
* rules authors can add `supports-path-mapping` to the
  `execution_requirements` of a Starlark action
* end users can use tags and `--modify_execution_info` to opt individual
  targets or actions in and out of path mapping.

Also includes StrippingPathMapperTest, which wasn't included in the
merge of bazelbuild#18098, with an additional test for a Starlark rule that opts
in via the mechanisms described above.
fmeum added a commit to fmeum/bazel that referenced this pull request Oct 5, 2023
This provides a simple and backwards compatible way for actions to
experiment with path mapping:
* rules authors can add `supports-path-mapping` to the
  `execution_requirements` of a Starlark action
* end users can use tags and `--modify_execution_info` to opt individual
  targets or actions in and out of path mapping.

Also includes StrippingPathMapperTest, which wasn't included in the
merge of bazelbuild#18098, with an additional test for a Starlark rule that opts
in via the mechanisms described above.
@aranguyen
Copy link
Contributor

@fmeum I observed another interesting case where --experimental_output_path=strip but effective pathmapping is off due to this check [here]

. The input list of the expanded commandline for this case seem to have additional input compared to when --experimental_output_path=off. Any idea on why this could be the case. I'm seeing this for Javac action

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 9, 2023

The input list of the expanded commandline for this case seem to have additional input compared to when --experimental_output_path=off.

Could you explain this in more detail? Does the commandline differ between --experimental_output_path=off and --experimental_output_path=strip but path stripping effectively disabled? Could you share an example of inputs that show up in one case but not the other?

copybara-service bot pushed a commit that referenced this pull request Oct 9, 2023
This provides a simple and backwards compatible way for actions to experiment with path mapping:
* rules authors can add `supports-path-mapping` to the `execution_requirements` of a Starlark action
* end users can use tags and `--modify_execution_info` to opt individual targets or actions in and out of path mapping.

Also includes `StrippingPathMapperTest`, which wasn't included in the merge of #18098, with an additional test for a Starlark rule that opts in via the mechanisms described above.

Closes #19526
Work towards #6526

Closes #19717.

PiperOrigin-RevId: 571934863
Change-Id: Ia94101d10d07599c20a42c958417349bb37d4682
@aranguyen
Copy link
Contributor

aranguyen commented Oct 10, 2023

The input list of the expanded commandline for this case seem to have additional input compared to when --experimental_output_path=off.

Could you explain this in more detail? Does the commandline differ between --experimental_output_path=off and --experimental_output_path=strip but path stripping effectively disabled? Could you share an example of inputs that show up in one case but not the other?

Yes that is the exact situation. Here is an example comparison of the commandlines between the twos builds with binaries having --experimental_output_paths=off and --experimental_output_paths=strip . Notice that the arguments list is the same but the inputs list for --experimental_output_paths=strip + effective path mapping is OFF has additional 14 arguments.

Screenshot diffs commandlines

This particular action failed during reduced classpath which prompted me to look at the replies from the internal executor and they came out looking different, causing readFullOutputJdeps to look different. And this file is being consumed elsewhere so its consumer failed causing the build to fail. Just for the sake of comparison, would you be able to try to replicate the scenario - enable path stripping with an example that would cause this check here to set PathMapper.NOOP and use the external executor? I'm open to do a video call so we can collaborate if you think that's better. My schedule is open tomorrow from (8:30 AM EST)

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 10, 2023

@aranguyen I added test coverage for this case to #18155 (both in config_stripped_outputs_test.sh and path_mapping_test.sh), but couldn't reproduce the problem. I also looked through the call sites and couldn't find one that would result in a change to the number of action inputs. Could this be something specific to the Google executor?

I am available at 8:30 AM EST and later, just send an invite and we can debug this together.

@aranguyen
Copy link
Contributor

@fmeum thanks. Let me take a look at your test in #18155 . It's very possible that this is something internally as you mentioned because the PathMapper in the spawn is also used later to prepare the input mappings for the executor. Let me tripple check the internal side of thing and will report back. If not already done so, would you be able to see if the list of Deps.dependencies is same for both cases here?

So let's hold on the video chat while I verify things internally first.

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 10, 2023

I pushed a new version of the latest commit that ensures that the library with effective path mapping off has a non-trivial jdeps proto and verifies that the proto contains the unmapped path. It does pass for Bazel.

@aranguyen
Copy link
Contributor

@fmeum I finally understand what is going on. Here is the summary:

  • For each of the action, there is a corresponding jdeps file created. Actions that are qualified for path mapping will have mapped path in the jdeps file.
  • In the build I'm seeing with --experimental_output_paths=strip, there are multiple Java actions and some of these are not qualified for path stripping. So when reducedClassPath is calculated here, if one or more dependencyArtifacts (jdeps) file were created with path mapping, I see a mixture of mapped and unmapped paths for the list of directJars
  • And because of this filter here, the reducedJars came out with missing path.
  • And the reason why you're not seeing failures externally is because when reducedClassPath fails, the fallback to use fullClassPath should work but for the target I'm testing, it's failling with fullClassPath at HEAD. Otherwise, we wouldn't see this error at all.
  • I'll work on a fix tomorrow to make it so that getReducedClassPath do not make assumption about the structure of paths so we can take advantage of the reducedClassPath optimization for path mapping as well.

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 13, 2023

Thanks for the investigation and detailed explanation, I think I now understand the problem.

In the build I'm seeing with --experimental_output_paths=strip, there are multiple Java actions and some of these are not qualified for path stripping. So when reducedClassPath is calculated here, if one or more dependencyArtifacts (jdeps) file were created with path mapping, I see a mixture of mapped and unmapped paths for the list of directJars

Could we fix this type of issue by making sure that .jdeps files are always written with the original, unmapped paths, even if the action producing them uses path mapping? Then consuming actions wouldn't need to guess whether paths in the protos were mapped or not.

I added some debugging output to JavaCompileAction and printed the contents of all .jdeps files in test_inmemory_jdeps_support. I noticed that liba.jdeps, which is subject to a fallback to full classpath, still contains unmapped paths. I will investigate why and see whether that can be fixed directly. If so, it may end up fixing the issue you are seeing.

@aranguyen
Copy link
Contributor

Could we fix this type of issue by making sure that .jdeps files are always written with the original, unmapped paths, even if the action producing them uses path mapping? Then consuming actions wouldn't need to guess whether paths in the protos were mapped or not.

We do have this code here to map paths to the original but I am not how it ties with JavaContextCompileAction.getDependencies . The method interacts with the cache which get the dependenciesArtifacts for getReducedClassPath

I added some debugging output to JavaCompileAction and printed the contents of all .jdeps files in test_inmemory_jdeps_support. I noticed that liba.jdeps, which is subject to a fallback to full classpath, still contains unmapped paths. I will investigate why and see whether that can be fixed directly. If so, it may end up fixing the issue you are seeing.

sgtm. Thank you and I'll update on my side as well.

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 13, 2023

@aranguyen I needed to add a4e1d7b#diff-738efdd95b5eedf99d16e391201cb00e284b0ccf3542506f6c88cb01ffb96830 (a new commit in #18155) to ensure that all .jdeps files in the test contained only unmapped paths. However, this change only affects full .jdeps files, which I thought would never be used as compilation inputs. But if it isn't too much effort, you could still try whether this fixes the issue you are seeing.

@aranguyen
Copy link
Contributor

@aranguyen I needed to add a4e1d7b#diff-738efdd95b5eedf99d16e391201cb00e284b0ccf3542506f6c88cb01ffb96830 (a new commit in #18155) to ensure that all .jdeps files in the test contained only unmapped paths. However, this change only affects full .jdeps files, which I thought would never be used as compilation inputs. But if it isn't too much effort, you could still try whether this fixes the issue you are seeing.

@fmeum What is this additional readFullOutputDeps do in comparison with readFullOutputDeps that was called earlier in the same method. I quickly tried your fix but I still see mixed path in the list of directJars

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 13, 2023

@aranguyen The earlier call only applies if the reduced classpath spawn succeeds. If it doesn't, the output deps proto is overwritten with the result of the full spawn, which wasn't rewritten correctly without this change.

@aranguyen
Copy link
Contributor

@aranguyen The earlier call only applies if the reduced classpath spawn succeeds. If it doesn't, the output deps proto is overwritten with the result of the full spawn, which wasn't rewritten correctly without this change.

I was able to resolve the issue by making the logic in getReduceClasspath to not be sensitive to the mixed paths. Like I said in earlier comment. I still see mixed paths for full class path case.

* Strips the configuration prefix from an output artifact's exec path.
*/
private static PathFragment strip(PathFragment execPath) {
return execPath.subFragment(0, 1).getRelative(execPath.subFragment(2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this to

private static PathFragment strip(PathFragment execPath) {
    String config = "/" + execPath.subFragment(1, 2).getPathString();
    String mappedPath = execPath.getPathString().replace(config, "");
    return PathFragment.create(mappedPath);
 }

to take care of paths in the form blaze-out/k8-fasbuilds/bin/foo/blaze-out/k8-fasbuilds/genfiles/bar. The existing one will make the path look like blaze-out/bin/foo/blaze-out/k8-fasbuilds/genfiles/bar . Let me know if you have any concern

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concern, but I would be very interested in learning where paths like this show up.

// and "-", followed by another slash. This would miss substrings like
// "bazel-out/k8-fastbuild". But those don't represent actual outputs (all outputs would have
// to have names beneath that path). So we're not trying to replace those.
return Pattern.compile(outputRoot + "/[\\w_-]+/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm updating this to return Pattern.compile(outputRoot + "/[\\w._-]+/"); to account for

Screenshot 2023-10-13 at 9 53 27 PM

@gregestren
Copy link
Contributor

gregestren commented Oct 17, 2023

I think we've clarified pretty well what's going on with the .jdeps concern.

We found out that Google's Java header compiler runs some actions that where the outputs can embed the paths of the inputs. It unfortunately does this in a way that's really hard to capture and post-process on the Bazel side: JavaCompileAction.createFullOutputDeps isn't powerful enough to capture it (the mapping can be present in paths that aren't part of the action's direct inputs, but rather its transitive inputs).

With the failing instance I'm aware of, it turns out it was triggering

// TODO(bazel-team): don't fail on duplicate inputs, i.e. when the same exact exec path
// (including config prefix) is included twice.

It disabled path mapping for that Java compilation when it wasn't actually necessary to disable it.

The whole failure mode is pretty obscure. I think we can heavily minimize it's likelihood by

a) fixing the above TODO (if all Java actions are consistently mapped this problem doesn't occur)
b) applying Ara's change to make deps input filtering more resilient

The only really problematic use case is when we have Java actions of the form: javac bazel-out/arm-fastbuild/mypkg/foo.jar bazel-out/x86-opt/mypkg/foo.jar, which I suspect is rare.

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 18, 2023

I looked into this a bit more and finally understood the root cause better. The Turbine behavior is pretty well explained in google/turbine@f9f2dec: The paths of transitive dependencies are embedded in the header jars so that compiling against the direct deps is sufficient absent annotation processors.

I was able to reproduce failures due to this by compiling Bazel itself with path mapping enabled and stricter checks in JavaCompileAction.createFullOutputDeps (one example is the -hjar.jar of //src/main/java/com/google/devtools/build/lib/skyframe:workspace_info_receiver).

I think that this can be fixed by taking the additional classpath entries into account even if they aren't inputs to the action. I submitted a fix in #19872 that allows me to compile Bazel itself without issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Configurability platforms, toolchains, cquery, select(), config transitions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants