Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically map paths in Args#map_each #19723

Closed
wants to merge 2 commits into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Oct 4, 2023

When path mapping is enabled, File objects accessed in a user-supplied callback passed to Args#map_each automatically have their paths mapped.

Automatic rewriting is preferable to e.g. passing a rewriter object into the callback: All paths emitted into command lines must be rewritten with path mapping as inputs and outputs are staged at the mapped locations, so the user would need to manually map all paths - there is no choice. As an added benefit, the automatic rewriting ensures that existing rules relying on map_each do not need to be modified to use path mapping.

As part of this change, PathMapper has to become an abstract class as interfaces can't override equals and hashCode.

Work towards #6526

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 4, 2023

Stacked on #19717

@fmeum fmeum force-pushed the starlark-path-mapping branch 2 times, most recently from 08a4169 to a801bf7 Compare October 4, 2023 11:29
@fmeum fmeum marked this pull request as ready for review October 4, 2023 11:29
@fmeum fmeum requested review from a team and lberki as code owners October 4, 2023 11:29
@fmeum fmeum requested review from gregestren and aranguyen and removed request for a team and lberki October 4, 2023 11:29
@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Configurability platforms, toolchains, cquery, select(), config transitions team-Performance Issues for Performance teams team-Rules-Java Issues for Java rules labels Oct 4, 2023
@fmeum fmeum requested a review from lberki October 4, 2023 17:58
@fmeum
Copy link
Collaborator Author

fmeum commented Oct 4, 2023

@lberki Could you review the second commit? This is what we talked about at BazelCon 2022 (I know, it's been a while :-) ). I settled on an approach where every access to relevant File fields is remapped automatically, see the PR description for the reasoning. Briefly: users have a single choice and we know what that choice should be.

// It would *not* be correct to just apply PathMapper#map to the exec path of the root: The
// root part of the mapped exec path of this artifact may depend on its complete exec path as
// well as on e.g. the digest of the artifact.
PathFragment mappedExecPath = PathMapper.loadFrom(semantics).map(execPath);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lberki Would you expect the load from semantics to matter CPU-wise? If so, I could implement two versions of the function that are alternated between based on a semantics flag.

Copy link
Contributor

@lberki lberki Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My professional opinion is that I don't know. What I think is that this shouldn't matter much, because it's not called in a very tight loop and thus an extra map lookup, say, per dependency edge shouldn't matter much. When in doubt, I'd rather err on the side of simple code.

@lberki
Copy link
Contributor

lberki commented Oct 5, 2023

New thread, who dis? :)

More seriously, I haven't been following the story of PathMapper very closely, but it looks like at the general idea that .path should return the mapped path instead of the non-mapped path when called from map_each is the least of all the bad approaches:

  1. Adding a mapped_path method to File would require whoever creates mappable actions to change their code to work properly
  2. Adding a new argument to map_each, likewise, although one could argue that the intersection of "uses map_each" and "wants path mapping" is small enough that maybe we could get away with incurring some extra work

If this idea proves to be bad, reverting it and going with (1) or (2) would cost the same as doing them right now, wouldn't it?

.path returning different things based on the context it's called in does look like a smell, though. I guess it comes down to whether we think path mappers should be exposed to Starlark. AFIAU the current goal is that they be as transparent as possible, which is what this change does.

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 5, 2023

If this idea proves to be bad, reverting it and going with (1) or (2) would cost the same as doing them right now, wouldn't it?

Yes, I think so :-) Plus only real-world usage will tell which approach is ultimately better.

Adding a new argument to map_each, likewise, although one could argue that the intersection of "uses map_each" and "wants path mapping" is small enough that maybe we could get away with incurring some extra work

I would argue that the intersection is about as large as the "wants path mapping" set itself: If you want path mapping for a realistically complicated action, you will have to use map_each to get the more complicated arguments path. Without path mapping, you can always flatten the depset and manually construct whatever arguments you want, but with path mapping this doesn't work because e.g. "--resource={}:{}".format(file.short_path, file.path) won't be subject to mapping.

@fmeum fmeum force-pushed the starlark-path-mapping branch 2 times, most recently from fabfd8c to dbedc66 Compare October 9, 2023 20:07
@fmeum
Copy link
Collaborator Author

fmeum commented Oct 9, 2023

@gregestren @aranguyen This is no longer stacked and ready for review.

@fmeum fmeum force-pushed the starlark-path-mapping branch 2 times, most recently from 9cfbb95 to 670f1b3 Compare October 9, 2023 20:26
@aranguyen
Copy link
Contributor

aranguyen commented Oct 10, 2023

@fmeum Speaking of priority for our presentation, since #19717 is merged and considering your comment here #19717 (comment), do we need this merged in order for us to do some demo with rules go? If not, I do want to spend a bit more time on #18098 (comment)

@comius comius removed the team-Rules-Java Issues for Java rules label Oct 10, 2023
@fmeum
Copy link
Collaborator Author

fmeum commented Oct 10, 2023

Yeah, let's solve the issue that came up over at #18098 first. The current PR is required to proclaim general Starlark support and invite ruleset authors to pick it up, which would be pretty nice, but isn't a hard requirement as some Starlark actions will still qualify.

@aranguyen
Copy link
Contributor

aranguyen commented Nov 14, 2023

Not sure where I previously mentioned issue regarding paths get mapped multiple times which results in wrong path with missing /bin or /genfiles. I tried to build the same failing target with this PR hoping it would resolve the issue but it's still failing. Changing this pattern here to Pattern.compile(outputRoot + "/[\\w_-]+[^bin|genfiles]/") to make sure that mapped path doesn't get replaced again seems to work for me with pure luck because I don't think i'm expressing this correctly. Any idea how to express not bin / not genfiles properly?

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 15, 2023

I think I can resolve this by introducing the fixed cfg segment. I will work on a separate PR for that.

@aranguyen
Copy link
Contributor

I think I can resolve this by introducing the fixed cfg segment. I will work on a separate PR for that.

by fixed cfg, do you mean instead of removing the configuration, we decide on a fixed string to replace it with?

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 15, 2023

@aranguyen Yes, exactly. in addition to improved compatibility, this also gives us a way to more easily
recognize mapped paths.

@fmeum fmeum force-pushed the starlark-path-mapping branch 2 times, most recently from 3bcf76c to fb72613 Compare January 10, 2024 06:56
@fmeum fmeum requested a review from a team as a code owner January 10, 2024 06:56
@fmeum
Copy link
Collaborator Author

fmeum commented Mar 5, 2024

@aranguyen I talked to Ivo offline and learned that lazy depset transformations may not be coming soon after all.

Instead, I would add a ctx.actions.path_mapping_manifest function that registers a action to create a File containing the mapping from mapped paths back to original paths. This would allow e.g. rules_kotlin to fix up its jdeps files after the actual compilation action, similar to what JavaCompilationAction does in native rule code.

This can wait for a separate PR though.

When path mapping is enabled, `File` objects accessed in a
user-supplied callback passed to `Args#map_each` automatically have
their paths mapped.

Automatic rewriting is preferable to e.g. passing a rewriter object into
the callback: All paths emitted into command lines must be rewritten
with path mapping as inputs and outputs are staged at the mapped
locations, so the user would need to manually map all paths - there is
no choice. As an added benefit, the automatic rewriting ensures that
existing rules relying on `map_each` do not need to be modified to use
path mapping.
@aranguyen
Copy link
Contributor

@fmeum I fixed the merge conflict internally and will send this out for internal review soon.

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 4, 2024

@aranguyen It looks like this broke the GitHub checks. Do you want me to fix them here?

@copybara-service copybara-service bot closed this in 955b31e Apr 4, 2024
@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Apr 4, 2024
@fmeum fmeum deleted the starlark-path-mapping branch April 4, 2024 21:43
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 4, 2024

@bazel-io fork 7.2.0

Kila2 pushed a commit to Kila2/bazel that referenced this pull request May 13, 2024
When path mapping is enabled, `File` objects accessed in a user-supplied callback passed to `Args#map_each` automatically have their paths mapped.

Automatic rewriting is preferable to e.g. passing a rewriter object into the callback: All paths emitted into command lines must be rewritten with path mapping as inputs and outputs are staged at the mapped locations, so the user would need to manually map all paths - there is no choice. As an added benefit, the automatic rewriting ensures that existing rules relying on `map_each` do not need to be modified to use path mapping.

As part of this change, `PathMapper` has to become an abstract class as interfaces can't override `equals` and `hashCode`.

Work towards bazelbuild#6526

Closes bazelbuild#19723.

PiperOrigin-RevId: 621972618
Change-Id: If76e5beee86a354aa97cb3000315b75b2ed24028
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Configurability platforms, toolchains, cquery, select(), config transitions team-Performance Issues for Performance teams
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants