Add use_default_shell_env = True to all ctx.actions.run_shell rules #44549

Flamefire · 2020-11-03T11:52:21Z

To start subprograms, even simple bash snippets, Bazel uses an
executable process-wrapper which is potentially built using a custom
toolchain and hence requires a set up LD_LIBRARY_PATH.
Ommitting the use_default_shell_env (defaulting to false) clears the
whole environment and the binary may try to use older system libs such
as /lib64/libstdc++.so causing it to fail in case it is (much) older
than the used libstdc++ from the custom toolchain which is very common
in HPC environments.
Hence I added use_default_shell_env = True as already done in e.g.
_local_genrule_impl.

Flamefire · 2020-11-05T16:31:11Z

For some reason I was also seeing the environment dropped from a flatc --python invocation which happens through flatbuffer_py_library. However prior to 2.4 that was working. Checking the compile log for 2.3.1 I don't see any invocation of flatc --python so it seems it was simply not used until 2.4 and hence not compiled which explains why the faulty Bazel rule didn't cause problems before

chsigg · 2020-11-18T19:04:29Z

@buchgr, would you mind to take a look at this change? I fear it might not play nice with RBE.

chsigg · 2020-11-19T07:28:28Z

@buchgr, would you mind to take a look at this change? I fear it might not play nice with RBE.

@buchgr confirmed that this prevent RBE caching between different users, and would cause problems when the host's LD_LIBRARY_PATH doesn't fit the remote environment.

Is there any chance you could ldconfig to work around this in your environment?

gbaned · 2020-11-25T13:13:51Z

@Flamefire Can you please check @chsigg's comments and keep us posted ? Thanks!

Flamefire · 2020-11-27T11:34:19Z

@gbaned If you are referring to

Is there any chance you could ldconfig to work around this in your environment?

No, this is not possible. Usual HPC environments have multiple versions of a software installed and decide via environment variables which is active (in particular LD_LIBRARY_PATH and PATH for binaries, CPATH & LIBRARY_PATH for dependencies)

I'm unsure why this is now a problem to add this as e.g. _local_genrule_impl already uses this.

Unfortunately I don't understand enough about Bazel and especially RBE but for us sysadmins it is a real pain that the environment gets cleared on multiple levels

gbaned · 2020-12-04T17:04:05Z

@chsigg Can you please take a look on above comments from @Flamefire. Thanks!

chsigg · 2020-12-17T20:30:53Z

Another idea, would --action_env or --host_action_env maybe do what you need?

Flamefire · 2020-12-18T07:50:42Z

@chsigg No this does not work. This came up in a recent Bazel issue (bazelbuild/bazel#12579 (comment)) where it was observed that action_env and such are seemingly not passed when use_default_shell_env=True is not set.
I haven't tried --host_action_env yet (it is very new) but I assume --action_env already sets those, especially as we use --distinct_host_configuration=false already. It also seems like there is an as-of-yet unreleased bugfix to the host_action_env flag: bazelbuild/bazel@e667082

gbaned · 2021-01-05T18:43:11Z

@Flamefire Can you please resolve conflicts? Thanks!

chsigg · 2021-01-06T06:53:51Z

To be honest, I don't know how we can resolve this. @buchgr, do you have any suggestions?

There is a need to properly pass environment variables to build actions, but bazel sometimes drops them (due to bugs, IIUC). The change here would work around those bugs, but it doesn't play well with RBE.

Flamefire · 2021-01-06T09:00:54Z

There is a need to properly pass environment variables to build actions, but bazel sometimes drops them (due to bugs, IIUC). The change here would work around those bugs, but it doesn't play well with RBE.

IIUC the use_default_shell_env=True makes Bazel pass the --action_env variables and PATH/LD_LIBRARY_PATH while without (=False) it will pass a cleaned up environment with nothing in it.
So this "sometimes" is the case for use_default_shell_env=False, which seems to me like the design, not a bug.
Having said that: There certainly are bugs, some of which I reported and some seem to be "unfixable" (Bazel contains a binary which requires LD_LIBRARY_PATH itself, "fixed" by building Bazel statically)

In the case of flatc (one of the fixed things here) a binary is called which has dependencies, so LD_LIBRARY_PATH must be passed. And as Bazel doesn't seem to have anything else but use_default_shell_env=True to do so, it should be used.
It could be checked what effect --distinct_host_configuration=true (default) has here in respect to RBE, maybe it already means that the hosts LD_LIBRARY_PATH isn't passed to the RBE even with use_default_shell_env=True? I'd expect so but am not using RBE.

Anyway, 3 strong arguments in favor of this change:

It is only the flatbuffers compilation i.e. a code generation which is likely fast, so even a potential cache miss isn't expensive
Short of patching the source there is no other way (AFAIK) by now to run a flatc build with a different (LD_)LIBRARY_PATH (which most likely happens when it is modified at all)
The same is already used for protoc, where it is seemingly ok. So why the discussion here?

gbaned · 2021-08-24T13:44:46Z

@Flamefire Can you please resolve conflicts? Thanks!

To compile flatbuffers definition Bazel starts the flatbuffers compiler flatc which is potentially built using a custom toolchain and hence requires a set up LD_LIBRARY_PATH. Ommitting the `use_default_shell_env` (defaulting to false) clears the whole environment and the binary may try to use older system libs such as /lib64/libstdc++.so causing it to fail in case it is (much) older than the used libstdc++ from the custom toolchain which is very common in HPC environments. Hence I added `use_default_shell_env = True` as already done in e.g. `_local_genrule_impl`.

Flamefire · 2021-08-30T09:25:25Z

@gbaned Rebased

Flamefire · 2021-08-30T09:27:18Z

Had to close+reopen due to Github not detecting the change...

When building a tool in a non-standard environment (e.g. custom compiler path -> LD_LIBRARY_PATH set) then `use_default_shell_env = True` is required to run that tool in the same environment or otherwise the build will fail due to missing symbols. See jax-ml/jax#7842 for this issue and tensorflow/tensorflow#44549 for related fix in TF

When building a tool in a non-standard environment (e.g. custom compiler path -> LD_LIBRARY_PATH set) then `use_default_shell_env = True` is required to run that tool in the same environment or otherwise the build will fail due to missing symbols. See jax-ml/jax#7842 for this issue and tensorflow/tensorflow#44549 for related fix in TF. Reviewed By: GMNGeoffrey Differential Revision: https://reviews.llvm.org/D109873

PiperOrigin-RevId: 394106593 Change-Id: I2226f80fd3bef5e04a42174a8a519f612ae39985

…_without_soft_link Merge pull request tensorflow#44549 from Flamefire:add_default_shell_env

google-ml-butler bot added the size:XS CL Change Size: Extra Small label Nov 3, 2020

google-cla bot added the cla: yes label Nov 3, 2020

gbaned self-assigned this Nov 3, 2020

gbaned added the comp:core issues related to core part of tensorflow label Nov 3, 2020

gbaned requested a review from chsigg November 3, 2020 13:23

gbaned added the awaiting review Pull request awaiting review label Nov 5, 2020

Flamefire force-pushed the add_default_shell_env branch from c53eff5 to 03fc39e Compare November 5, 2020 16:17

Flamefire force-pushed the add_default_shell_env branch from 03fc39e to f0e9a20 Compare November 6, 2020 10:59

Flamefire requested a review from perfinion as a code owner November 6, 2020 10:59

gbaned requested a review from akuegel November 18, 2020 15:04

tensorflowbutler removed the awaiting review Pull request awaiting review label Nov 21, 2020

gbaned added awaiting review Pull request awaiting review stat:awaiting response Status - Awaiting response from author and removed awaiting review Pull request awaiting review labels Nov 25, 2020

tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Nov 29, 2020

gbaned added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Dec 4, 2020

tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Dec 20, 2020

gbaned added the stat:awaiting response Status - Awaiting response from author label Jan 5, 2021

gbaned added the stat:awaiting response Status - Awaiting response from author label Aug 27, 2021

Flamefire added 2 commits August 30, 2021 11:24

Add use_default_shell_env to ctx.actions.run

7268be5

Flamefire force-pushed the add_default_shell_env branch from 8aed0d6 to 7268be5 Compare August 30, 2021 09:26

Flamefire closed this Aug 30, 2021

Flamefire reopened this Aug 30, 2021

google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 30, 2021

mihaimaruseac approved these changes Aug 30, 2021

View reviewed changes

google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 30, 2021

kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 30, 2021

copybara-service bot merged commit 52aa0e9 into tensorflow:master Aug 31, 2021

google-ml-butler bot removed the ready to pull PR ready for merge process label Aug 31, 2021

Flamefire deleted the add_default_shell_env branch September 1, 2021 07:34

This was referenced Sep 1, 2021

Allow use of non-system compiler to build mlir_generated tests #50874

Closed

MLIR tests fail to build with non-system gcc and later glibc #50873

Closed

Flamefire mentioned this pull request Sep 16, 2021

Add use_default_shell_env = True to ctx.actions.run llvm/llvm-project#344

Closed

rsanthanam-amd pushed a commit to ROCm/tensorflow-upstream that referenced this pull request Aug 29, 2022

Merge pull request tensorflow#44549 from Flamefire:add_default_shell_env

1cb3486

PiperOrigin-RevId: 394106593 Change-Id: I2226f80fd3bef5e04a42174a8a519f612ae39985

rsanthanam-amd added a commit to ROCm/tensorflow-upstream that referenced this pull request Aug 29, 2022

Merge pull request #1802 from ROCmSoftwarePlatform/fix_for_unit_tests…

f0948db

…_without_soft_link Merge pull request tensorflow#44549 from Flamefire:add_default_shell_env

Flamefire mentioned this pull request Dec 24, 2022

--action_env ignored when cfg = "exec" is used bazelbuild/bazel#17062

Closed

Flamefire mentioned this pull request Aug 16, 2023

Add use_default_shell_env = True to ctx.actions.run #61591

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add use_default_shell_env = True to all ctx.actions.run_shell rules #44549

Add use_default_shell_env = True to all ctx.actions.run_shell rules #44549

Flamefire commented Nov 3, 2020

Flamefire commented Nov 5, 2020

chsigg commented Nov 18, 2020

chsigg commented Nov 19, 2020

gbaned commented Nov 25, 2020

Flamefire commented Nov 27, 2020

gbaned commented Dec 4, 2020

chsigg commented Dec 17, 2020

Flamefire commented Dec 18, 2020

gbaned commented Jan 5, 2021

chsigg commented Jan 6, 2021

Flamefire commented Jan 6, 2021

gbaned commented Aug 24, 2021

Flamefire commented Aug 30, 2021

Flamefire commented Aug 30, 2021

Add use_default_shell_env = True to all ctx.actions.run_shell rules #44549

Add use_default_shell_env = True to all ctx.actions.run_shell rules #44549

Conversation

Flamefire commented Nov 3, 2020

Flamefire commented Nov 5, 2020

chsigg commented Nov 18, 2020

chsigg commented Nov 19, 2020

gbaned commented Nov 25, 2020

Flamefire commented Nov 27, 2020

gbaned commented Dec 4, 2020

chsigg commented Dec 17, 2020

Flamefire commented Dec 18, 2020

gbaned commented Jan 5, 2021

chsigg commented Jan 6, 2021

Flamefire commented Jan 6, 2021

gbaned commented Aug 24, 2021

Flamefire commented Aug 30, 2021

Flamefire commented Aug 30, 2021