ENV not propagated to process-wrapper #4137

Closed
yliu120 opened this issue Nov 20, 2017 · 33 comments
Labels
team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website), under investigation, untriaged

Comments

@yliu120

yliu120 commented Nov 20, 2017

Please provide the following information. The more we know about your system and use case, the more easily and likely we can help.

Description of the problem / feature request / question:

I have a problem using Bazel to build rules_scala on a custom platform. I believe the cause is that shell environment variables are not propagated correctly to a subprocess spawned by some Bazel actions. Please see the log in the last section.

If possible, provide a minimal example to reproduce the problem:

Environment info

  • Operating System:
    CentOS 6.7

  • Bazel version (output of bazel info release):
    0.8.1

  • If bazel info release returns "development version" or "(@non-git)", please tell us what source tree you compiled Bazel from; git commit hash is appreciated (git rev-parse HEAD):

Have you found anything relevant by searching the web?

(e.g. StackOverflow answers,
GitHub issues,
email threads on the bazel-discuss Google group)

Anything else, information or logs or outputs that would be helpful?

(If they are large, please upload as attachment or provide link).

The command I ran to build rules_scala is bazel build -s --verbose_failures --sandbox_debug //src/...

>>>>> # //src/scala/io/bazel/rules_scala/tut_support:tut_compiler [action 'scala //src/scala/io/bazel/rules_scala/tut_support:tut_compiler']
(cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
  /bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar @bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar_zipper_args
')
>>>>> # @scala//:scala-reflect [action 'Extracting interface @scala//:scala-reflect [for host]']
(cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
    LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/yliu120@jhu.edu/lib:/home-4/yliu120@jhu.edu/opt/lib \
    PATH=/cm/shared/gcc/6.4.0/bin:/home-4/yliu120@jhu.edu/go/bin:/home-4/yliu120@jhu.edu/.local/bin:/home-4/yliu120@jhu.edu/opt/go/bin:/home-4/yliu120@jhu.edu/maven/bin:/home-4/yliu120@jhu.edu/arcanist/bin:/home-4/yliu120@jhu.edu/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
  external/bazel_tools/tools/jdk/ijar/ijar external/scala/lib/scala-reflect.jar bazel-out/host/genfiles/external/scala/_ijar/scala-reflect/external/scala/lib/scala-reflect-ijar.jar)
>>>>> # @scala//:scala-reflect [action 'Extracting interface @scala//:scala-reflect']
(cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
    LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/yliu120@jhu.edu/lib:/home-4/yliu120@jhu.edu/opt/lib \
    PATH=/cm/shared/gcc/6.4.0/bin:/home-4/yliu120@jhu.edu/go/bin:/home-4/yliu120@jhu.edu/.local/bin:/home-4/yliu120@jhu.edu/opt/go/bin:/home-4/yliu120@jhu.edu/maven/bin:/home-4/yliu120@jhu.edu/arcanist/bin:/home-4/yliu120@jhu.edu/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
  external/bazel_tools/tools/jdk/ijar/ijar external/scala/lib/scala-reflect.jar bazel-out/local-fastbuild/genfiles/external/scala/_ijar/scala-reflect/external/scala/lib/scala-reflect-ijar.jar)
>>>>> # @scala//:scala-compiler [action 'Extracting interface @scala//:scala-compiler']
(cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
    LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/yliu120@jhu.edu/lib:/home-4/yliu120@jhu.edu/opt/lib \
    PATH=/cm/shared/gcc/6.4.0/bin:/home-4/yliu120@jhu.edu/go/bin:/home-4/yliu120@jhu.edu/.local/bin:/home-4/yliu120@jhu.edu/opt/go/bin:/home-4/yliu120@jhu.edu/maven/bin:/home-4/yliu120@jhu.edu/arcanist/bin:/home-4/yliu120@jhu.edu/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
  external/bazel_tools/tools/jdk/ijar/ijar external/scala/lib/scala-compiler.jar bazel-out/local-fastbuild/genfiles/external/scala/_ijar/scala-compiler/external/scala/lib/scala-compiler-ijar.jar)
>>>>> # @io_bazel_rules_scala_org_tpolecat_tut_core//jar:jar [action 'Extracting interface @io_bazel_rules_scala_org_tpolecat_tut_core//jar:jar']
(cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
    LD_LIBRARY_PATH=/cm/shared/gcc/6.4.0/lib64:/cm/shared/gcc/6.4.0/lib:/cm/shared/apps/mpc/1.0.3/lib:/cm/shared/apps/gcc/4.9.2/gmp-6.0/lib:/cm/shared/apps/mpfr/3.1.3/lib:/cm/shared/apps/sqlite3/3.15.0/lib:/cm/shared/apps/libevent/2.1.5-beta/lib:/cm/shared/apps/cudnn/6.0/lib64:/cm/shared/apps/java/jdk1.8.0_112/lib:/cm/shared/apps/python/3.6.0/lib:/cm/shared/apps/cuda/8.0/lib64:/cm/shared/apps/binutils/2.25/src/lib:/cm/shared/apps/slurm/current/lib/slurm:/cm/shared/apps/slurm/current/lib:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/compiler/lib/intel64:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/mkl/lib/intel64:/home-4/yliu120@jhu.edu/lib:/home-4/yliu120@jhu.edu/opt/lib \
    PATH=/cm/shared/gcc/6.4.0/bin:/home-4/yliu120@jhu.edu/go/bin:/home-4/yliu120@jhu.edu/.local/bin:/home-4/yliu120@jhu.edu/opt/go/bin:/home-4/yliu120@jhu.edu/maven/bin:/home-4/yliu120@jhu.edu/arcanist/bin:/home-4/yliu120@jhu.edu/opt/bin:/cm/shared/apps/sqlite3/3.15.0/bin:/cm/shared/apps/tmux/2.1/bin:/cm/shared/apps/libevent/2.1.5-beta/bin:/cm/shared/apps/java/jdk1.8.0_112/bin:/cm/shared/apps/python/3.6.0/bin:/cm/shared/apps/cuda/8.0/bin:/cm/shared/apps/binutils/2.25/src/bin:/cm/shared/apps/binutils:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/parallel_studio_xe_2015_update2/composer_xe_2015.2.164/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/opt/dell/srvadmin/bin \
  external/bazel_tools/tools/jdk/ijar/ijar external/io_bazel_rules_scala_org_tpolecat_tut_core/jar/tut-core_2.11-0.4.8.jar bazel-out/local-fastbuild/genfiles/external/io_bazel_rules_scala_org_tpolecat_tut_core/jar/_ijar/jar/external/io_bazel_rules_scala_org_tpolecat_tut_core/jar/tut-core_2.11-0.4.8-ijar.jar)
ERROR: /home-4/yliu120@jhu.edu/rules_scala/src/scala/scripts/BUILD:41:1: error executing shell command: '
rm -f bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar @ba...' failed (Exit 1): process-wrapper failed: error executing command
  (cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
  /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper '--timeout=-1' '--kill_delay=15' /bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar @bazel-out/local-fastbuild/bin/src/scala/scripts/bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar_zipper_args
').
/home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper)
INFO: Elapsed time: 0.820s, Critical Path: 0.14s

From the log message, you can see the last command doesn't have PATH and LD_LIBRARY_PATH propagated but the others have. I highlighted the command that reports the error:

(cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
  /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala/_bin/process-wrapper '--timeout=-1' '--kill_delay=15' /bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar @bazel-out/local-fastbuild/bin/src/scala/scripts/bazel-out/local-fastbuild/bin/src/scala/scripts/scalapb_generator.jar_zipper_args
')
@meteorcloudy
Member

Do you mean this command?

(cd /home-4/yliu120@jhu.edu/.cache/bazel/_bazel_yliu120@jhu.edu/12dd3863654b107695e643fa774ca856/execroot/io_bazel_rules_scala && \
  exec env - \
  /bin/bash -c '
rm -f bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar
external/bazel_tools/tools/zip/zipper/zipper c bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar @bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/bazel-out/local-fastbuild/bin/src/scala/io/bazel/rules_scala/tut_support/tut_compiler.jar_zipper_args
')

Why does it require PATH and LD_LIBRARY_PATH env vars?

@yliu120
Author

yliu120 commented Nov 21, 2017

Hi, sorry about the confusion. I have edited the original post to make it clearer. I would like to make two points:

  1. The last command I pointed out is the one that causes the error. Apparently this is because process-wrapper is a 'cc_binary' compiled with a customized toolchain. All of its libraries, such as libstdc++, are found via ${LD_LIBRARY_PATH}. However, when this action is created, exec env - clears the parent process's env vars, so the process-wrapper binary cannot be linked against the correct libstdc++ at run time. The command wrapped by process-wrapper could actually run successfully on its own, since the java command doesn't need any external cc libraries.

  2. In general, I found that all bazel_tools implemented in C++ (https://github.com/bazelbuild/bazel/tree/9d9ac15b69530edd83c1b95f98a70efa8f98a27a/src/main/tools) need use_default_shell_env = True, because all of them are linked against libstdc++. If you simply do exec env -, you will probably break the runtime linkage.
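The effect described in point 1 can be reproduced without Bazel. The following is a minimal sketch, assuming a POSIX shell and an `env` that accepts `-` the way the logged commands use it; the directory /opt/example/lib is a made-up placeholder, not a path from this issue.

```shell
#!/bin/sh
# Hypothetical library directory, used only for illustration.
LD_LIBRARY_PATH=/opt/example/lib
export LD_LIBRARY_PATH

# Script for the child: print the value it sees, or "unset" if it sees none.
show='printf "%s" "${LD_LIBRARY_PATH:-unset}"'

# A normal child inherits the variable from its parent.
inherited=$(/bin/sh -c "$show")

# `env -` starts the child with an empty environment, as in the logged commands.
cleared=$(env - /bin/sh -c "$show")

# Naming the variable after `env -` forwards just that one value.
forwarded=$(env - LD_LIBRARY_PATH="$LD_LIBRARY_PATH" /bin/sh -c "$show")

echo "inherited=$inherited cleared=$cleared forwarded=$forwarded"
```

Only the middle case loses the variable, which matches the failing process-wrapper invocation: a dynamically linked binary started that way falls back to the system library directories.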

@meteorcloudy
Member

meteorcloudy commented Nov 22, 2017

@aehlig Can we propagate PATH and LD_LIBRARY_PATH for every action running with process-wrapper?

@yliu120
Author

yliu120 commented Nov 22, 2017

A simple question: why are some rules wrapped with process-wrapper while others aren't? Is this because of sandboxing? I'd rather not dig through the whole codebase, so I am asking you directly.

@yliu120
Author

yliu120 commented Dec 19, 2017

@meteorcloudy @aehlig I tried to add --action_env to force the bazel executor to take PATH and LD_LIBRARY_PATH yesterday but I didn't succeed. It turns out that sometimes 'action_env' works but sometimes not. I am just wondering why bazel doesn't honor the shell envs in java rules if bazel needs to build some cc utilities like 'process_wrapper'. I tested process_wrapper with cc rules and python rules. They work with it but when I am trying to build java header jars with process_wrapper, bazel doesn't honor the PATH and LD_LIBRARY_PATH envs.

I appreciate the idea of sandboxing everything into an isolated environment. But doing this successfully would require either:

  1. moving the cc compiler and libraries fully inside the workspace, or
  2. creating symbolic links to those libraries or binaries in some folder inside the Bazel workspace. In that case, ENVs could be ignored.

@ulfjack
Contributor

ulfjack commented Dec 20, 2017

We're not going to forward env variables by default - doing so is fundamentally incompatible with remote execution and remote caching, which are both important to us (and many of our users). There is a separate issue that --action_env is not forwarded to all actions at #3320.

@yliu120
Author

yliu120 commented Dec 20, 2017

@ulfjack Then why do you make cc_rules respect shell envs? I mean this behavior will break a lot of local compilation in customized environments. And at user level, there is no way to control whether envs could be passed to a specific rule so that there is no way to fix it.

Actually the problem here is to compile a java header jar, which doesn’t need host envs at all. However, when wrapping the rule by process-wrapper, the process-wrapper tool itself needs to link the cc library. This is a little ridiculous since the helper destroys the work that it is helping with.

I don't really understand why passing envs would disturb remote execution. Assuming users make the remote environment identical to the local one, why is it a problem?

@ulfjack
Contributor

ulfjack commented Jan 10, 2018

It's a two-step process: there's a piece of code responsible for configuring the C++ toolchain that's separate from the rules. The rules themselves don't respect shell envs. The code that configures the toolchain can be swapped out as a whole for code that does not look at the local envs. If there are specific issues with the existing code for local execution, then we can discuss that separately.

It's correct that action_env is not working for all actions right now. There's a separate bug for that.

That said, it seems pretty unusual to require LD_LIBRARY_PATH to run basic binaries.

The problem isn't the remote environment, but the local one. Any env variable that we forward to the remote machines as part of remote execution poisons the remote cache. For example, if you and your colleague have an env variable USERNAME, and that's forwarded to the remote cache, then you cannot get any cache hits from your colleague and vice versa. Even worse, if you have env variables that are more volatile (changing quickly), you can't even get cache hits from yourself.
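To illustrate the cache-poisoning argument, here is a toy model. It is not Bazel's actual key computation: it assumes the cache key is simply a checksum over the command line plus whatever env variables are forwarded. The function name action_key and the command string are made up for the example.

```shell
#!/bin/sh
# Toy model: key = checksum of the command line and every forwarded NAME=value.
# This is NOT how Bazel computes action digests; it only shows the dependency.
action_key() {
  printf '%s\n' "$@" | cksum | cut -d' ' -f1
}

cmd='gcc -c foo.c -o foo.o'   # hypothetical action

# Two users forwarding their own USERNAME get different keys for the same action.
key_alice=$(action_key "$cmd" "USERNAME=alice")
key_bob=$(action_key "$cmd" "USERNAME=bob")

# With no user-specific variable forwarded, both compute the same key.
key_clean_1=$(action_key "$cmd")
key_clean_2=$(action_key "$cmd")

echo "alice=$key_alice bob=$key_bob clean=$key_clean_1/$key_clean_2"
```

In this model the two users never share an entry once USERNAME is part of the key, which is the situation described above.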

@lisendong

lisendong commented Jan 23, 2018

I think the problem is pretty serious. When I install Bazel from source with compile.sh, I get:


ERROR: /home/mpi/tensorflow/downloads/bazel-0.5.4/src/main/protobuf/BUILD:70:1: error executing shell command: 'cp 'bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.jar' 'bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.srcjar'' failed (Exit 127): bash failed: error executing command
  (cd /tmp/bazel_DUyJtkXd/out/execroot/io_bazel && \
  exec env - \
  /bin/bash -c 'cp '\''bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.jar'\'' '\''bazel-out/local-opt/bin/src/main/protobuf/command_server_java_grpc_srcs.srcjar'\''').
/bin/bash: cp: command not found
Target //src:bazel failed to build
INFO: Elapsed time: 49.533s, Critical Path: 30.28s
+ fail 'Could not build Bazel'
+ local exitCode=1
+ [[ 1 = \0 ]]

I'm sure I have set "build --action_env", but it does not seem to propagate to the "exec env -" command!

@lisendong

@ulfjack Is there any progress on the bug "action_env is not working for all actions"?
Because of that bug, compile.sh doesn't work. I think it's a very serious bug.

@ulfjack
Contributor

ulfjack commented Jan 24, 2018

What makes you think that the error you posted is due to env variables?

@lisendong

@ulfjack
exec env - /bin/bash -c
This command clears all the env variables and discards any action_env I set, so even the basic "cp" cannot be found.

@ulfjack
Contributor

ulfjack commented Jan 25, 2018

What kind of setup do you have such that cp isn't in /bin or /usr/bin?

You should check if this works with a more recent bazel release. If I read the code correctly, this is from a genrule, and they should already forward the action env, even if not all actions do.

@lisendong

lisendong commented Feb 14, 2018

@ulfjack
I'm sure the "cp" command is not under /bin or /usr/bin, and I do not have root privileges. Could you please help me propagate "$PATH" to the Bazel build?
I have tried every solution I could find on Google (including --action_env and use_default_shell_env), but none of them works.

@ulfjack
Contributor

ulfjack commented Feb 14, 2018

Did you try with a more recent Bazel release?

@lisendong

@ulfjack , I used bazel 0.10.0.

@emusand

emusand commented Jun 1, 2018

Is there any news about this issue?

We have to build bazel with another gcc than /usr/bin/gcc. Then we run into the problem where process-wrapper fails for targets, since the local /usr/lib64/libstdc++.so is too old:

.../execroot/flexbs/_bin/process-wrapper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found
.../execroot/flexbs/_bin/process-wrapper: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found
...

Our machine park runs several different Linux distributions, and the local gcc is generally too old for Bazel. So building Bazel with the local compiler and installing it locally is not an option for us. Instead we build all our tools with our own gcc, installed on a network disk, which produces binaries that work on all other platforms.

Is it possible to build Bazel in a way where process-wrapper and other internal tools become independent of the environment, for instance by linking them statically or passing --rpath to the linker?
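When diagnosing this kind of failure, it can help to pull the missing version tags out of the loader messages quoted above. The snippet below is only a sketch: the helper name missing_versions is made up, and the sample input is the two messages from this comment with the truncated path prefix left out.

```shell
#!/bin/sh
# Extract the symbol-version tags that the dynamic loader reports as missing.
# Reads loader error messages on stdin and prints each tag once, sorted.
missing_versions() {
  sed -n "s/.*version \`\([A-Za-z0-9_.]*\)' not found.*/\1/p" | sort -u
}

# Sample input: the two messages quoted above.
missing=$(printf '%s\n' \
  "process-wrapper: /usr/lib64/libstdc++.so.6: version \`GLIBCXX_3.4.20' not found" \
  "process-wrapper: /usr/lib64/libstdc++.so.6: version \`CXXABI_1.3.8' not found" \
  | missing_versions)

printf '%s\n' "$missing"
```

The tags can then be compared with what a given library actually provides, for example by running `strings` on that libstdc++.so.6 and filtering for GLIBCXX.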

@emusand

emusand commented Jun 1, 2018

My problem turned out to have a simple solution: We can link process-wrapper statically when we build Bazel (by updating src/main/tools/BUILD).

@yliu120
Author

yliu120 commented Jun 1, 2018

@emusand That’s what I did. Apparently, current Bazel implementations are not in favor of customized Linux environments. And I don’t think this will be fixed.

@ulfjack
Contributor

ulfjack commented Jun 4, 2018

I have recently updated a bunch of actions to correctly take --action_env into account - the changes should all be in 0.14.0. If you know about specific actions that still don't do it correctly, please do let me know.

I would also be open to merging a patch that makes process-wrapper be statically linked by default. (And possibly change it so that it's a pure C binary and doesn't need to link against libstdc++.)

I certainly want Bazel to work better with unusual setups, within reason, but I won't be able to do all the work myself.
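For reference, forwarding specific variables with --action_env might look like the sketch below. It writes a scratch bazelrc rather than touching a real one; the file location and the pinned library path are placeholders. A bare name such as --action_env=PATH takes the value from the environment Bazel is invoked in, while NAME=value pins an explicit value. Whether a given action honours these settings depends on the Bazel version, as discussed in this thread.

```shell
#!/bin/sh
# Write a scratch bazelrc; the path and the pinned value are placeholders.
rc_dir=$(mktemp -d)
rc_file="$rc_dir/example.bazelrc"

cat > "$rc_file" <<'EOF'
# Bare name: take the value from the environment Bazel is invoked in.
build --action_env=PATH
# NAME=value: pin an explicit value, which stays the same across users.
build --action_env=LD_LIBRARY_PATH=/opt/example/gcc/lib64
EOF

cat "$rc_file"
```

Such a file could then be passed with the --bazelrc startup option. Pinning an explicit value rather than forwarding the caller's own keeps the setting identical across users, which sidesteps the cache concern raised earlier.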

@rahul-nitkkr

@emusand: can you share how you got around this, and what you had to add to the BUILD file?

@emusand

emusand commented Aug 15, 2018 via email

@29x10

29x10 commented Oct 16, 2018

Any update?

@jin added the area-EngProd (Bazel CI, infrastructure, bootstrapping, release, and distribution tooling) and untriaged labels on Sep 3, 2019
@jin added the team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website) label and removed the area-EngProd label on Sep 3, 2019
@timkpaine

any update? This is still an issue as of latest release

@ulfjack
Contributor

ulfjack commented Dec 5, 2019

I think @philwo was suggesting that local actions get the local PATH forwarded, even if it's not used for remote actions.

@ulfjack
Contributor

ulfjack commented Dec 5, 2019

(That wouldn't help for LD_LIBRARY_PATH.)

If someone has a repro for me, I may be able to take a look.

@timkpaine

timkpaine commented Dec 5, 2019

A repro is a bit tough: put a really old gcc in /usr/ and install a new one in /usr/local/. Even if you set PATH to /usr/local/bin and LD_LIBRARY_PATH to /usr/local/lib, you'll hit certain steps that try to grab libstdc++ from /usr/lib (I have hit this with Ray and TensorFlow).

If you start from a centos 6.5 docker image, yum install gcc (4.8) into /usr/bin, then manually install gcc 4.9.3 into /usr/local/bin, you'll be unable to compile tensorflow 1.* or ray, even though both are compatible with 4.9.3 (and even if you setup your paths and still them to point to /usr/local/bin/gcc)

@pgraf

pgraf commented Feb 13, 2020

I am experiencing this issue (building ray via Bazel on ppc64le machine (Summit at Oak Ridge)). I tried the "link static" suggestion and my system did not like that (all sorts of missing libraries). I am wondering if there is a different workaround? E.g., if I wanted to hack the Bazel code/configuration (I am building Bazel from source), what would I change to implement the suggestion above to simply pass PATH and LD_LIBRARY_PATH to the step that uses process-wrapper? There is a hint above about "use_default_shell_env = True", but I'm not sure I can follow it. Would I just have to turn it to true in every call to run_shell()/run() in the Bazel's "*.bzl" files? (I tried something like that to no effect, but I may have done it wrong.) Something more? Thank you!

@forestliurui

forestliurui commented Apr 7, 2020

I have encountered very similar problems with building Ray via Bazel (v1.1.0). I tried different methods, and successfully solved it in some way (no need to hack Bazel code, just a bunch of ENV variable settings. Details are shared as below), which I believe can be applied to build other projects like TensorFlow. I am on a ppc64le machine using a gcc (v7.3.0) at a customized location, because the gcc at /usr/bin or /usr/local/bin is too old and I have no root privilege to upgrade it. Before we continue, make sure that PATH and LD_LIBRARY_PATH are properly set for the new gcc:

$ export PATH=/private/var/packages/gcc/7.3.0/bin:$PATH
$ export LD_LIBRARY_PATH=/private/var/packages/gcc/7.3.0/lib64:$LD_LIBRARY_PATH

Rebuild Bazel (in order to statically link process-wrapper with libstdc++)

The first (perhaps the most important) problem that I need to deal with is `GLIBCXX_3.4.21' not found error. An example of the error info is:

/private/var/.cache/bazel/_bazel_name/install/863382820ae9540178f3de18543a9280/_embedded_binaries/process-wrapper: 
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found 
(required by /private/var/.cache/bazel/_bazel_name/install/863382820ae9540178f3de18543a9280/_embedded_binaries/process-wrapper)

As this very GitHub issue indicates, this is caused by ENV vars (esp. LD_LIBRARY_PATH) not being propagated to process-wrapper when we use Bazel to build a project like Ray. My workaround is to statically link process-wrapper with libstdc++. To achieve this, I need to rebuild Bazel itself, because process-wrapper is a helper tool that is bundled into the Bazel executable and placed in the install base the first time Bazel is launched (a good explanation can be found here).

Let's dive into detailed steps. First, before we rebuild Bazel, we can check that process-wrapper is indeed dynamically linked to libstdc++ using ldd command:

$ cd /private/var/.cache/bazel/_bazel_name/install/863382820ae9540178f3de18543a9280/_embedded_binaries/
$ ldd process-wrapper
        linux-vdso64.so.1 =>  (0x00003fff9b490000)
        libstdc++.so.6 => /private/var/packages/gcc/7.3.0/lib64/libstdc++.so.6 (0x00003fff9b260000)
        libm.so.6 => /lib64/libm.so.6 (0x00003fff9b150000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00003fff9b110000)
        libgcc_s.so.1 => /private/var/packages/gcc/7.3.0/lib64/libgcc_s.so.1 (0x00003fff9b0d0000)
        libc.so.6 => /lib64/libc.so.6 (0x00003fff9aee0000)
        /lib64/ld64.so.2 (0x00003fff9b4b0000)

The fact that libstdc++.so.6 appears in the output of ldd confirmed that process-wrapper is dynamically linked to libstdc++ (similarly for libgcc_s.so.1). We need to remove its dynamic dependencies on libstdc++.so.6 and libgcc_s.so.1 by rebuilding Bazel from source. After following step 1 and 2.1 in the instruction, I used the following command for step 2.2:

$ env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" BAZEL_LINKOPTS=-static-libstdc++:-static-libgcc BAZEL_LINKLIBS=-l%:libstdc++.a:-lm  bash ./compile.sh

There are a few things to note in the above command. (1) I used the ENV variable BAZEL_LINKOPTS to instruct the build process to link Bazel statically against libstdc++ and libgcc. (2) I used the ENV variable BAZEL_LINKLIBS to specify the necessary libraries. It's important to explicitly specify the static libstdc++ library -l%:libstdc++.a here, because gcc will still dynamically link the output binary to libstdc++ even if the options -static-libstdc++ and -lstdc++ are given (a more detailed explanation about this can be found here). After Bazel is successfully rebuilt, we can launch the new Bazel executable and check the lib dependencies of process-wrapper, which is placed in a different directory than before:

$ cd /private/var/.cache/bazel/_bazel_name/install/9d07c1e5d1f16ee5678323cb375d7c07/_embedded_binaries/
$ ldd process-wrapper
        linux-vdso64.so.1 =>  (0x00003fffad4c0000)
        libm.so.6 => /lib64/libm.so.6 (0x00003fffad3b0000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00003fffad370000)
        libc.so.6 => /lib64/libc.so.6 (0x00003fffad180000)
        /lib64/ld64.so.2 (0x00003fffad4e0000)

Notice that both libstdc++.so.6 and libgcc_s.so.1 disappeared. We succeeded! It means that we don't need LD_LIBRARY_PATH to correctly invoke process-wrapper anymore.
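The before/after comparison can also be scripted. This is a small sketch, not part of the original procedure: the helper name depends_on is made up, and the two listings are abbreviated copies of the ldd output shown above, so the snippet runs without the actual binaries.

```shell
#!/bin/sh
# Succeeds if an ldd-style listing ($1) names the given shared library ($2).
depends_on() {
  printf '%s\n' "$1" | grep -qF -- "$2 =>"
}

# Abbreviated copies of the two ldd listings above.
before_listing="libstdc++.so.6 => /private/var/packages/gcc/7.3.0/lib64/libstdc++.so.6 (0x00003fff9b260000)
libm.so.6 => /lib64/libm.so.6 (0x00003fff9b150000)
libgcc_s.so.1 => /private/var/packages/gcc/7.3.0/lib64/libgcc_s.so.1 (0x00003fff9b0d0000)
libc.so.6 => /lib64/libc.so.6 (0x00003fff9aee0000)"
after_listing="libm.so.6 => /lib64/libm.so.6 (0x00003fffad3b0000)
libc.so.6 => /lib64/libc.so.6 (0x00003fffad180000)"

if depends_on "$before_listing" "libstdc++.so.6"; then before=dynamic; else before=none; fi
if depends_on "$after_listing" "libstdc++.so.6"; then after=dynamic; else after=none; fi

echo "before rebuild: $before, after rebuild: $after"
```

Against a real binary, the first argument would be the output of ldd, for example `depends_on "$(ldd ./process-wrapper)" libstdc++.so.6`.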

Build Ray (using the new Bazel we just rebuilt)

Now we can build Ray (v0.7.7) using the new Bazel that we just rebuilt in the previous step. Assuming that the other library dependencies of Ray, such as Apache Arrow, have been properly installed (details on how to install them can be found here), I used the following command to build Ray:

$ cd ray/python
$ env BAZEL_LINKOPTS=-static-libstdc++:-static-libgcc BAZEL_LINKLIBS=-l%:libstdc++.a:-lm BAZEL_CXXOPTS=-std=gnu++0x python setup.py bdist_wheel

Notice that I am still using ENV variables BAZEL_LINKOPTS and BAZEL_LINKLIBS to instruct Bazel to statically link libstdc++ and libgcc when building Ray. This is because Bazel will create many intermediate binaries, and we still need to make sure they are all statically linked to libstdc++ and libgcc. I also used another new ENV variable BAZEL_CXXOPTS to instruct Bazel to use option -std=gnu++0x for gcc. This is because by default gcc will use a higher C++ standard (i.e., gnu++14), which would result in compilation errors when it tries to compile the plasma (for details, see here).
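The `%:` in the BAZEL_LINKLIBS value appears to be an escape. As far as I understand Bazel's C++ toolchain auto-configuration, these variables are split on `:` and `%:` yields a literal colon, so -l%:libstdc++.a would reach the linker as -l:libstdc++.a. The sketch below is a simplified re-implementation of that splitting for illustration only, not Bazel's code; the function name is borrowed loosely and should be treated as an assumption.

```shell
#!/bin/sh
# Split a colon-separated flag list, treating %X as a literal X
# (so "%:" keeps a colon inside a flag). Simplified illustration only.
split_escaped() {
  printf '%s\n' "$1" | awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "%" && i < n) {
        i++
        out = out substr($0, i, 1)   # %X becomes a literal X
      } else if (c == ":") {
        print out                    # unescaped colon ends the current flag
        out = ""
      } else {
        out = out c
      }
    }
    print out
  }'
}

linkopts=$(split_escaped '-static-libstdc++:-static-libgcc')
linklibs=$(split_escaped '-l%:libstdc++.a:-lm')

printf 'BAZEL_LINKOPTS ->\n%s\n' "$linkopts"
printf 'BAZEL_LINKLIBS ->\n%s\n' "$linklibs"
```

Under that assumption, the two variables in the command above expand to four separate linker arguments: -static-libstdc++, -static-libgcc, -l:libstdc++.a and -lm.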

When the above step is successful, you will see a .whl file called ray-0.7.7-cp37-cp37m-linux_ppc64le.whl in the directory ray/python/dist. The following command can be used to install Ray as a python library:

$ cd ray/python/dist
$ pip install ray-0.7.7-cp37-cp37m-linux_ppc64le.whl

Now, you can use Ray in python!

@timkpaine

@forestliurui this looks SUPER helpful, I will try to use the same process to build ray in my environment

@forestliurui

@forestliurui this looks SUPER helpful, I will try to use the same process to build ray in my environment

@timkpaine Thanks. I am happy if this could be helpful to others. @pgraf Maybe this is also helpful in your case.

@Flamefire
Contributor

I just ran into this again: the generate-xml.sh shell script contained in Bazel is also run via the process-wrapper, but any stdout/stderr is suppressed. Hence all I got was:

ERROR: /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/TensorFlow/tensorflow-r2.4/tensorflow/c/BUILD:613:11:  failed (Exit 1): generate-xml.sh failed: error executing command 
  (cd /dev/shm/s3248973-EasyBuild/TensorFlow/2.4.0/fosscuda-2019b-Python-3.7.4/tmpb7zFlQ-bazel-tf/20db8ac50b74c328e6dea9b20829b459/execroot/org_tensorflow && \
  exec env - \
    PATH=/usr/bin:/bin \
    TEST_BINARY=tensorflow/c/c_test \
    TEST_NAME=//tensorflow/c:c_test \
    TEST_SHARD_INDEX=0 \
    TEST_TOTAL_SHARDS=0 \
  /dev/shm/output_user_root/20db8ac50b74c328e6dea9b20829b459/execroot/org_tensorflow/external/bazel_tools/tools/test/generate-xml.sh bazel-out/ppc-opt/testlogs/tensorflow/c/c_test/test.log bazel-out/ppc-opt/testlogs/tensorflow/c/c_test/test.xml 21 0)

So only a very generic failure. After days of digging, which included modifying the Bazel sources, I found that there are log files containing that execution's stdout/stderr, and those told me it is the process-wrapper again.

So please take this as another data point that something should be done about this.
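
To make the effect of the `exec env - PATH=/usr/bin:/bin ...` line in the log above concrete, here is a small Python sketch that starts a child process with exactly the listed variables and reports what the child sees. The helper name `visible_env` and the parent value of LD_LIBRARY_PATH are made up for illustration, and the sketch assumes an `env` executable is available on the machine.

```python
import os
import shutil
import subprocess
from typing import Dict, Mapping


def visible_env(child_env: Mapping[str, str]) -> Dict[str, str]:
    """Run `env` with exactly child_env and return the variables the child sees."""
    # Locate `env` via the parent's PATH; fall back to the usual location.
    env_tool = shutil.which("env") or "/usr/bin/env"
    result = subprocess.run(
        [env_tool],
        env=dict(child_env),  # nothing from the parent is inherited
        capture_output=True,
        text=True,
        check=True,
    )
    seen: Dict[str, str] = {}
    for line in result.stdout.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            seen[name] = value
    return seen


if __name__ == "__main__":
    # Hypothetical parent setting that a dynamically linked tool might need.
    os.environ["LD_LIBRARY_PATH"] = "/opt/toolchain/lib64"
    child = visible_env({"PATH": "/usr/bin:/bin", "TEST_BINARY": "tensorflow/c/c_test"})
    print(sorted(child))
    print("LD_LIBRARY_PATH" in child)
```

Whatever LD_LIBRARY_PATH the parent shell had is simply absent in the child, which is why a binary that depends on libraries outside the default search path cannot start there.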

meteorcloudy added a commit to meteorcloudy/bazel that referenced this issue Dec 8, 2020
Previously, we hardcode the envs of the xml generation action, which
caused problem for process-wrapper because it's dynamically linked to
some system library and the required PATH or LD_LIBRARY_PATH are not
set.

This change propagate the envs we set for the actual test action to the
xml file generation action to make sure the env vars are correctly set and
can also be controlled by --action_env and --test_env.

Fixes bazelbuild#4137
coeuvre pushed a commit to coeuvre/bazel that referenced this issue Jul 15, 2021
Previously, we hardcode the envs of the xml generation action, which
caused problem for process-wrapper because it's dynamically linked to
some system libraries and the required PATH or LD_LIBRARY_PATH are not
set.

This change propagate the envs we set for the actual test action to the
xml file generation action to make sure the env vars are correctly set and
can also be controlled by --action_env and --test_env.

Fixes bazelbuild#4137
Fixes bazelbuild#12579

Closes bazelbuild#12659.

PiperOrigin-RevId: 347596753
coeuvre pushed a commit to coeuvre/bazel that referenced this issue Jul 16, 2021
Previously, we hardcode the envs of the xml generation action, which
caused problem for process-wrapper because it's dynamically linked to
some system libraries and the required PATH or LD_LIBRARY_PATH are not
set.

This change propagate the envs we set for the actual test action to the
xml file generation action to make sure the env vars are correctly set and
can also be controlled by --action_env and --test_env.

Fixes bazelbuild#4137
Fixes bazelbuild#12579

Closes bazelbuild#12659.

PiperOrigin-RevId: 347596753
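
The commit message above says the XML generation action now gets the test action's environment, which can be controlled with --action_env and --test_env. Both flags accept either `NAME=value` or a bare `NAME` whose value is taken from the environment Bazel was invoked in. The Python sketch below is a rough model of that resolution; `resolve_env_flags` is a hypothetical name, and skipping a bare name that is unset in the client environment is an assumption of this sketch, not a description of Bazel's implementation.

```python
from typing import Dict, Iterable, Mapping


def resolve_env_flags(flag_values: Iterable[str],
                      client_env: Mapping[str, str]) -> Dict[str, str]:
    """Turn values such as 'FOO=bar' or 'LD_LIBRARY_PATH' into an action environment."""
    resolved: Dict[str, str] = {}
    for item in flag_values:
        name, sep, value = item.partition("=")
        if sep:
            # NAME=value: use the fixed value given on the command line.
            resolved[name] = value
        elif name in client_env:
            # Bare NAME: inherit the value from the invoking environment.
            resolved[name] = client_env[name]
    return resolved


if __name__ == "__main__":
    client = {"LD_LIBRARY_PATH": "/opt/toolchain/lib64", "HOME": "/home/user"}
    print(resolve_env_flags(["LD_LIBRARY_PATH", "FOO=bar"], client))
```

With this model, passing `--test_env=LD_LIBRARY_PATH` would make the library path visible to both the test and the XML generation step, while unrelated variables such as HOME stay out of the action environment.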
@Apteryks

For the record, the isolation feature that causes the environment variables to be unset has to do with the following code:

          (add-after 'unpack 'disable-isolation
            (lambda _
              ;; XXX: By default, Bazel clears all the environment variables
              ;; but PATH, which causes GCC to not find its include files.
              (substitute* "src/main/java/com/google/devtools/build/lib/util/\
CommandFailureUtils.java"               ;this is purely cosmetic
                (("\\? \"env - ")
                 "? \"env "))
              (substitute* "src/main/java/com/google/devtools/build/lib/shell/\
JavaSubprocessFactory.java"
                (("builder.environment\\().clear\\();" all)
                 (string-append "// (disabled by Guix)" all)))))
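
For readers who do not use Guix, the second `substitute*` above is a regular-expression replacement over the Java source file. Here is a Python sketch of the same rewrite applied to a single line; the indentation of the sample line is made up, and the pattern is escaped the way Python's `re` module requires, which differs slightly from the Guile pattern shown above.

```python
import re

# Matches the call that empties the subprocess environment.
CLEAR_CALL = re.compile(r"builder\.environment\(\)\.clear\(\);")


def disable_env_clear(java_source: str) -> str:
    """Prefix every matching call with a comment marker, leaving the text in place."""
    return CLEAR_CALL.sub(lambda m: "// (disabled by Guix)" + m.group(0), java_source)


if __name__ == "__main__":
    sample = "    builder.environment().clear();"  # hypothetical source line
    print(disable_env_clear(sample))
```

The call is commented out, not deleted, so the patched file still shows where the environment used to be cleared.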
