-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
incompatible_do_not_split_linking_cmdline: Order of C++ link flags affected #7687
Comments
@oquenchil, there don't seem to be associated PRs submitted. Does the flag exist already? If yes, by when do you want people to migrate? If the flag exists, it will be available in the currently being cut release, 0.25. The earliest you can require people to migrate by is the 0.26 release. Is that the plan? |
@oquenchil Seems like the flag does exist and that it was flipped (http://google3/third_party/bazel/src/main/java/com/google/devtools/build/lib/rules/cpp/CppOptions.java?l=836&rcl=241059021). I think that the flag flip needs to be rolled back because there wasn't a migration window. |
You're looking into the wrong flag :) This is the change that introduced this flag: 3010e57. I'm changing the labels accordingly. |
Oh, the title of the issue is inconsistent with the beginning of the content of the issue which is why I got confused :) |
Updated, sorry for the confusion. |
Tested with Bazel CI downstream projects. These fail for unrelated reasons. Both together show that everything is green with my change: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/971 https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/969 This change makes it unnecessary to patch most CppActionConfig features only in Linux for now. The build_interface_libraries feature is still patched. Once Bazel is released with this change: 4eb7683 we can stop patching it. Also a separate change should do the same for Windows and Mac. This also unblocks flipping --incompatible_do_not_split_linking_cmdline #8303, #6926, #7687 RELNOTES:none PiperOrigin-RevId: 248117783
GH issue: #7687 The flag is now true by default in Bazel. Tested: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/996 Envoy is showing up as broken. Udp breakage is due to a different reason. The Remote Execution is also fine to break with this change according to buchgr@. That project is using an old version of the default crosstool which still patches features from Bazel. For Envoy I filed envoyproxy/envoy#6968, suggested a fix and they will fix it soonish. If they don't, please disable Envoy downstream or I might miss the 0.27 cut. RELNOTES: PiperOrigin-RevId: 250252235
Tested with Bazel CI downstream projects. These fail for unrelated reasons. Both together show that everything is green with my change: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/971 https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/969 This change makes it unnecessary to patch most CppActionConfig features only in Linux for now. The build_interface_libraries feature is still patched. Once Bazel is released with this change: bazelbuild@4eb7683 we can stop patching it. Also a separate change should do the same for Windows and Mac. This also unblocks flipping --incompatible_do_not_split_linking_cmdline bazelbuild#8303, bazelbuild#6926, bazelbuild#7687 RELNOTES:none PiperOrigin-RevId: 248117783
GH issue: bazelbuild#7687 The flag is now true by default in Bazel. Tested: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/996 Envoy is showing up as broken. Udp breakage is due to a different reason. The Remote Execution is also fine to break with this change according to buchgr@. That project is using an old version of the default crosstool which still patches features from Bazel. For Envoy I filed envoyproxy/envoy#6968, suggested a fix and they will fix it soonish. If they don't, please disable Envoy downstream or I might miss the 0.27 cut. RELNOTES: PiperOrigin-RevId: 250252235
Enabling
|
GH issue: bazelbuild#7687 The flag is now true by default in Bazel. Tested: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/996 Envoy is showing up as broken. Udp breakage is due to a different reason. The Remote Execution is also fine to break with this change according to buchgr@. That project is using an old version of the default crosstool which still patches features from Bazel. For Envoy I filed envoyproxy/envoy#6968, suggested a fix and they will fix it soonish. If they don't, please disable Envoy downstream or I might miss the 0.27 cut. RELNOTES: PiperOrigin-RevId: 250252235
Hey Nikolaus, there is no way to control the order except in cc_config. |
@oquenchil Is there a way in cc_config to do what I tried to show in my previous question? That is, can I put any of my flags before the list of libraries? If not, this is a breaking change for us that I don't know how we'll work around. We have this flag disabled for now. Should I open a new ticket? |
Yes, you can control in cc_config the order of flags. But for custom flags that are coming in from individual rules with the attribute By cc_config I have assumed the whole time that you have your custom file like this: bazel/tools/cpp/unix_cc_toolchain_config.bzl Line 1026 in 8daa514
There you can change the order. For example there you can put user_link_flags_feature ( |
Yeah, I have a file similar to that, but I don't have any section that looks like this one
That is to say, I don't appear to have a feature anywhere that's actually adding the library paths to the linker command line. I assumed that something inside of Bazel was adding them automatically. I'm not sure how they're being added. |
Some features get patched in by Bazel by default when they are not present in the config file. See: bazel/src/main/java/com/google/devtools/build/lib/rules/cpp/CppActionConfigs.java Line 626 in bd4ca0b
So all you have to do is copy over everything you see in CppActionConfigs.java between the ifLinux() brackets into your own config file. Or just copy it from the unix_cc_toolchain_config.bzl. Then once you have the libraries_to_link feature in the config file explicitly, just place it before or after any other feature you want. |
Awesome! Thanks so much for the help. I'll add it in. |
Bazel's change to legacy_whole_archive behavior is not the cause for TF's linking issues with protobuf. Protobuf's implementation and runtime are correctly being linked into TF here: https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/tensorflow/core/platform/default/build_config.bzl#L239 and https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/third_party/protobuf/protobuf.patch#L18, and I've confirmed that protobuf symbols are still present in libtensorflow_framework.so via nm. After examining the linker flags that bazel passes to gcc, https://gist.github.com/bmzhao/f51bbdef50e9db9b24acd5b5acc95080, I discovered that the order of the linker flags was what was causing the undefined reference. See https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking/ and https://stackoverflow.com/a/12272890. Basically linkers discard the objects they've been asked to link if those objects do not export any symbols that the linker currently has kept track as "undefined". To prove this was the issue, I was able to successfully link after moving the linking shared object flag (-l:libtensorflow_framework.so.2) to the bottom of the flag order, and manually invoking g++. This change uses cc_import to to link against a .so in the "deps" of tf_cc_binary, rather than as the "srcs" of tf_cc_binary. This technique was inspired by the comment here: https://github.com/bazelbuild/bazel/blob/387c610d09b99536f7f5b8ecb883d14ee6063fdd/examples/windows/dll/windows_dll_library.bzl#L47-L48 Successfully built on vanilla Ubuntu 18.04 VM: bmzhao@bmzhao-tf-build-failure-reproing:~/tf-fix/tf$ bazel build -c opt --config=cuda --config=v2 --host_force_python=PY3 //tensorflow/tools/pip_package:build_pip_package Target //tensorflow/tools/pip_package:build_pip_package up-to-date: bazel-bin/tensorflow/tools/pip_package/build_pip_package INFO: Elapsed time: 2067.380s, Critical Path: 828.19s INFO: 12942 processes: 51 remote cache hit, 12891 local. INFO: Build completed successfully, 14877 total actions The root cause might instead be bazelbuild/bazel#7687, which is pending further investigation. PiperOrigin-RevId: 281341817 Change-Id: Ia240eb050d9514ed5ac95b7b5fb7e0e98b7d1e83
Bazel's change to legacy_whole_archive behavior is not the cause for TF's linking issues with protobuf. Protobuf's implementation and runtime are correctly being linked into TF here: https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/tensorflow/core/platform/default/build_config.bzl#L239 and https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/third_party/protobuf/protobuf.patch#L18, and I've confirmed that protobuf symbols are still present in libtensorflow_framework.so via nm. After examining the linker flags that bazel passes to gcc, https://gist.github.com/bmzhao/f51bbdef50e9db9b24acd5b5acc95080, I discovered that the order of the linker flags was what was causing the undefined reference. See https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking/ and https://stackoverflow.com/a/12272890. Basically linkers discard the objects they've been asked to link if those objects do not export any symbols that the linker currently has kept track as "undefined". To prove this was the issue, I was able to successfully link after moving the linking shared object flag (-l:libtensorflow_framework.so.2) to the bottom of the flag order, and manually invoking g++. This change uses cc_import to to link against a .so in the "deps" of tf_cc_binary, rather than as the "srcs" of tf_cc_binary. This technique was inspired by the comment here: https://github.com/bazelbuild/bazel/blob/387c610d09b99536f7f5b8ecb883d14ee6063fdd/examples/windows/dll/windows_dll_library.bzl#L47-L48 Successfully built on vanilla Ubuntu 18.04 VM: bmzhao@bmzhao-tf-build-failure-reproing:~/tf-fix/tf$ bazel build -c opt --config=cuda --config=v2 --host_force_python=PY3 //tensorflow/tools/pip_package:build_pip_package Target //tensorflow/tools/pip_package:build_pip_package up-to-date: bazel-bin/tensorflow/tools/pip_package/build_pip_package INFO: Elapsed time: 2067.380s, Critical Path: 828.19s INFO: 12942 processes: 51 remote cache hit, 12891 local. INFO: Build completed successfully, 14877 total actions The root cause might instead be bazelbuild/bazel#7687, which is pending further investigation. PiperOrigin-RevId: 281341817 Change-Id: Ia240eb050d9514ed5ac95b7b5fb7e0e98b7d1e83
Flag: --incompatible_do_not_split_linking_cmdline
Available since: 0.25 (March 2019 release)
Will be flipped in: 0.26 (April 2019 release)
Tracking issue: #7670
This changes how flags are passed to the linker. If you don't have your own custom CROSSTOOL, then you don't need to migrate anything, but you should still test with this flag to see that nothing breaks. If it does, please let us know.
If you have your own custom CROSSTOOL:
If the CROSSTOOL doesn't support linker parameter files, then you don't have to migrate anything. If it supports linker parameter file but you do not specify the feature 'linker_param_file' in your CROSSTOOL, then you don't have to migrate anything.
If you specify the feature 'linker_param_file' and the linker is using gcc-style linker flags, then you should modify the feature so that instead of having a parameter file specified like this:
flag: '-Wl,@%{linker_param_file}'
it is like this:
flag: '@%{linker_param_file}'
In other words, the parameter file should not be passed directly to the linker with
-Wl,
. In addition to this you should specify the feature 'do_not_split_linking_cmdline' in your CROSSTOOL, whether it's enabled or disabled it does not matter, we only check its presence. This will tell Bazel to use the new behavior with this toolchain. After the flip ofincompatible_disable_legacy_cc_provider
this feature will do nothing, it can safely be deleted.This change will put every flag inside the parameter file whereas before only flags starting with
-l
were put there, i.e.: all libraries. This might cause the order of link flags to change with respect to what you originally had in your CROSSTOOL, if it does you will have to re-order the corresponding flags. If you have to make any changes to the CROSSTOOL, make sure that you add the feature 'do_not_split_linking_cmdline' too. Example of change:Old:
New:
The text was updated successfully, but these errors were encountered: