From 40bdd8ae4d89d2ea1f466c579d56f2c9ca1b014d Mon Sep 17 00:00:00 2001
From: Peter Andreas Entschev
Date: Fri, 22 Sep 2023 19:20:18 +0200
Subject: [PATCH] Pin to `aws-sdk-cpp<1.11` (#14173)

Pin conda packages to `aws-sdk-cpp<1.11`. The recent upgrade to version `1.11.*` has caused several cleanup issues (the changes are described in [the upstream README](https://github.com/aws/aws-sdk-cpp#version-111-is-now-available)), causing Distributed and Dask-CUDA processes to segfault. The stack for one of those crashes looks like the following:

```
(gdb) bt
#0  0x00007f5125359a0c in Aws::Utils::Logging::s_aws_logger_redirect_get_log_level(aws_logger*, unsigned int) () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so
#1  0x00007f5124968f83 in aws_event_loop_thread () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-io.so.1.0.0
#2  0x00007f5124ad9359 in thread_fn () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1
#3  0x00007f519958f6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f5198b1361f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```

Such segfaults now occur frequently in CI, and in some cases are reproducible with a hit rate of ~30%. Given the approaching release, the safest option is probably to pin to an older version of the package until we pinpoint the exact cause of the issue and a patched build is released upstream.

`aws-sdk-cpp` is statically linked in the `pyarrow` pip package, which prevents us from using the same pinning technique there. cuDF is currently pinned to `pyarrow=12.0.1`, which seems to be built against `aws-sdk-cpp=1.10.*`, as per [recent build logs](https://github.com/apache/arrow/actions/runs/6276453828/job/17046177335?pr=37792#step:6:1372).

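As a quick sanity check of what the `<1.11` pin admits, the following minimal sketch tests whether a version string falls below that bound. It is an illustration only: `satisfies_pin` is a hypothetical helper, the version strings in the usage note are examples, and the comparison looks only at the numeric major and minor components rather than reproducing conda's full version-matching rules.

```python
def satisfies_pin(version: str, upper: tuple = (1, 11)) -> bool:
    """Return True if `version` is strictly below the (major, minor) bound.

    Hypothetical helper for illustration; this is NOT conda's matching logic.
    Expects at least "major.minor" with purely numeric components.
    """
    # Only the first two components matter for a "<1.11" style bound.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < upper
```

For example, `satisfies_pin("1.10.57")` is accepted by the pin, while `satisfies_pin("1.11.0")` is rejected.
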
Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ray Douglass (https://github.com/raydouglass)

URL: https://github.com/rapidsai/cudf/pull/14173
---
 conda/environments/all_cuda-118_arch-x86_64.yaml | 1 +
 conda/environments/all_cuda-120_arch-x86_64.yaml | 1 +
 conda/recipes/libcudf/conda_build_config.yaml    | 3 +++
 conda/recipes/libcudf/meta.yaml                  | 2 ++
 dependencies.yaml                                | 1 +
 5 files changed, 8 insertions(+)

diff --git a/conda/environments/all_cuda-118_arch-x86_64.yaml b/conda/environments/all_cuda-118_arch-x86_64.yaml
index d4abc28cf13..9fb991f9075 100644
--- a/conda/environments/all_cuda-118_arch-x86_64.yaml
+++ b/conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -9,6 +9,7 @@ channels:
 - nvidia
 dependencies:
 - aiobotocore>=2.2.0
+- aws-sdk-cpp<1.11
 - benchmark==1.8.0
 - boto3>=1.21.21
 - botocore>=1.24.21
diff --git a/conda/environments/all_cuda-120_arch-x86_64.yaml b/conda/environments/all_cuda-120_arch-x86_64.yaml
index 9a98e400e6d..9ba0dd8dc38 100644
--- a/conda/environments/all_cuda-120_arch-x86_64.yaml
+++ b/conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -9,6 +9,7 @@ channels:
 - nvidia
 dependencies:
 - aiobotocore>=2.2.0
+- aws-sdk-cpp<1.11
 - benchmark==1.8.0
 - boto3>=1.21.21
 - botocore>=1.24.21
diff --git a/conda/recipes/libcudf/conda_build_config.yaml b/conda/recipes/libcudf/conda_build_config.yaml
index 25b3f19de77..b1f5b083e06 100644
--- a/conda/recipes/libcudf/conda_build_config.yaml
+++ b/conda/recipes/libcudf/conda_build_config.yaml
@@ -22,6 +22,9 @@ gbench_version:
 gtest_version:
   - ">=1.13.0"
 
+aws_sdk_cpp_version:
+  - "<1.11"
+
 libarrow_version:
   - "=12"
 
diff --git a/conda/recipes/libcudf/meta.yaml b/conda/recipes/libcudf/meta.yaml
index 627065817ba..28357f0d96d 100644
--- a/conda/recipes/libcudf/meta.yaml
+++ b/conda/recipes/libcudf/meta.yaml
@@ -74,6 +74,7 @@ requirements:
     - gtest {{ gtest_version }}
     - gmock {{ gtest_version }}
     - zlib {{ zlib_version }}
+    - aws-sdk-cpp {{ aws_sdk_cpp_version }}
 
 outputs:
   - name: libcudf
@@ -107,6 +108,7 @@ outputs:
         - dlpack {{ dlpack_version }}
         - gtest {{ gtest_version }}
         - gmock {{ gtest_version }}
+        - aws-sdk-cpp {{ aws_sdk_cpp_version }}
     test:
       commands:
         - test -f $PREFIX/lib/libcudf.so
diff --git a/dependencies.yaml b/dependencies.yaml
index 376e43094a7..5586f54348c 100644
--- a/dependencies.yaml
+++ b/dependencies.yaml
@@ -218,6 +218,7 @@ dependencies:
           - libkvikio==23.10.*
       - output_types: conda
         packages:
+          - aws-sdk-cpp<1.11
           - fmt>=9.1.0,<10
           - &gbench benchmark==1.8.0
           - &gtest gtest>=1.13.0