[QST] Error building cudf #8617

Closed
eyalhir74 opened this issue Jun 27, 2021 · 13 comments
Labels
CMake CMake build issue question Further information is requested

Comments

@eyalhir74

eyalhir74 commented Jun 27, 2021

I'm running ./build.sh libcudf and get the following error:

`
[ 33%] Building CXX object _deps/arrow-build/src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
/home/eyal/ThirdParties/cudf/cpp/build/_deps/arrow-src/cpp/src/arrow/filesystem/s3fs.cc:38:10: fatal error: aws/core/Aws.h: No such file or directory
   38 | #include <aws/core/Aws.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [_deps/arrow-build/src/arrow/CMakeFiles/arrow_objlib.dir/build.make:1885: _deps/arrow-build/src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:1199: _deps/arrow-build/src/arrow/CMakeFiles/arrow_objlib.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
`

cmake --version
cmake version 3.20.5

The CMake configure step itself seems to complete fine:
`
-- Found OpenSSL Crypto Library: /usr/lib/x86_64-linux-gnu/libcrypto.so
-- Building with OpenSSL (Version: 1.1.1f) support
-- Found hdfs.h at: /home/eyal/ThirdParties/cudf/cpp/build/_deps/arrow-src/cpp/thirdparty/hadoop/include/hdfs.h
-- Found AWS SDK headers:
-- Found AWS SDK libraries:
-- All bundled static libraries:
`

Any idea?

@eyalhir74 eyalhir74 added Needs Triage Need team to review and classify question Further information is requested labels Jun 27, 2021
@shwina
Contributor

shwina commented Jun 28, 2021

Hi, thank you for reporting! Could you try disabling S3 support as described here?
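
For anyone hitting the same `aws/core/Aws.h` error, a quick way to check whether the AWS SDK headers are on the compiler's include path can be sketched as below. This is a generic C++17 probe built on `__has_include`, not part of cudf or Arrow; the macro name `AWS_SDK_HEADERS_VISIBLE` and the function `aws_header_status` are made up for illustration.

```cpp
#include <string>

// Probe the include path at compile time. __has_include is standard in C++17
// and available as an extension in recent GCC and Clang in older modes.
#if defined(__has_include)
#  if __has_include(<aws/core/Aws.h>)
#    define AWS_SDK_HEADERS_VISIBLE 1
#  endif
#endif
#ifndef AWS_SDK_HEADERS_VISIBLE
#  define AWS_SDK_HEADERS_VISIBLE 0
#endif

// Hypothetical helper: returns a one-line status string for the header.
std::string aws_header_status()
{
  return AWS_SDK_HEADERS_VISIBLE ? "aws/core/Aws.h: found"
                                 : "aws/core/Aws.h: not found";
}
```

Printing `aws_header_status()` from a small `main`, compiled with the same compiler and include flags as the failing build, shows whether the header is reachable. If it is not, the Arrow S3 filesystem sources cannot compile and S3 support has to be disabled or the SDK installed.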

@shwina shwina added CMake CMake build issue and removed Needs Triage Need team to review and classify labels Jun 28, 2021
@eyalhir74
Author

Thanks @shwina. Now the compilation process appears to get stuck after a few minutes.
There also seem to be compilation errors:

`home/eyal/ThirdParties/cudf/cpp/src/io/json/reader_impl.cu:499:54: required from here

/usr/include/c++/9/bits/stl_tree.h:2199:8: error: no matching function for call to ‘std::pair<std::_Rb_tree_node_base*, std::_Rb_tree_node_base*>::pair(int, std::_Rb_tree_node_base*&)’
 2199 | return _Res(0, _M_rightmost());
      |        ^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/9/bits/stl_pair.h:434:1: note: candidate: ‘template<class ... _Args1, long unsigned int ..._Indexes1, class ... _Args2, long unsigned int ..._Indexes2> std::pair<_T1, _T2>::pair(std::tuple<_Args1 ...>&, std::tuple<_Args2 ...>&, std::_Index_tuple<_Indexes1 ...>, std::_Index_tuple<_Indexes2 ...>)’
  434 | template<typename... _Args1, std::size_t... _Indexes1,
/usr/include/c++/9/bits/stl_tree.h:2208:8: error: no matching function for call to ‘std::pair<std::_Rb_tree_node_base*, std::_Rb_tree_node_base*>::pair(std::_Rb_tree_node_base*&, std::_Rb_tree_node_base*&)’
 2208 | return _Res(_M_leftmost(), _M_leftmost());
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`

@eyalhir74
Author

eyalhir74 commented Jun 29, 2021

@shwina I'm also getting the following error message when I try to use conda:

conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=21.06 python=3.7 cudatoolkit=11.0
Fetching package metadata ...................

InvalidSpecError: Invalid spec: =1.2.2.5

@beckernick
Member

beckernick commented Jun 29, 2021

We recommend creating a fresh environment, as installing into an existing environment can sometimes cause tricky dependency resolution. https://rapids.ai/start#get-rapids

For the community to best assist you with compiling (if that is your goal), please include all of your environment details as noted in the build guide.

@eyalhir74
Author

Thanks @beckernick
I did try the conda route described in the link you provided, but it also failed.
I'd actually be happy with just libcudf.so and librmm.so (and any other pre-built binaries needed).
I'm trying to use cudf and rmm from C++, and I'd rather not compile everything from scratch if the required binaries exist somewhere (I may have missed them).
I'm using CUDA 11.3.

@benfred
Member

benfred commented Jun 30, 2021

We've also been seeing errors building cudf with CUDA 11.3.

We're building cudf and all of its dependencies from source (not using conda at all, to avoid increased container sizes), using the DLFW containers. This build process worked fine on CUDA 11.2 with the DLFW 21.03 containers, but we're trying to update to the DLFW 21.06 version, which is on CUDA 11.3, and are seeing issues.

There are at least three issues that I've hit so far:

  1. cpp/src/io/json/reader_impl.cu fails to compile on both the 21.06 and 21.08 branches. The errors seem roughly in line with what @eyalhir74 was seeing. I have a build fix for this and will submit a PR shortly.

  2. cpp/src/replace/clamp.cu also fails to compile on 21.06, with:

/workspace/build-env/cpp/src/replace/clamp.cu:113:27: error: ‘__T291’ was not declared in this scope
  113 |   auto offsets_column(std::move(offset_and_char.first));
      |                     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      
/workspace/build-env/cpp/src/replace/clamp.cu:114:25: error: ‘__T292’ was not declared in this scope
  114 |   auto chars_column(std::move(offset_and_char.second));

This seems to work in 21.08, though, and I can make 21.06 compile by changing those two lines to match 21.08:

  auto [offsets_column, chars_column] =
    form_offsets_and_char_column(d_input, null_count, offsets_transformer, stream, mr);

edit: fix in 21.08 is #8525

  3. cpp/src/copying/gather.cu and cpp/src/quantiles/quantiles.cu take hours to compile. I didn't time it, but I'd estimate that gather.cu alone took over 8 hours. quantiles.cu has been building just the sm80 version on my system for the last two hours, and once that finishes it will still have to build the SM70, SM60, etc. versions too.

Note: we built RMM 21.06 with this patch on top first rapidsai/rmm#809.

Edit: filed a PR for the first error here #8635
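
The clamp.cu workaround in item 2 can be sketched outside cudf as follows. This is a minimal illustration, not the cudf code: `column` and `make_offsets_and_chars` are hypothetical stand-ins for a cudf column and `form_offsets_and_char_column`. Both forms are valid C++17 and produce the same result; the difference reported above is only that the first form failed to compile under CUDA 11.3.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cudf column, so the sketch compiles on its own.
using column = std::vector<int>;

// Hypothetical stand-in for form_offsets_and_char_column: returns two owning
// pointers packed in a std::pair.
std::pair<std::unique_ptr<column>, std::unique_ptr<column>> make_offsets_and_chars()
{
  auto offsets = std::make_unique<column>(column{0, 3, 5});
  auto chars   = std::make_unique<column>(column{1, 2, 3, 4, 5});
  return {std::move(offsets), std::move(chars)};
}

// 21.06-style: keep the pair, then move each member into a
// direct-initialized `auto` variable (the form that hit the __T291 error).
std::size_t sizes_via_move()
{
  auto offset_and_char = make_offsets_and_chars();
  auto offsets_column(std::move(offset_and_char.first));
  auto chars_column(std::move(offset_and_char.second));
  return offsets_column->size() + chars_column->size();
}

// 21.08-style: unpack the pair directly with structured bindings.
std::size_t sizes_via_bindings()
{
  auto [offsets_column, chars_column] = make_offsets_and_chars();
  return offsets_column->size() + chars_column->size();
}
```

Both functions return the combined element count of the two columns. Compile with `-std=c++17` or later, since structured bindings require it.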

@eyalhir74
Author

@benfred Thanks for the answer. So what is the suggested way to compile the code for CUDA 11.3 now?
Also, I still don't understand whether there are binaries I can simply download from somewhere, or whether I must build from source.

Any idea why gather.cu takes hours? Is there some sort of multi-template pattern hidden somewhere?

@jrhemstad
Contributor

@robertmaynard tried to build with 11.3 and saw many of the same issues as @benfred described (long compile times on some files). I believe there were some compiler bugs with 11.3 that will be solved with 11.4.

@robertmaynard
Contributor

> @robertmaynard tried to build with 11.3 and saw many of the same issues as @benfred described (long compile times on some files). I believe there were some compiler bugs with 11.3 that will be solved with 11.4.

Correct, the 11.3 regressions above (as outlined by @benfred) have been resolved in 11.4.

@ochan1
Contributor

ochan1 commented Jul 8, 2021

Compiler issue 2 (the `std::move` with temporaries, as referenced by @benfred) isn't fixed in CUDA 11.4, but it works with the latest fixes in the pull request mentioned.

@ochan1
Contributor

ochan1 commented Jul 8, 2021

I think it would be best to cap the supported CUDA version in the READMEs of the stable builds until either this fix is moved to the stable builds or the CUDA Toolkit fixes the `std::move` with temporaries issue (in this case, mention that 11.3 and 11.4, at least, are not supported).

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@robertmaynard
Contributor

Closing this issue as cudf now requires CUDA Toolkit 11.5+

rapids-bot bot pushed a commit that referenced this issue Aug 8, 2022
This PR is a breaking change that disables Arrow S3 support by default. Enabling this feature by default has caused build issues for many downstream consumers, all of whom (to my knowledge) manually disable support for this feature. Most commonly, that build error appears as `fatal error: aws/core/Aws.h: No such file or directory`. In my understanding, several downstream consumers of cudf no longer rely on Arrow S3 support from this library and instead get S3 access via fsspec. I am not aware of any users of libcudf who rely on this being enabled by default (or enabled at all).

See related issues and discussions: #8617, #11333, #8867, #10644 (comment), NVIDIA/spark-rapids#2827. Build errors caused by this default behavior have also been reported internally.

cc: @rjzamora @beckernick @jdye64 @randerzander @robertmaynard @jlowe @quasiben if you have comments following our previous discussion.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #11470