Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test updates of CCCL (thrust, cub, libcudacxx) to 2.1.0. #5388

Closed
wants to merge 3 commits into from

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Apr 26, 2023

This PR tests a rapids-cmake branch with CCCL (thrust, cub, libcudacxx) updated to 2.1.0. Do not merge this PR. The changes will be merged upstream in rapids-cmake after all libraries pass CI.

@bdice
Copy link
Contributor Author

bdice commented Apr 27, 2023

Currently wheel builds are failing because of changes in libcudacxx 2.1.0, but conda C++ builds should be failing too. This is because libcuml doesn't express that it has a direct dependency on CCCL, and instead inherits that dependency from the raft target (this inheritance isn't possible in wheel builds, which is why they fail as expected). I am going to add an explicit CMake dependency on Thrust (which properly brings in all of CCCL in version 2.1.0), then rmm, then raft, so that customizations that happen at the rapids-cmake level (like my test branch for CCCL 2.1.0) can correctly be applied to cuML.

The correct way is shown here in raft for thrust/rmm. We need to do this same thing for thrust/rmm/raft in order: https://github.com/rapidsai/raft/blob/082be6ecd4437d180bf34d5ba5d691a27b21141f/cpp/CMakeLists.txt#L153-L155

Comment on lines +224 to +225
include(cmake/thirdparty/get_thrust.cmake)
include(cmake/thirdparty/get_rmm.cmake)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a flag or something to enable this conditionally and then be able to do these test PRs without adding the get_dependency.cmake files every time>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed with @robertmaynard about this, since I was missing some historical context here. It sounds like this was tried before in #4643 which was closed without much discussion. I think I would argue that cuML should explicitly depend on thrust and rmm. As I understand it, the only reason not to do so (and to instead get those dependencies via raft) was because of patches that raft needed on thrust/rmm. However, it doesn't look like raft has patches for thrust/rmm at present (see raft's patches directory).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, the historical reasons for the current state (no explicit dependencies) may be outdated and we may want to add explicit dependencies on thrust/rmm so that rapids-cmake is able to properly control these versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other context from @robertmaynard:
The original PR that added thrust and rmm to cuml: #3844
PR that yanked thrust/rmm/etc in 22.02: #4256

@dantegd dantegd added the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 22, 2023
@bdice bdice changed the base branch from branch-23.06 to branch-23.10 July 27, 2023 15:41
@bdice bdice force-pushed the cccl-update-2.1.0 branch 2 times, most recently from e06f987 to cd79dff Compare September 13, 2023 18:46
@bdice
Copy link
Contributor Author

bdice commented Sep 14, 2023

This PR is ready to pass off to a cuML C++ dev for completion. We are planning to ship CCCL 2.1.0 support (possibly CCCL 2.2.0) in RAPIDS 23.12. The corresponding rapids-cmake changes will be merged early in the development cycle for 23.12. We are aiming for the PRs to every RAPIDS library to be ready to merge changes needed for CCCL 2 support when 23.10 burndown begins and the 23.12 branch is created. The general readiness of RAPIDS libraries for this CCCL major version bump is being tracked here: rapidsai/rapids-cmake#399 (comment)

The most common changes that are needed are to wrap device lambdas where the return type is needed on the host with cuda::proclaim_return_type<ReturnType>([...] __device__ (...){ ... });. This is most often needed for thrust::transform calls, or when constructing transform iterators, though other Thrust/CUB algorithms also have this requirement. Algorithms like thrust::for_each, however, do not require the return type to be known by the host -- so don't assume that every __device__ lambda needs to proclaim a return type. (Most of the time, I believe the difference has to do with whether the working memory or shared memory sizes of that algorithm depend on the return type.) Please refer to rapidsai/cudf#13222 for more examples.

Be aware that some build errors might be due to other API changes in CCCL 2.1.0, so there may be additional changes necessary besides device lambda return types. For instance, the error below might be due to a CUB API change, but I wasn't immediately able to identify what was wrong:

$SRC_DIR/cpp/src/hdbscan/detail/stabilities.cuh(103): error: no instance of function template "ML::HDBSCAN::detail::Utils::cub_segmented_reduce" matches the argument list
            argument types are: (float *, float *, int, int *, rmm::cuda_stream_view, <unknown-type>)

Consult the changelogs below for more information:

@bdice
Copy link
Contributor Author

bdice commented Dec 8, 2023

Closed in favor of #5623.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - DO NOT MERGE Hold off on merging; see PR for details CMake CUDA/C++
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants