Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that all CUDA kernels in cudf have hidden visibility. #14726

Merged

Conversation

robertmaynard
Copy link
Contributor

Description

To correct potential issues when using a static cuda runtime, we mark all kernels with internal linkage via the static keyword or hidden visibility.

Note: This doesn't fix dependencies, but focuses just on the CUDA kernels in cudf directly.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@robertmaynard robertmaynard requested review from a team as code owners January 9, 2024 15:31
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Jan 9, 2024
@bdice bdice changed the title Ensuree that all CUDA kernels in cudf have hidden visibility. Ensure that all CUDA kernels in cudf have hidden visibility. Jan 9, 2024
@github-actions github-actions bot removed the CMake CMake build issue label Jan 9, 2024
@robertmaynard robertmaynard added bug Something isn't working non-breaking Non-breaking change labels Jan 9, 2024
cpp/include/cudf/types.hpp Outdated Show resolved Hide resolved
@@ -1024,7 +1024,7 @@ __device__ int parse_gzip_header(uint8_t const* src, size_t src_size)
* @param parse_hdr If nonzero, indicates that the compressed bitstream includes a GZIP header
*/
template <int block_size>
__global__ void __launch_bounds__(block_size)
CUDF_KERNEL void __launch_bounds__(block_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we take this opportunity to normalize the order of launch bounds, CUDF_KERNEL, and the return type across all kernels in libcudf? Some put launch bounds first, others put it last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to make everything consistent as part of the PR, and we can always discuss offline/follow up what style we want. I don't want to hold up the entire PR over a style issue though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can defer on this. It's easy enough to change later. Just wanted to raise that question in case you had a clear preference. I don't know which one I prefer. Maybe CUDF_KERNEL __launch_bounds__(...) void, but that doesn't align with any of the kernels that I saw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could further future proof it (and potentially enable wider compatibility by making a version that takes the bounds as parameters

CUDF_KERNEL_WITH_LAUNCH_BOUNDS(...) void foo(...)

Or even make all kernels use the same macro with varargs.

CUDF_KERNEL(...) void foo(...)

cpp/src/join/mixed_join_kernel.cuh Show resolved Hide resolved
cpp/tests/error/error_handling_test.cu Outdated Show resolved Hide resolved
@robertmaynard robertmaynard force-pushed the bug/mark_kernels_as_static branch 2 times, most recently from 967a9e1 to 3968a89 Compare January 9, 2024 16:15
Comment on lines +24 to +27
/**
* @brief Indicates that the function is a CUDA kernel
*/
#define CUDF_KERNEL __global__ static
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note (non-blocking): I don't think anyone builds libcudf with rdc=true, but if you wanted to be extra pedantic, then CUDF_KERNEL should expand to __attribute__ ((visibility ("hidden"))) when __CUDACC_RDC__ is defined in order to preserve the binary size improvements that come from symbol deduplication within the DLL with rdc=true.

@harrism
Copy link
Member

harrism commented Jan 11, 2024

This looks like a PR that will need to be duplicated across RAPIDS. So I think it should have a rapidsai/build-planning issue with a checklist of per-repo issues.

@robertmaynard
Copy link
Contributor Author

This looks like a PR that will need to be duplicated across RAPIDS. So I think it should have a rapidsai/build-planning issue with a checklist of per-repo issues.

You are correct, we can track the meta issue at: rapidsai/build-planning#12

@robertmaynard robertmaynard added the Spark Functionality that helps Spark RAPIDS label Jan 11, 2024
@robertmaynard
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 6abef4a into rapidsai:branch-24.02 Jan 17, 2024
66 of 67 checks passed
@robertmaynard robertmaynard deleted the bug/mark_kernels_as_static branch January 17, 2024 15:12
PointKernel added a commit to NVIDIA/cuCollections that referenced this pull request Jan 19, 2024
This marks all kernels in CUCO as `static` so that they have internal
linkage and won't conflict when used by multiple DSOs.

I didn't see a single shared/common header in cuco where I could place a
`CUCO_KERNEL` macro so I modified each instance instead.
While `cccl` went with a `__attribute__ ((visibility ("hidden")))`
approach to help reduce RDC size, this approach seemed very invasive for
cuco. This is due to the fact that we would need to pragma push and pop
both gcc warnings and nvcc warnings in each cuco header so that we don't
introduce any warnings. This is needed as the compiler incorrectly state
that the `__attribute__ ((visibility ("hidden")))` has no side-effect.

Context:
rapidsai/cudf#14726
NVIDIA/cccl#166
rapidsai/raft#1722

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants