Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests ensuring that cudf's default stream is always used #11875

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Oct 7, 2022

Description

This PR ensures that cudf's default stream is properly passed to all kernel launches so that nothing implicitly runs on the CUDA default stream. It adds a small library that is built during the tests and overloads CUDA functions to throw an exception when usage of the default stream is detected. It also fixes all remaining usage of anything other than cudf's default stream (I fixed most of the issues in previous PRs, but I found a few others when finalizing this one).

Resolves #11929
Resolves #11942

Important notes for reviewers:

  • The changeset is deceptively large. The vast majority of the changes are just a global find-and-replace of cudf::get_default_stream() for cudf::default_stream_value, as well as a few smaller fixes such as missing CUDF_TEST_PROGRAM_MAIN in a couple of tests and usage of rmm::cuda_stream_default. The meaningful changes are:
    • The new default stream getter/setter in default_stream.[hpp|cpp]
    • The addition of cpp/tests/utilities/identify_stream_usage
    • The changes to the base testing fixture in cpp/include/cudf_test/base_fixture.hpp to inject the custom stream.
    • The changes to CI in ci/gpu/build.sh to build and use the new library.
  • This PR is a breaking change because it moves the default stream into the detail namespace. Going forward the default stream may only be accessed using the public accessor cudf::get_default_stream(). I have added a corresponding setter, but it is also in the detail namespace since I do not want to publicly support changing the default stream yet, only for the purpose of testing. Reviewers, please leave comments if you disagree with those choices.
  • I have made getting and setting the default stream thread-safe, but there is still only a single stream. In multi-threaded applications we may want to support a stream per thread so that users could manually achieve PTDS with more fine-tuned control. Is this worthwhile? Even if it is, I'm inclined to wait for a subsequent PR to implement this unless someone feels strongly otherwise.
  • I'm currently only overloading cudaLaunchKernel. I can add overloads for other functions as well, but I didn't want to go through the effort of overloading every possible API. If reviewers have a minimal set that they'd like to see overloaded, let me know. I've included links to all the relevant pages of the CUDA runtime API in the identify_stream_usage.cu file if someone wants to look through them.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added 3 - Ready for Review Ready for review by team tests Unit testing for project libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function breaking Breaking change labels Oct 7, 2022
@vyasr vyasr self-assigned this Oct 7, 2022
@vyasr vyasr requested review from a team as code owners October 7, 2022 18:04
@vyasr vyasr requested a review from bdice October 7, 2022 18:04
Copy link
Contributor

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have significant concerns about adding acquiring a mutex at so many places throughout the code just to enable this test case. Vyas and I will be meeting to discuss alternative solutions, but requesting changes for now to prevent merging as I do not think the pros outweigh the cons in the current solution.

@vyasr
Copy link
Contributor Author

vyasr commented Oct 19, 2022

I have significant concerns about adding acquiring a mutex at so many places throughout the code just to enable this test case. Vyas and I will be meeting to discuss alternative solutions, but requesting changes for now to prevent merging as I do not think the pros outweigh the cons in the current solution.

I've resolved this issue by instead overloading get_default_stream in the identify_stream_usage library with a version that creates a function local static. The resulting stream is initialized after main so that it does not trigger the CUDA UB that I was running into in earlier iterations, and it obviates the need for the set_default_stream function, which has now been removed. Since the default stream is no longer settable, the mutex is no longer necessary. Also, the default stream is once again const.

@vyasr
Copy link
Contributor Author

vyasr commented Oct 21, 2022

I just merged branch-22.12 locally, rebuilt, and reran the tests to make sure that nothing has been merged since the last test run that would use a default stream and break this, so I think this is ready to merge and shouldn't cause any test failures afterwards.

@vyasr vyasr dismissed robertmaynard’s stale review October 21, 2022 14:44

Review has been addressed, and there are enough other approvals now

@vyasr
Copy link
Contributor Author

vyasr commented Oct 21, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit dec8bde into rapidsai:branch-22.12 Oct 21, 2022
@vyasr vyasr deleted the feature/default_stream_usage_identification branch October 21, 2022 14:45
rapids-bot bot pushed a commit that referenced this pull request Oct 27, 2022
This PR moves the `output_builder` and `split_device_span` classes out of `multibyte_split` and adds an iterator for the `split_device_span`, enabling it to be used directly in Thrust algorithms.

I also included a fix from #11875 to make the integration easier once that is merged.

Authors:
  - Tobias Ribizel (https://github.com/upsj)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #11945
rapids-bot bot pushed a commit that referenced this pull request Feb 10, 2023
This PR reenables the preload library introduced for verifying stream usage in libcudf in #11875. This library was disabled during the GitHub Actions migration.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Bradley Dice (https://github.com/bdice)

URL: #12714
rapids-bot bot pushed a commit that referenced this pull request Mar 14, 2023
This PR builds on #11875 and partially addresses #11943. This PR allows us to run all tests on a precise stream (the newly introduced `cudf::test::get_default_stream()`) and then verify that all CUDA APIs end up invoked on that stream. This implements the feature required in #11943, but to apply it universally across libcudf will require the API changes that will expose streams so I plan to make those changes incrementally after this PR is merged.

The preload library is now compiled twice, once to overload `cudf::get_default_stream` and once to overload `cudf::test::get_default_stream`. For now there is still some manual coordination associated with determining which one should be used with a given test, but once #12451 is merged and we start running all tests via ctest instead of direct invocation of the test executables we can start encoding this information in the CMake configuration of the tests by associating the require environment variables directly with the test executable using `set_tests_properties`.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Ray Douglass (https://github.com/raydouglass)
  - Nghia Truong (https://github.com/ttnghia)

URL: #12089
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. tests Unit testing for project
Projects
None yet
9 participants