-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use nosync policy in gather and scatter implementations. #12038
Conversation
c2f5d27
to
c91d0d0
Compare
Codecov ReportBase: 88.09% // Head: 88.12% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-22.12 #12038 +/- ##
================================================
+ Coverage 88.09% 88.12% +0.03%
================================================
Files 133 133
Lines 22003 22003
================================================
+ Hits 19383 19390 +7
+ Misses 2620 2613 -7
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@@ -318,7 +318,7 @@ std::unique_ptr<cudf::column> gather( | |||
|
|||
// check total size is not too large | |||
size_t const total_bytes = thrust::transform_reduce( | |||
rmm::exec_policy(stream), | |||
rmm::exec_policy_nosync(stream), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the result from this returned so this will require a sync inside reduce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thrust handles that for us. If a sync is required for the algorithm's return value (or some other part of its correctness), Thrust is responsible for the sync regardless of the execution policy. nosync
really means "avoid syncing if possible."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference: https://github.com/NVIDIA/thrust/releases/tag/1.16.0
par_nosync
is a hint to the Thrust execution engine that any non-essential internal synchronizations should be skipped and that an explicit synchronization will be performed by the caller before accessing results.
(The return value of a reduction is considered an essential synchronization.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking a comment may help here?
I'm worried for the future generations (including myself) who see this (or forgot this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a couple underlying issues here and a couple of approaches to address them. First, par_nosync
is pretty new, so not all Thrust developers have begun to use it and know its conventions. I think that once developers know the conventions and recognize that nosync
is a safe choice in many cases in libcudf (but not all cases!), this will not be a point of confusion. I plan to make nosync
changes across the entire libcudf codebase over time, so I am unsure if a code comment in every location is appropriate. There are around 60 instances of thrust::reduce
alone -- and quite a few other algorithms fall into the same boat of mandating a final synchronization for host value return. Instead, I would propose expanding our developer docs on stream synchronization to explain when nosync
is (or is not) appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I suppose looking at this code it certainly it appears that an internal synch must be occurring otherwise the reduce would not return the correct result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, I would propose expanding our developer docs on stream synchronization to explain when nosync is (or is not) appropriate.
Maybe a better code talk as well? :)
The explanation in the release notes is not very detailed. Does this mean that it will synchronize only when returning results on the host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! That's a great idea. I have seen a better explanation than in those release notes someplace (perhaps the PR where nosync
was introduced?) but I didn't find it last time I looked. I'll sign up for a future Better Code talk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vuule November 30. Mark your calendar. 😉
BenchmarksComparing this PR (a14ae56) to branch-22.12 (2a58ff6). Broadly, both gather and scatter show significant performance improvements, on the order of 40-50% faster for 1024 rows, 5-10% faster for 1M rows, and no change for very large data sizes (the sync penalty is much smaller relative to the kernel runtime).
|
Is there an issue for this? |
I am preparing #12086 with more details and context (still writing/editing heavily, not in its final form yet). This will be one of several PRs that fall under that issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great 👍
@gpucibot merge |
Description
This PR uses
rmm::exec_policy_nosync
in libcudf's gather and scatter functions. These changes are motivated by performance improvements seen previously in #11577.Checklist