Minor compare exchange optimization #1001
Labels
libcu++
For all items related to libcu++
Comments
I am wondering whether we can reuse the comparison. That said, the code is definitely much worse:

    inline __host__ __device__ int __stronger_order_cuda(int __a, int __b) {
      if (__a > __b) {
        if (__a != __ATOMIC_RELEASE) {
          return __a;
        }
        return __ATOMIC_ACQ_REL - ((__b & 1) == (__b >> 1));
      }
      if (__b != __ATOMIC_RELEASE) {
        return __b;
      }
      return __ATOMIC_ACQ_REL - ((__a & 1) == (__a >> 1));
    }
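One way to sanity-check the variant above is to compare it exhaustively against a plainly written reference on the host. The sketch below is only an illustration: the constant names, both function names, and the reference rule (release combined with consume or acquire yields acq_rel) are assumptions chosen to match what the bit trick evaluates to, and the numbering assumes the GCC/Clang `__ATOMIC_*` values (relaxed = 0 through seq_cst = 5). It is not the library's code.

```cpp
#include <cassert>

// Assumed numbering, mirroring the GCC/Clang __ATOMIC_* macros.
enum : int { RELAXED = 0, CONSUME = 1, ACQUIRE = 2, RELEASE = 3, ACQ_REL = 4, SEQ_CST = 5 };

// Host-only copy of the comparison-reusing variant from the comment
// (renamed; the __host__ __device__ qualifiers are dropped).
inline int stronger_order_reuse(int a, int b) {
    if (a > b) {
        if (a != RELEASE) return a;
        return ACQ_REL - ((b & 1) == (b >> 1));
    }
    if (b != RELEASE) return b;
    return ACQ_REL - ((a & 1) == (a >> 1));
}

// Hypothetical reference written without the bit trick: the larger order
// wins, except that RELEASE paired with CONSUME or ACQUIRE becomes ACQ_REL.
inline int stronger_order_reference(int a, int b) {
    const int hi = a > b ? a : b;
    const int lo = a > b ? b : a;
    if (hi != RELEASE) return hi;
    return (lo == CONSUME || lo == ACQUIRE) ? ACQ_REL : RELEASE;
}

// Counts the ordering pairs on which the two versions disagree.
inline int count_mismatches() {
    int mismatches = 0;
    for (int a = RELAXED; a <= SEQ_CST; ++a) {
        for (int b = RELAXED; b <= SEQ_CST; ++b) {
            if (stronger_order_reuse(a, b) != stronger_order_reference(a, b)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

// Aborts (in builds with assertions enabled) if any pair disagrees.
inline void self_check() {
    assert(count_mismatches() == 0);
}
```

Calling `self_check()` from a small `main` covers all 36 pairs. This only checks that the returned values agree; it says nothing about the generated SASS or performance.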
Minor optimizations you say? Count me in! How about this?
Nothing better than nerdsniping 😹 I am slightly worried that the new object affects stack space, and the constructor of pair is non-trivial. That said, I need to put it into Godbolt.
jrhemstad added the thrust and libcu++ labels and removed the thrust label on Feb 22, 2023
Currently, compare_exchange_strong is using __stronger_order_cuda:

The code above leads to actual memory loads. We can consider the following optimization:

The change leads to about 4% better performance of compare exchange on a mobile 3070 Ti when memory ordering is not known at compile time:
When the memory ordering is known at compile time, there's no difference in generated SASS for both versions. Here's the benchmark: