[FEA] Replace cuco::static_multimap by cuco::static_map in semi-anti-join #11313

Closed
ttnghia opened this issue Jul 20, 2022 · 1 comment
Labels: 0 - Backlog, feature request, libcudf, non-breaking, Performance, Spark

ttnghia commented Jul 20, 2022

The semi-anti-join implementation was refactored in #11100. One of the changes was a switch to cuco::static_multimap, which was later found to have a performance issue when the input tables contain many duplicate rows (#11299).

We should use cuco::static_map instead to avoid the performance issue. However, this is not a simple implementation change: it requires a new cuco feature that adds pair_contains to static_map (NVIDIA/cuCollections#191).
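
To illustrate why a container that stores each distinct key once is enough here, below is a minimal host-side sketch. It is only an analogy using std::unordered_set, not the libcudf or cuco code, and the helper name semi_anti_join is hypothetical. Semi and anti joins only ask whether a left row has any match on the right, so duplicate right rows add nothing to the answer.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not a libcudf or cuco API): returns the indices of
// `left` rows whose key is present in `right` (semi join, anti == false)
// or absent from `right` (anti join, anti == true).
std::vector<std::size_t> semi_anti_join(std::vector<int> const& left,
                                        std::vector<int> const& right,
                                        bool anti)
{
  // A set keeps one entry per distinct key, so heavy duplication in
  // `right` does not grow the table.
  std::unordered_set<int> const build(right.begin(), right.end());

  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < left.size(); ++i) {
    // Only existence matters; the number of matches is irrelevant.
    bool const found = build.count(left[i]) > 0;
    if (found != anti) { result.push_back(i); }
  }
  return result;
}
```

With left = {1, 2, 3, 2} and right = {2, 2, 2, 2, 5}, the semi join yields left indices 1 and 3 and the anti join yields 0 and 2, and the build side holds only two entries despite the repeated 2s.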

ttnghia added the feature request, 0 - Backlog, libcudf, Performance, Spark, and non-breaking labels on Jul 20, 2022
ttnghia self-assigned this on Jul 20, 2022
ttnghia changed the title from "[FEA] Replace cuco::static_multimap by cuco::static_map" to "[FEA] Replace cuco::static_multimap by cuco::static_map in semi-anti-join" on Jul 20, 2022

ttnghia commented Jul 26, 2022

This has been addressed in #11330. Closing for now.

ttnghia closed this as completed on Jul 26, 2022