[BUG] split function+ repartition result in "ai.rapids.cudf.CudaException: device-side assert triggered" #2048
Comments
Note that this fails slightly differently with 0.5 and cudf 0.19:
It looks like the repartition is partitioning on all columns. The partitioning strategy added late in 0.4 to match Spark CPU hashing requires sorting on the partition keys, which for this use case means sorting on every column. One of those columns is a list of strings, which cudf does not support as a sort-key column. The RAPIDS Accelerator should not allow this repartition to occur, because it requires partitioning on a list-of-strings column and we cannot do that if it also means sorting on that column.
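The check suggested in the comment above can be sketched in plain Python. This is only an illustration of the reasoning: `ColumnType`, `is_sortable`, and `can_repartition_on_gpu` are hypothetical names, not RAPIDS Accelerator, Spark, or cudf APIs, and the rule "no list type may be a sort key" is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ColumnType:
    """Hypothetical type descriptor; not a Spark, RAPIDS, or cudf class."""
    name: str                                # e.g. "int", "string", "list"
    element: Optional["ColumnType"] = None   # element type when name == "list"

    def __str__(self) -> str:
        return f"{self.name}<{self.element}>" if self.element else self.name


def is_sortable(col_type: ColumnType) -> bool:
    # Assumption for this sketch: any list type is rejected as a sort key.
    return col_type.name != "list"


def can_repartition_on_gpu(
    schema: Dict[str, ColumnType],
    keys: Optional[Iterable[str]] = None,
) -> Tuple[bool, str]:
    # With no explicit keys, treat every column as a partition key,
    # mirroring the behaviour described in the comment above.
    partition_keys = list(keys) if keys else list(schema)
    for key in partition_keys:
        if not is_sortable(schema[key]):
            reason = f"column '{key}' has type {schema[key]}, which cannot be a sort key"
            return False, reason
    return True, ""


# A schema shaped like the output of split(): one plain column, one list of strings.
schema = {"id": ColumnType("int"), "parts": ColumnType("list", ColumnType("string"))}
print(can_repartition_on_gpu(schema))          # all columns act as keys
print(can_repartition_on_gpu(schema, ["id"]))  # only the plain column is a key
```

In this sketch a rejected plan would fall back to the CPU instead of reaching the sort and failing with a device-side assert.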
Describe the bug
If we use the split function and then repartition afterwards, it triggers an error:
ai.rapids.cudf.CudaException: device-side assert triggered
Full stacktrace is:
Steps/Code to reproduce bug
Minimal reproduction in PySpark:
Expected behavior
It should work as it does in CPU mode.
Environment details
Rapids for Spark: 0.4.1
Apache Spark 3.1.1
Additional context