[BUG] very large shuffles can fail #45

Closed · revans2 opened this issue May 29, 2020 · 0 comments · Fixed by #8935
Assignees
Labels
bug: Something isn't working
P1: Nice to have for release
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
SQL: part of the SQL/Dataframe plugin

Comments

revans2 (Collaborator) commented May 29, 2020

Describe the bug
Spark has a 2 GB (2^31 byte) limit on the size of a single shuffle element. In some cases we can exceed that limit, so we need to make sure that the largest batch we serialize during a shuffle is under 2 GB. We cannot do this in the serializer, because it is too late at that point; we need to do it in the shuffle executor.
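A minimal sketch of the idea (assumed names, not the plugin's actual code): before handing a batch to the shuffle, estimate its serialized size and, if it would exceed the 2 GB element limit, split it into row ranges small enough to serialize safely. The object name, the `planSplits` helper, and the assumption that serialized size grows roughly in proportion to row count are all illustrative.

```scala
// Illustrative sketch only: splitting a batch into row ranges so each piece
// serializes below Spark's 2 GB (2^31 - 1 byte) single-element limit.
// Names and the proportional-size assumption are hypothetical, not plugin APIs.
object ShuffleBatchSplitter {
  // Stay comfortably below Int.MaxValue to leave headroom for serializer overhead.
  private val TargetBytes: Long = Int.MaxValue.toLong - (64L * 1024 * 1024)

  /** Plan (startRow, rowCount) slices that should each serialize under TargetBytes. */
  def planSplits(numRows: Int, estimatedSerializedBytes: Long): Seq[(Int, Int)] = {
    if (estimatedSerializedBytes <= TargetBytes || numRows <= 1) {
      Seq((0, numRows))
    } else {
      // Assume serialized size is roughly proportional to row count.
      val pieces = math.min(
        numRows.toLong,
        (estimatedSerializedBytes + TargetBytes - 1) / TargetBytes).toInt
      val rowsPerPiece = (numRows + pieces - 1) / pieces
      (0 until numRows by rowsPerPiece).map { start =>
        (start, math.min(rowsPerPiece, numRows - start))
      }
    }
  }
}

// Example: a batch estimated at 5 GiB with 1,000,000 rows is planned into 3 slices.
// ShuffleBatchSplitter.planSplits(1000000, 5L * 1024 * 1024 * 1024)
```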

@revans2 revans2 added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and SQL (part of the SQL/Dataframe plugin) labels May 29, 2020
@sameerz sameerz added the P1 (Nice to have for release) label and removed the ? - Needs Triage (Need team to review and classify) label Aug 18, 2020
wjxiz1992 pushed a commit to wjxiz1992/spark-rapids that referenced this issue Oct 29, 2020
* add initialization notebooks for databricks examples

* Remove spark.stop() from example notebook
@revans2 revans2 mentioned this issue Mar 4, 2021
@revans2 revans2 added the reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) label Apr 12, 2022
@revans2 revans2 self-assigned this Aug 2, 2023
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
liurenjie1024 pushed a commit to liurenjie1024/spark-rapids that referenced this issue Jul 8, 2024
* workable version without tests
* doc
* fix scala 2.13
* fix compile
* fix it
* enable it
* metric name
* minor
* change seed
* fix comments
* minor

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Co-authored-by: Hongbin Ma (Mahone) <mahongbin@apache.org>