Fixed a few issues with out of core sort #2209

revans2 · 2021-04-20T22:17:48Z

This fixes an off-by-one error when an entire batch is already sorted. It fixes an issue when inserting batches into the pending queue, and it fixes an issue when sorting only rows, no columns. Any of which could lead to data corruption.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 · 2021-04-20T22:17:55Z

build

integration_tests/src/main/python/sort_test.py

revans2 · 2021-04-20T22:45:37Z

build

abellina

small comments

integration_tests/src/main/python/sort_test.py

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala

revans2 · 2021-04-20T23:39:12Z

build

gerashegalov · 2021-04-20T23:41:36Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala

@@ -284,7 +283,7 @@ case class GpuOutOfCoreSortIterator(
    // Protect ourselves from large rows when there are small targetSizes
    val targetRowCount = Math.max((targetBatchSize/averageRowSize).toInt, 1024)

-    if (sortedOffset == rows - 1) {
+    if (sortedOffset == rows) {


I think the code would read easier if we renamed sortedOffset to sortedRows or numSortedRows

I personally think of it in terms of offsets instead of number of rows. The fact that they end up being equal is just because the sorted values are at the first part of the batch.

Not a blocking issue.

Just to explain my thinking: I think of an offset as the start position of an array, or of a range the way it's used on L300, [sortedOffset, rows). Since it describes the unsorted range, if we wanted to use the term offset, I'd call it unsortedOffset. On the other hand, we can view it as the definition of the sorted area [0, sortedOffset), in which case sortedRows works better for me.
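For readers following the naming discussion, here is a minimal sketch of the two half-open ranges being described and of why the comparison changed from `rows - 1` to `rows`. It is not the code in GpuSortExec.scala: the object `SortedOffsetSketch`, the `Split` case class, and all three helper functions are hypothetical names introduced only for illustration, and it assumes `sortedOffset` is the index of the first unsorted row.

```scala
// Hypothetical illustration only; none of these names exist in GpuSortExec.scala.
object SortedOffsetSketch {
  /** The sorted prefix [0, sortedOffset) and the unsorted suffix [sortedOffset, rows). */
  final case class Split(sorted: Range, unsorted: Range)

  def split(sortedOffset: Int, rows: Int): Split = {
    require(sortedOffset >= 0 && sortedOffset <= rows, "offset must lie in [0, rows]")
    Split(0 until sortedOffset, sortedOffset until rows)
  }

  /** Post-fix comparison: true only when the unsorted range is empty. */
  def fullySorted(sortedOffset: Int, rows: Int): Boolean =
    sortedOffset == rows

  /** Pre-fix comparison, kept here only to show the off-by-one. */
  def fullySortedOld(sortedOffset: Int, rows: Int): Boolean =
    sortedOffset == rows - 1
}
```

Under that assumption, with `rows = 4` the old comparison fires at `sortedOffset = 3`, when row 3 is still in the unsorted range, and does not fire at `sortedOffset = 4`, when the whole batch is sorted. Read either as unsortedOffset or as sortedRows, the value is the same number; only the name changes.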

pxLi · 2021-04-21T03:19:51Z

build

revans2 · 2021-04-21T15:02:03Z

@gerashegalov is it OK if I merge this, or do you want me to make changes because the memory model is not OK?

gerashegalov

It's more of my mental model. No problem either way.

gerashegalov · 2021-04-21T17:14:08Z

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

Fixed a few issues with out of core sort

80eb5e6

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 added the bug label Apr 20, 2021

revans2 added this to the Apr 12 - Apr 23 milestone Apr 20, 2021

revans2 self-assigned this Apr 20, 2021

jlowe reviewed Apr 20, 2021

integration_tests/src/main/python/sort_test.py (outdated, resolved)

Addressed review comments

ed441e3

abellina requested changes Apr 20, 2021

integration_tests/src/main/python/sort_test.py (outdated, resolved)

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala (outdated, resolved)

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuSortExec.scala (resolved)

Addressed review comments

1a008f9

gerashegalov reviewed Apr 20, 2021

abellina approved these changes Apr 21, 2021

jlowe approved these changes Apr 21, 2021

gerashegalov approved these changes Apr 21, 2021

revans2 merged commit f60d11d into NVIDIA:branch-0.5 Apr 21, 2021

revans2 deleted the out_of_core_sort_fix branch April 21, 2021 18:31

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Fixed a few issues with out of core sort (NVIDIA#2209)

cefd8c7

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Fixed a few issues with out of core sort (NVIDIA#2209)

a2cd161

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
