
mergeSort late batch materialization and free already merged batches eagerly #6931

Merged

Conversation

@abellina (Collaborator) commented Oct 27, 2022:

Signed-off-by: Alessandro Bellina abellina@nvidia.com

Contributes to #6758

This PR changes the mergeSort function to mergeSortAndClose so that the input batches to the merge are closed eagerly. This helps alleviate memory pressure.

In order to quantify the reduced memory pressure, I ran NDSv2 at 3TB in our performance cluster, configured with a max memory pool of 10GB (1/4th the usual) and with a single task allowed on the GPU (again, 1/4th the usual). Running in this mode I found that query67 and query53 were the only queries that spilled. I then isolated the memory usage of the spilling stage and found that the sort was the main culprit, specifically our call to cudf::merge. This cuDF function requires its inputs to be kept alive, and in the worst case it can incur 3x the input's memory. We were calling cudf::merge with several tables at a time (I saw up to 8 tables at once). If instead we call cudf::merge to merge pairs of tables from the plugin, we can eagerly close the inputs.

The above has the added benefit that we only need to materialize two input batches from the spill store on the GPU at a time (see the sketch below).

Overall Query67 went from ~70GiB spilled from GPU memory to ~35GiB in the 1/4th memory setup.
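
In pseudocode the change looks roughly like the following. This is a minimal sketch rather than the actual plugin code; `mergeTwoSortedBatches` is a hypothetical stand-in for the call into cudf::merge on two sorted inputs.

```scala
// Minimal sketch of pairwise merging with eager close (illustrative only).
// SpillableColumnarBatch is the plugin type; mergeTwoSortedBatches is a
// hypothetical helper wrapping cudf::merge for exactly two sorted inputs.
def mergeSortAndClose(
    batches: Seq[SpillableColumnarBatch]): SpillableColumnarBatch = {
  require(batches.nonEmpty, "expected at least one batch")
  batches.reduce { (left, right) =>
    try {
      // only these two inputs need to be materialized on the GPU at once
      mergeTwoSortedBatches(left, right)
    } finally {
      // free already-merged inputs immediately instead of keeping every
      // input alive for one big N-way cudf::merge
      left.close()
      right.close()
    }
  }
}
```

With an N-way cudf::merge all N inputs (plus the worst-case ~3x overhead) stay alive for the whole call; with the pairwise reduction each input is closed as soon as its pair has been merged.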

@abellina (Collaborator Author): build

@abellina (Collaborator Author): build

@abellina (Collaborator Author): build

@abellina (Collaborator Author) commented:

Fixed one leak above. I am seeing a number of other leaks in the tests that don't seem to be related to my change, some involving HostMemoryBuffer. I'll take a look at that soon, but it will likely be a different PR.

@abellina (Collaborator Author): build

jlowe previously approved these changes Oct 28, 2022
@sameerz added the reliability label (Features to improve reliability or bugs that severely impact the reliability of the plugin) on Oct 28, 2022
@abellina (Collaborator Author): build

jlowe previously approved these changes Oct 28, 2022
```
@@ -15,6 +15,8 @@
 */
package com.nvidia.spark.rapids

import java.util
```
Collaborator commented:

I would remove this import. When reading the Scala code below, it is more useful to see explicitly that the method operates on the Java API.

Comment on lines 74 to 75:

```scala
(r: util.AbstractCollection[T])
(block: util.AbstractCollection[T] => V): V = {
```

Collaborator suggested change:

```scala
(r: java.util.AbstractCollection[T])
(block: java.util.AbstractCollection[T] => V): V = {
```

Comment on lines 155 to 156:

```scala
(r: util.AbstractCollection[T])
(block: util.AbstractCollection[T] => V): V = {
```

Collaborator suggested change:

```scala
(r: java.util.AbstractCollection[T])
(block: java.util.AbstractCollection[T] => V): V = {
```

```
@@ -421,29 +422,18 @@ case class GpuOutOfCoreSortIterator(
   while (!pending.isEmpty && sortedSize < targetSize) {
     // Keep going until we have enough data to return
     var bytesLeftToFetch = targetSize
-    val mergedBatch = withResource(ArrayBuffer[SpillableColumnarBatch]()) { pendingSort =>
+    val pendingSort = new util.ArrayDeque[SpillableColumnarBatch]()
```
@abellina (Collaborator Author) replied:
I believe we should stay with the java one. ArrayStack was removed in scala 2.13.x, so I'd rather not introduce code that we'll need to remove if we start to build for scala 2.13.

@gerashegalov (Collaborator) commented Nov 1, 2022:

Not a blocker but we won't have to change it for 2.13. Scalac can cross-compile it to 2.13 without a code change at the expense of a deprecation warning.

```
warning: value ArrayStack in package mutable is deprecated (since 2.13.0): Use Stack instead of ArrayStack; it now uses an array-based implementation
```
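
For readers following along, a minimal standalone comparison of the two containers discussed here (illustrative, not plugin code): both provide the LIFO push/pop that pendingSort needs.

```scala
import java.util.ArrayDeque
import scala.collection.mutable.ArrayStack

val deque = new ArrayDeque[String]()
deque.push("a"); deque.push("b")
assert(deque.pop() == "b") // java.util.ArrayDeque: LIFO via Deque push/pop

val stack = new ArrayStack[String]()
stack.push("a"); stack.push("b")
assert(stack.pop() == "b") // mutable.ArrayStack: same LIFO semantics, but
                           // deprecated (not removed) since Scala 2.13
```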

@abellina (Collaborator Author): build

@abellina (Collaborator Author) commented Nov 1, 2022:

@gerashegalov mind taking another look please?

gerashegalov previously approved these changes Nov 1, 2022

@gerashegalov (Collaborator) left a comment:

LGTM


@abellina (Collaborator Author) commented Nov 2, 2022:

@gerashegalov moved to ArrayStack with 6c143e9

@abellina (Collaborator Author): build

@abellina (Collaborator Author): build

gerashegalov previously approved these changes Nov 2, 2022
```scala
// In the current version of cudf merge does not work for lists and maps.
// This should be fixed by https://github.com/rapidsai/cudf/issues/8050
// Nested types in sort key columns is not supported either.
if (hasNestedInKeyColumns || hasUnsupportedNestedInRideColumns) {
```
Collaborator commented:

This corner case is common enough that I think we should optimize this a bit. A full sort is a lot more expensive than a merge sort, and now we are doing N-1 full sorts whereas before we were only doing 1 full sort. Could we move this up so the code is more like:

```scala
if (spillableBatches.size == 1) {
} else if (hasNestedInKeyColumns || hasUnsupportedNestedInRideColumns) {
  // Unspill all of the input batches
  // concat the input batches
  // close the input batches
  // sort the concated batch
  // close the concated batch
} else {
  ...
}
```
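
For illustration, the suggested structure fleshed out as a hypothetical sketch; `unspill`, `concatOnGpu`, `fullSortBatch`, and `mergePairwiseAndClose` are stand-ins for the plugin's real helpers, not its actual API.

```scala
import scala.collection.mutable

def sortSpillableBatches(
    spillableBatches: mutable.ArrayStack[SpillableColumnarBatch]): SpillableColumnarBatch = {
  if (spillableBatches.size == 1) {
    spillableBatches.pop() // single batch: nothing to merge or sort
  } else if (hasNestedInKeyColumns || hasUnsupportedNestedInRideColumns) {
    // cudf::merge cannot handle these nested types, so do exactly one full
    // sort: unspill all inputs, concatenate them, close the inputs, sort the
    // concatenated batch once, then close the concatenated batch.
    val inputs = spillableBatches.toSeq.map(unspill) // assume unspill takes ownership
    spillableBatches.clear()
    val concated =
      try concatOnGpu(inputs)
      finally inputs.foreach(_.close())
    try fullSortBatch(concated)
    finally concated.close()
  } else {
    mergePairwiseAndClose(spillableBatches) // the eager pairwise merge path
  }
}
```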

@abellina (Collaborator Author) replied:
@revans2 that should be fixed now

@abellina (Collaborator Author): build

@abellina merged commit 6cbbe3f into NVIDIA:branch-22.12 on Nov 2, 2022
@abellina deleted the oom/reduce_memory_usage_in_merge_sort branch on November 2, 2022 21:18
firestarman added a commit that referenced this pull request Aug 31, 2023
…#9102)

This PR adds retry support for more operations in GpuOutOfCoreSortIterator, including computing the split offset and bringing the data back to the GPU to remove the projected columns.

Besides, to keep closing the input batches eagerly in the mergeSortAndClose function (introduced by #6931), instead of retrying the whole mergeSortAndClose call we retry the operations inside it: bringing the data back to the GPU, concatenating tables, sorting the concatenated table, and merging the input tables.

It also covers a small follow-up change in GpuColumnToRowExec for PR #9088.
---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
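
A loose sketch of that retry-inside pattern, under stated assumptions: `withRetry` and `doGpuMerge` are illustrative stand-ins for the plugin's retry helper and merge call, not its exact API.

```scala
// Retry each GPU step in isolation so an OOM re-executes only that step,
// while inputs can still be closed eagerly between steps.
def mergePairWithRetry(
    left: SpillableColumnarBatch,
    right: SpillableColumnarBatch): SpillableColumnarBatch = {
  val merged = withRetry {
    // bring both sides back onto the GPU and merge; on OOM the retry
    // framework can spill other state and run just this block again
    doGpuMerge(left, right)
  }
  // this pair has been merged, so its inputs can be freed right away
  left.close()
  right.close()
  merged
}
```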