Add sequence support [databricks] #4376

wbo4958 · 2021-12-16T13:04:26Z

This PR closes #3512. It depends on rapidsai/cudf#9839 and #4376.

For now, the PR only supports sequence on IntegerType.

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

revans2

Overall looking really good. I assume all of these issues I am pointing out are just because this is still a work in progress.

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala

wbo4958 · 2021-12-17T00:43:30Z

Overall looking really good. I assume all of these issues I am pointing out are just because this is still a work in progress.

@revans2, thanks for the review. Yes, this PR is still WIP, but it already works for IntegerType. I will refine it and add support for more types. However, I'd rather not add TimeStamp and DateType support in this PR, since the size calculation may be quite different, which would make this PR pretty big.

ttnghia · 2021-12-17T04:31:40Z

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala

+      Seq[ColumnVector] = {
+    withResource(stop.sub(start)) { difference =>
+      withResource(Scalar.fromInt(1)) { scalarOne =>


Should be GpuScalar(1, dataType) or similar, so we can support various types not just integer.
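To illustrate the idea of a type-driven literal, here is a minimal plain-Scala sketch. It does not use the cudf or Spark classes: `IntegralKind` and `numberLiteral` are hypothetical stand-ins invented for this example, and the real helper in the PR may look quite different.

```scala
// Hypothetical stand-ins for Spark's ByteType, ShortType, IntegerType and LongType.
sealed trait IntegralKind
case object ByteKind extends IntegralKind
case object ShortKind extends IntegralKind
case object IntKind extends IntegralKind
case object LongKind extends IntegralKind

// Box `value` at the width the column type requires, failing fast if it does not fit.
def numberLiteral(kind: IntegralKind, value: Long): java.lang.Number = kind match {
  case ByteKind =>
    require(value.isValidByte, s"$value does not fit in a byte")
    java.lang.Byte.valueOf(value.toByte)
  case ShortKind =>
    require(value.isValidShort, s"$value does not fit in a short")
    java.lang.Short.valueOf(value.toShort)
  case IntKind =>
    require(value.isValidInt, s"$value does not fit in an int")
    java.lang.Integer.valueOf(value.toInt)
  case LongKind =>
    java.lang.Long.valueOf(value)
}
```

The point of dispatching on the type is that constants such as 0 and 1 get the same width as the column they are combined with, rather than always being 32-bit integers.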

ttnghia · 2021-12-17T04:33:15Z

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala

+      withResource(difference.floorDiv(step)) { quotient =>
+        withResource(Scalar.fromInt(1)) { scalarOne =>
+          withResource(quotient.add(scalarOne)) { sizeWithNegative =>
+            withResource(Scalar.fromInt(0)) { scalarZero =>
+              withResource(sizeWithNegative.greaterOrEqualTo(scalarZero)) { pred =>
+                withResource(pred.ifElse(sizeWithNegative, scalarZero)) { tmpSize =>
+                  // when start==stop, step==0, size will be 0. but we should change size to 1
+                  withResource(difference.equalTo(scalarZero)) { diffHasZero =>
+                    step match {
+                      case stepScalar: Scalar =>
+                        withResource(ColumnVector.fromScalar(stepScalar, rows)) { stepV =>
+                          withResource(stepV.equalTo(scalarZero)) { stepHasZero =>
+                            withResource(diffHasZero.and(stepHasZero)) { predWithZero =>
+                              predWithZero.ifElse(scalarOne, tmpSize)


Is there any way to get rid of such nested withResource? Otherwise, this looks so cluttered.

Hmm, refined this.
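For readers following the size calculation in the snippet above, the per-row rule can be modeled on the CPU in plain Scala. This is only a sketch of the work-in-progress logic shown in this thread, not the merged GPU implementation. `sequenceSize` is a hypothetical name, and rejecting `step == 0` when `start != stop` is an assumption made here to avoid a division by zero.

```scala
// Plain-Scala model of the per-row size rule from the snippet under review.
def sequenceSize(start: Long, stop: Long, step: Long): Long = {
  val difference = stop - start
  if (step == 0L) {
    // Assumption: a zero step is only accepted when start == stop,
    // in which case the sequence holds the single value `start`.
    require(difference == 0L, "step is 0 but start != stop")
    1L
  } else {
    // floor((stop - start) / step) + 1, clamped to 0 when the step
    // points away from `stop`.
    val sizeWithNegative = Math.floorDiv(difference, step) + 1L
    if (sizeWithNegative >= 0L) sizeWithNegative else 0L
  }
}
```

For example, `sequenceSize(1L, 5L, 2L)` counts the elements 1, 3, 5, while `sequenceSize(5L, 1L, 1L)` is clamped to 0 because the step moves away from `stop`.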

jlowe · 2021-12-17T14:59:36Z

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala

+        withResource(numberScalar(dt, 1)) { one =>
+          withResource(quotient.add(one)) { sizeWithNegative =>


When withResource nests this deeply, that's usually an indication that we're holding onto one or more GPU results longer than necessary, adding undesired and avoidable memory pressure. For example, we compute quotient here and only need it to compute sizeWithNegative, yet we hold onto the GPU memory for the quotient result until after the entire calculation completes. The memory can be freed earlier with something like this:

    withResource(numberScalar(dt, 1)) { one =>
      val sizeWithNegative = withResource(difference.floorDiv(step)) { quotient =>
        quotient.add(one)
      }
      withResource(sizeWithNegative) { sizeWithNegative =>
        ....

Thx @jlowe. Changed this accordingly.
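The memory-pressure point can be demonstrated with a small self-contained sketch. Nothing here is the plugin's code: `Col` is a hypothetical stand-in for a GPU column that counts how many instances are open, `Live` is an invented counter, and `withResource` is a minimal loan-pattern helper assumed to behave like the one used in the PR.

```scala
// Invented counter tracking how many Col instances are currently open.
object Live {
  var current = 0
  var peak = 0
}

// Hypothetical stand-in for a GPU column; opening one bumps the live count.
final class Col(val values: Vector[Long]) extends AutoCloseable {
  Live.current += 1
  Live.peak = math.max(Live.peak, Live.current)
  def floorDiv(n: Long): Col = new Col(values.map(v => Math.floorDiv(v, n)))
  def add(n: Long): Col = new Col(values.map(_ + n))
  def atLeast(n: Long): Col = new Col(values.map(v => math.max(v, n)))
  override def close(): Unit = Live.current -= 1
}

// Loan pattern: run the block, then always close the resource.
def withResource[R <: AutoCloseable, T](r: R)(block: R => T): T =
  try block(r) finally r.close()

// Deeply nested: `quotient` stays open until the innermost block finishes.
def nested(difference: Col, step: Long): Vector[Long] =
  withResource(difference.floorDiv(step)) { quotient =>
    withResource(quotient.add(1L)) { sizeWithNegative =>
      withResource(sizeWithNegative.atLeast(0L)) { size => size.values }
    }
  }

// Early release: `quotient` is closed as soon as `sizeWithNegative` exists.
def flattened(difference: Col, step: Long): Vector[Long] = {
  val sizeWithNegative = withResource(difference.floorDiv(step)) { quotient =>
    quotient.add(1L)
  }
  withResource(sizeWithNegative) { s =>
    withResource(s.atLeast(0L)) { size => size.values }
  }
}
```

Both functions return the same values, but in this model the nested version keeps four columns open at its peak, while the flattened version keeps three, because the intermediate quotient is released before the next result is allocated.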

ttnghia · 2022-01-04T22:10:07Z

FYI: cudf PR has been merged.

wbo4958 · 2022-01-05T00:17:29Z

Thx for the information

wbo4958 · 2022-01-05T05:42:52Z

build

wbo4958 · 2022-01-05T05:44:26Z

build

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala

wbo4958 · 2022-01-06T03:13:50Z

build

wbo4958 · 2022-01-06T03:25:22Z

build

pxLi · 2022-01-06T03:44:45Z

build

Add sequence support

ecf3b9e

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

revans2 reviewed Dec 16, 2021

ttnghia reviewed Dec 17, 2021

Add byte/short/long support and resolve comments

70cd19a

jlowe reviewed Dec 17, 2021

sameerz added the "feature request" label Dec 29, 2021

resolve comments

27c3409

wbo4958 marked this pull request as ready for review January 5, 2022 05:40

wbo4958 changed the title ~~[DRAFT] Add sequence support~~ Add sequence support [databricks] Jan 5, 2022

wbo4958 mentioned this pull request Jan 5, 2022

[FEA] Add TimeStamp/Date type support for sequence #4457

Open

revans2 reviewed Jan 5, 2022

wbo4958 added 2 commits January 6, 2022 11:09

resolve comments

f91c453

update doc

616ce5f

update copyright

15d43e2

wbo4958 requested review from revans2, ttnghia and jlowe January 6, 2022 09:01

revans2 approved these changes Jan 6, 2022

wbo4958 merged commit 9283e84 into NVIDIA:branch-22.02 Jan 6, 2022

wbo4958 deleted the sequence branch February 17, 2022 00:33
