
Support TakeOrderedAndProject #1579

Merged: 3 commits, Jan 27, 2021
Conversation

revans2 (Collaborator) commented Jan 25, 2021

This fixes #1575 and fixes #103.

I did not test topN with nested types going along for the ride (present in the table being sorted, but not sorted on), so I did not enable them. They should work, but it was not a requirement.

My initial performance tests showed a small gain over not supporting take ordered and project, but the amount of data was small, so it is hard to tell.
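Spark plans an ORDER BY with a LIMIT as a single TakeOrderedAndProject node: keep only the top N rows by the sort key, then apply the projection to just those rows. A minimal Python sketch of those semantics (the function and its parameters are hypothetical illustrations, not the plugin's implementation):

```python
import heapq

def take_ordered_and_project(rows, key, n, project):
    """Sketch of TakeOrderedAndProject semantics: keep the n smallest
    rows by `key`, then project only those rows."""
    # heapq.nsmallest streams the input and keeps at most n rows in
    # memory, which is why this beats a full sort followed by a limit.
    top = heapq.nsmallest(n, rows, key=key)
    return [project(r) for r in top]

# Example: "SELECT b FROM t ORDER BY a LIMIT 2" over (a, b) tuples.
rows = [(3, "c"), (1, "a"), (2, "b"), (5, "e")]
result = take_ordered_and_project(rows, key=lambda r: r[0], n=2,
                                  project=lambda r: r[1])
```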

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 revans2 added the "feature request" (New feature or request) label Jan 25, 2021
@revans2 revans2 added this to the Jan 18 - Jan 29 milestone Jan 25, 2021
@revans2 revans2 self-assigned this Jan 25, 2021
revans2 (Collaborator, Author) commented Jan 25, 2021

build

Review threads on docs/FAQ.md and integration_tests/src/main/python/data_gen.py (resolved)
docs/FAQ.md (outdated), comment on lines 43 to 45:

    plan often has more metrics than the CPU versions do, and when we tried to combine all of these
    operations into a single stage the metrics where confusing to understand what was happening. Instead
    we split the single stage up into multiple smaller parts so the metrics are clearer.

Suggested change:

    plan often has more metrics than the CPU versions do, and when we tried to combine all of these
    operations into a single stage the metrics were confusing to understand. Instead we split the single
    stage into multiple smaller parts so the metrics are clearer.

docs/FAQ.md Outdated
entire stage in the plan. Code generation is typically used to reduce the cost of processing data
one row at a time. The GPU plan processes the data in a columnar format, so the costs are different.

* ColumnarToRow and RowToColumnar Transitions - The CPU version of Spark plans typically process data in a row based format. The main exception to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ColumnarToRow and RowToColumnar be backquoted, like, for example, WholeStageCodeGen is in the previous section?

Suggested change
* ColumnarToRow and RowToColumnar Transitions - The CPU version of Spark plans typically process data in a row based format. The main exception to
* `ColumnarToRow` and `RowToColumnar` transitions - The CPU version of Spark plans typically process data in a row based format. The main exception to

revans2 (Collaborator, Author) commented Jan 26, 2021

I addressed the review comments and updated the metrics a bit. I was seeing much longer times than I expected for GpuTopN total time, and I found that I had iter.next() included in the total time, which in some cases also counted the time it took to build the upstream batch.
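The timing fix described above can be sketched in a few lines (names and structure are hypothetical, not the plugin's Scala code): start the operator's clock only after the upstream iterator has handed over a batch, so the metric measures this operator's own work rather than the upstream batch build.

```python
import time

def timed_top_n(batches, process, metrics):
    """Sketch of the metric fix: time our own work, not iter.next().

    Pulling the next batch from upstream may itself be expensive, so
    that wait must stay out of this operator's total-time metric.
    """
    it = iter(batches)
    while True:
        try:
            batch = next(it)          # upstream time: NOT counted
        except StopIteration:
            return
        start = time.monotonic()      # clock starts after next() returns
        result = process(batch)
        metrics["totalTime"] += time.monotonic() - start
        yield result
```

With the original placement (timer started before next()), "totalTime" would silently absorb however long the upstream operator took to produce each batch.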

revans2 (Collaborator, Author) commented Jan 26, 2021

build

jlowe (Member) commented Jan 27, 2021

build

revans2 (Collaborator, Author) commented Jan 27, 2021

@sameerz please take another look

@revans2 revans2 merged commit 5e9ca10 into NVIDIA:branch-0.4 Jan 27, 2021
@revans2 revans2 deleted the take_ordered branch January 27, 2021 21:27
gerashegalov pushed a commit to gerashegalov/spark-rapids that referenced this pull request Jan 29, 2021
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Labels: feature request (New feature or request)
Projects: None yet

Successfully merging this pull request may close these issues:

[DOC] Document differences between CPU and GPU plans
[FEA] GPU version of TakeOrderedAndProject

3 participants