Support TakeOrderedAndProject #1579
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
build
sql-plugin/src/main/scala/com/nvidia/spark/rapids/SpillableColumnarBatch.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/SpillableColumnarBatch.scala
docs/FAQ.md
Outdated
plan often has more metrics than the CPU versions do, and when we tried to combine all of these
operations into a single stage the metrics where confusing to understand what was happening. Instead
we split the single stage up into multiple smaller parts so the metrics are clearer.
plan often has more metrics than the CPU versions do, and when we tried to combine all of these
operations into a single stage the metrics were confusing to understand. Instead we split the single
stage into multiple smaller parts so the metrics are clearer.
docs/FAQ.md
Outdated
entire stage in the plan. Code generation is typically used to reduce the cost of processing data
one row at a time. The GPU plan processes the data in a columnar format, so the costs are different.

* ColumnarToRow and RowToColumnar Transitions - The CPU version of Spark plans typically process data in a row based format. The main exception to
Should `ColumnarToRow` and `RowToColumnar` be backquoted, as `WholeStageCodeGen` is in the previous section?
* `ColumnarToRow` and `RowToColumnar` transitions - The CPU version of Spark plans typically process data in a row based format. The main exception to
I addressed the review comments and updated the metrics a bit. I was seeing much longer times than I expected for
build
@sameerz please take another look
This fixes #1575 and fixes #103
I did not test topN with nested types that are only carried along (present in the table being sorted, but not used as sort keys), so I left them disabled. They should work, but supporting them was not a requirement.
My initial performance tests showed a small gain over not supporting `TakeOrderedAndProject`, but the data set was not large, so it is hard to draw a conclusion.
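
For readers unfamiliar with the operator, here is a minimal conceptual sketch of what a topN (`TakeOrderedAndProject`) does: keep the best `n` rows of each partition, merge those candidates, take the best `n` again, then project. It is not the code in this PR. Plain Scala collections stand in for Spark partitions, and `TopNSketch`, `Row`, `localTopN` and `takeOrderedAndProject` are hypothetical names used only for illustration.

```scala
// Conceptual sketch only: plain Scala, no Spark or GPU code involved.
// All names here are hypothetical and do not come from the plugin.
object TopNSketch {
  final case class Row(id: Int, score: Int, label: String)

  // Keep only the first n rows of a single partition under the given ordering.
  def localTopN(partition: Seq[Row], n: Int)(implicit ord: Ordering[Row]): Seq[Row] =
    partition.sorted.take(n)

  // Merge the per-partition candidates, take the global first n, then project.
  def takeOrderedAndProject[A](partitions: Seq[Seq[Row]], n: Int, project: Row => A)(
      implicit ord: Ordering[Row]): Seq[A] = {
    val candidates = partitions.flatMap(p => localTopN(p, n))
    candidates.sorted.take(n).map(project)
  }

  def main(args: Array[String]): Unit = {
    // Sort by score, highest first.
    implicit val byScoreDesc: Ordering[Row] = Ordering.by[Row, Int](_.score).reverse

    val partitions = Seq(
      Seq(Row(1, 10, "a"), Row(2, 70, "b"), Row(3, 40, "c")),
      Seq(Row(4, 90, "d"), Row(5, 20, "e")),
      Seq(Row(6, 60, "f"))
    )

    // Top 3 rows by score, projected down to the label column.
    println(takeOrderedAndProject(partitions, 3, (r: Row) => r.label))
  }
}
```

With the sample data above, the three highest scores are 90, 70 and 60, so the labels selected are `d`, `b` and `f`, in that order. The `label` column is never compared; it simply rides along with the sorted rows, which is the situation described above for nested types.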