Cost-based optimizer #1616
Conversation
sql-plugin/src/main/scala/com/nvidia/spark/rapids/CostBasedOptimizer.scala (resolved review comments)
    if (numTransitions > 0) {
      if (plan.canThisBeReplaced) {
        // transition from CPU to GPU
        val transitionCost = numTransitions * costModel.transitionToGpuCost(plan)
I think we need to be clearer on what transitionToGpuCost is actually doing. Right now it is mostly a placeholder, but in the future it could look at the schema to determine the cost of transitioning. We might also want to guess at the size of the data that will transfer. To me it feels like we would want to do something like:
plan.childPlans.filter(_.canThisBeReplaced != plan.canThisBeReplaced).map(costModel.transitionToGpuCost(_)).sum
So that the transition cost is looking at the output of the plan, so it knows what to transition instead of some vague thing about what the input might be. Or not pass in the plan at all if we just want it to be a single value and we can change things when the models get better.
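A minimal sketch of what this suggestion might compute, using stand-in PlanMeta and CostModel types rather than the plugin's real classes (the names and flat costs here are assumptions for illustration only):

```scala
// Stand-in for the plugin's plan metadata: just the placement flag and children.
case class PlanMeta(canThisBeReplaced: Boolean, childPlans: Seq[PlanMeta])

class CostModel {
  // Placeholder cost: a real model might inspect the child's output schema
  // and an estimated row count to price the data transfer.
  def transitionToGpuCost(plan: PlanMeta): Double = 1.0
}

object TransitionCost {
  // Sum a per-child transition cost only for children whose placement
  // differs from the parent, so the cost reflects the actual boundaries.
  def apply(plan: PlanMeta, costModel: CostModel): Double =
    plan.childPlans
      .filter(_.canThisBeReplaced != plan.canThisBeReplaced)
      .map(costModel.transitionToGpuCost)
      .sum
}
```

With a GPU parent that has one CPU child and one GPU child, only the CPU child contributes to the transition cost.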
    if (plan.canThisBeReplaced) {
      plan.willNotWorkOnGpu(reason)
    }
    plan.childPlans.foreach(plan => forceOnCpuRecursively(plan, reason))
We need this to stop before it hits the end, or if the last stage should not be on the GPU the entire query will be on the CPU.
I'm not sure if I have completely addressed your concern here, but I changed it to only replace sections of the plan if the final operator could have been replaced.
I see the limitations of this approach now that I am testing with TPC-DS and I am rewriting this part. The current approach was too naive and we want to selectively force sections of the plan back onto CPU, not everything below a certain point.
Right now we put everything back onto the CPU until we hit something that is already on the CPU. I think that works fine for most of the stupid things we do now (like put something on the GPU just to rearrange columns). But I don't think it will handle cases where something is more expensive on the GPU than the CPU and we may want to exclude just a part of the plan (instead of all of it). But I think we can tackle that problem when we see it in practice.
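The behavior described here (put everything back onto the CPU until hitting something already on the CPU) could be sketched roughly as follows; PlanMeta is again a stand-in type, not the plugin's real class:

```scala
// Stand-in plan node: a mutable placement flag and child plans.
class PlanMeta(var onGpu: Boolean, val childPlans: Seq[PlanMeta]) {
  def willNotWorkOnGpu(reason: String): Unit = { onGpu = false }
}

object ForceOnCpu {
  // Force this node and its GPU descendants back onto the CPU, stopping
  // as soon as we reach a node that is already on the CPU: nodes below
  // that boundary keep their existing placement.
  def forceOnCpuRecursively(plan: PlanMeta, reason: String): Unit = {
    if (plan.onGpu) {
      plan.willNotWorkOnGpu(reason)
      plan.childPlans.foreach(forceOnCpuRecursively(_, reason))
    }
  }
}
```

So in a chain GPU -> CPU -> GPU, forcing from the top moves only the top node to CPU; the already-CPU node stops the recursion and the bottom GPU node is untouched, which matches the limitation noted above about not being able to exclude just part of a GPU section.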
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Force-pushed from 5d64d0d to 3433846
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
…as part of EXPLAIN output Signed-off-by: Andy Grove <andygrove@nvidia.com>
I like the new explain output, but I'm wondering if it could quickly get very verbose on large plans. We may want the ability to show the fact that the CBO removed an operation from the GPU but suppress the gory details that come out later. On a large plan, there could easily be a ton of "after" output by the CBO that will be a chore to line up with the original. Seems like it would be easy to extend the existing
…o GpuOverrides Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Could we rename things then and have
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala (resolved review comments)
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsMeta.scala (resolved review comments)
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala (resolved review comments)
Signed-off-by: Andy Grove <andygrove@nvidia.com>
I think this is a decent starting point worth checkpointing, but it would be good to hear from @revans2 before merging.
I agree that it is time to checkpoint this. I think the next steps really need to be towards getting it to the point that we can turn it on by default so we stop doing stupid things (and document it). After that we can really dig into what it would take to have it make smart choices.
build
build
The tests in this PR depend on
Signed-off-by: Andy Grove <andygrove@nvidia.com>
This PR implements a simple cost-based optimizer that will attempt to avoid transitioning from CPU to GPU when there is no benefit to running on GPU.
There are two rules implemented:
There are two aspects to the work here - a cost model, and a mechanism for applying the cost model. We may want to experiment with different cost models or even consider allowing users to provide their own cost models tuned to their workloads.
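As a rough illustration of that separation between a cost model and the mechanism that applies it, a pluggable model might look like the sketch below. None of these names come from the plugin itself, and the flat costs are placeholders:

```scala
// A cost model users could swap out for one tuned to their workloads.
trait CostModel {
  def operatorCost(name: String): Double
  def transitionToGpuCost: Double
  def transitionToCpuCost: Double
}

// Default model with flat placeholder costs.
object DefaultCostModel extends CostModel {
  def operatorCost(name: String): Double = 1.0
  val transitionToGpuCost: Double = 1.0
  val transitionToCpuCost: Double = 1.0
}

// The mechanism applies whichever model it is constructed with,
// charging for operators plus CPU-to-GPU transitions.
class Optimizer(costModel: CostModel) {
  def planCost(ops: Seq[String], transitions: Int): Double =
    ops.map(costModel.operatorCost).sum +
      transitions * costModel.transitionToGpuCost
}
```

A workload-specific model would only need to implement the trait; the optimizer mechanism stays unchanged.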
Explain output
There is a new indicator char of $ to show which operators/expressions have been forced onto CPU by the cost-based optimizer. This is followed by a summary of the optimizations that were applied.
This is still WIP but getting closer.