[FEA] Support ScalarSubquery #1600

Closed
sperlingxx opened this issue Jan 27, 2021 · 1 comment · Fixed by #1639
Labels
feature request New feature or request

Comments

@sperlingxx
Collaborator

Is your feature request related to a problem? Please describe.
ScalarSubquery, which turns the output of a (sub)query into a single scalar value, is not yet supported on the GPU. Supporting it is important for better performance.

Describe the solution you'd like
Implement GPU overrides for org.apache.spark.sql.execution.SubqueryExec and org.apache.spark.sql.execution.ScalarSubquery.

@sperlingxx sperlingxx added feature request New feature or request ? - Needs Triage Need team to review and classify labels Jan 27, 2021
@revans2
Collaborator

revans2 commented Jan 27, 2021

I am not sure we need to implement SubqueryExec for this use case. ScalarSubquery runs a query that produces a single result: one row of one column. It runs that subquery and collects the result back to the driver.

https://github.com/apache/spark/blob/5718d64f3104f7a24a9d4b619748bcca03031c48/sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala#L82-L96

But then it keeps the data cached in the expression and just returns that value as if it were a literal.

There is really almost no value at all in trying to make the various SubqueryExecBase implementations run on the GPU, because the subquery returns just a single value. If it were more data, then it might make sense to do something like BroadcastExchangeExec so we can keep a large amount of the data columnar. Here the amount of data is likely small enough that there is no point.
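
To make the behavior described above concrete, here is a minimal, self-contained sketch of the "run once on the driver, cache, then behave like a literal" pattern. It is a simplified model and not Spark's actual code: `SubPlan`, `ScalarSubqueryModel`, and their members are hypothetical names chosen for illustration, and the row/column checks only paraphrase what the linked `subquery.scala` does.

```scala
// Simplified model of a scalar subquery expression. All names are
// hypothetical stand-ins; this does not use or reproduce Spark's classes.
object ScalarSubquerySketch {

  // Stand-in for a physical plan whose rows can be collected on the driver.
  final case class SubPlan(run: () => Seq[Seq[Any]])

  final class ScalarSubqueryModel(plan: SubPlan) {
    @volatile private var result: Any = null
    @volatile private var updated: Boolean = false

    // Execute the subquery once and cache its single value.
    def updateResult(): Unit = {
      val rows = plan.run()
      if (rows.length > 1) {
        throw new IllegalStateException(
          "more than one row returned by a subquery used as an expression")
      }
      result =
        if (rows.length == 1) {
          require(rows.head.length == 1, "expected exactly one column")
          rows.head.head
        } else {
          null // an empty subquery result is treated as null
        }
      updated = true
    }

    // After updateResult(), evaluation just returns the cached value,
    // exactly as a literal would, without re-running the subquery.
    def eval(): Any = {
      require(updated, "subquery result has not been computed yet")
      result
    }
  }

  def main(args: Array[String]): Unit = {
    val subquery = new ScalarSubqueryModel(SubPlan(() => Seq(Seq(42))))
    subquery.updateResult()
    println(subquery.eval())
  }
}
```

Under this model, the only work a GPU override would need to do is treat the cached value as a scalar literal when evaluating the enclosing expression; the subquery plan itself is executed and collected separately.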

@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Feb 2, 2021
@sameerz sameerz added this to the Feb 1 - Feb 12 milestone Feb 2, 2021
@sameerz sameerz linked a pull request Feb 2, 2021 that will close this issue

3 participants