Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match GPU overwritten functions with SQL functions from FunctionRegistry #332

Merged
merged 6 commits into from
Jul 9, 2020

Conversation

abellina
Copy link
Collaborator

@abellina abellina commented Jul 8, 2020

This change would add a dynamic table of Spark SQL functions leveraging Spark's FunctionRegistry, and then tying it to the override rules in the plugin.

A different markdown page may be nice for this too, it is roughtly 300 functions. We can pull more info from the registry also, like function signatures, but I am not sure how far we want to go. The main thing is that it would change automatically and if someone wants to find whether a SQL function has potential GPU support, it is a quick lookup.


Update: Instead of a different table, we decided to take the Expressions table we already had and augment it with the sql function from the FunctionRegistry to begin with.

@abellina abellina added the documentation Improvements or additions to documentation label Jul 8, 2020
@revans2
Copy link
Collaborator

revans2 commented Jul 8, 2020

I'm not sure this is going to do what we want. For example null_if, nvl, and nvl2 are marked as not supported, but they are supported, because they are instances of RuntimeReplaceable and are translated into other expressions. the problem with this approach is that RuntimeReplaceable requires you to actually instantiate it to know what it is going to replace it with.

@abellina
Copy link
Collaborator Author

abellina commented Jul 8, 2020

We discussed more on this and will change it where we find the Spark SQL function(s) for expressions that we do support because of what you brought up @revans2. We also discussed trying to annotate our functions, not unlike the registry, to be able to expose things like datatype support. When we have annotations we can then bubble them up to configs.md in a similar place. This seems to me like a different issue though, so will find a good example and file that separately.

@abellina
Copy link
Collaborator Author

abellina commented Jul 9, 2020

@jlowe @revans2 thanks for the suggestions, please take a look at this rendered version: https://github.com/NVIDIA/spark-rapids/blob/230edc57c084ed2f3d0ca9a6c4e470e4a5eac7fc/docs/configs.md, and see if this is more in line to what you were thinking.

@abellina abellina changed the title WIP: Autogenerate a list of SQL functions that are supported in the GPU Match GPU overwritten functions with SQL functions from FunctionRegistry Jul 9, 2020
@abellina
Copy link
Collaborator Author

abellina commented Jul 9, 2020

I see they ended up in the anchor tag too. Let me change that.

@abellina
Copy link
Collaborator Author

abellina commented Jul 9, 2020

Ok, sorry about that, looks better now.

val sqlFunctions = sqlFunctionsByClass.get(expr.getCanonicalName)
if (sqlFunctions.isDefined) {
val nameWithSqlFunctions = s"${rule.confKey}${sqlFunctions.get.mkString(" (", ", ", ")")}"
rule.confHelp(asTable, nameWithSqlFunctions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like the sql functions with the name of the config. It feels like we are mixing concerns here. It should either be a part of the description or it should be its own separate field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think this is interesting enough on its own to be a separate field, SQL Function or something similar.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

kuhushukla
kuhushukla previously approved these changes Jul 9, 2020
kuhushukla
kuhushukla previously approved these changes Jul 9, 2020
@abellina
Copy link
Collaborator Author

abellina commented Jul 9, 2020

@kuhushukla sorry I am done with my changes :)

Here's what it looks like now: https://github.com/NVIDIA/spark-rapids/blob/e424b117c86f9cac238a880618b3979ee3f2ddc1/docs/configs.md

kuhushukla
kuhushukla previously approved these changes Jul 9, 2020
Copy link
Collaborator

@kuhushukla kuhushukla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm!

@revans2
Copy link
Collaborator

revans2 commented Jul 9, 2020

I assume that you didn't do this, but it looks like the tables for a lot of the other operators is messed up. The description is not being printed.

@abellina
Copy link
Collaborator Author

abellina commented Jul 9, 2020

@revans2 good eye, it was probably me. Sorry @kuhushukla

@revans2
Copy link
Collaborator

revans2 commented Jul 9, 2020

build

@abellina
Copy link
Collaborator Author

abellina commented Jul 9, 2020

Ok now only the Expressions table changed. Note, I am a little worried that the rendered jekyll site will truncate it further. We may need to widen the pages there, or remove something from the expressions table (like the spark.rapids prefix). I think we want to widen the page before removing the spark.rapids prefix.

@revans2
Copy link
Collaborator

revans2 commented Jul 9, 2020

Removing the spark.rapids prefix makes the table much less usable. I cannot copy and paste any more.

@abellina
Copy link
Collaborator Author

abellina commented Jul 9, 2020

Yeah we can follow up with jekyll style updates, different pr I'd say. #340

@abellina abellina merged commit bc8c03c into NVIDIA:branch-0.2 Jul 9, 2020
@abellina abellina deleted the config_gen branch July 9, 2020 22:04
@sameerz sameerz added this to the Jul 6 - Jul 17 milestone Jul 17, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…try (NVIDIA#332)

* Autogenerate a list of SQL functions that are supported in the GPU

* Add the sql function seq to the expression config table

* Ensure tags are unchanged, only containing the confKey

* Adding a SQL Functions column to the expressions table

* SQL Function => SQL Function(s)

* Add missing pipe
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…try (NVIDIA#332)

* Autogenerate a list of SQL functions that are supported in the GPU

* Add the sql function seq to the expression config table

* Ensure tags are unchanged, only containing the confKey

* Adding a SQL Functions column to the expressions table

* SQL Function => SQL Function(s)

* Add missing pipe
pxLi pushed a commit to pxLi/spark-rapids that referenced this pull request May 12, 2022
…A#332)

* initial checkin of code with job management with admin commands

* make revisions to address comments in PR

* Update __init__.py

fix ci

* Update file_transfer.py

fix typo

* fix ci
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants