Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add limited support for captured vars and athrow #5487

Merged
merged 4 commits into from
May 17, 2022

Conversation

seanprime7
Copy link
Contributor

Only the primitive type variables captured from a method are supported.
athrow is supported only with SparkException and is converted to raise_error.

Following additional changes have also been made:

  • A check to reject lambdas with void return type has been added.
  • An operand stack bug fix for consturctor calls has been added.

This commit also simplifies the code that handles method calls.

Signed-off-by: Sean Lee selee@nvidia.com

Only the primitive type variables captured from a method are supported.
athrow is supported only with SparkException and is converted to raise_error.

Following additional changes have also been made:
* A check to reject lambdas with void return type has been added.
* An operand stack bug fix for consturctor calls has been added.

This commit also simplifies the code that handles method calls.

Signed-off-by: Sean Lee <selee@nvidia.com>
@seanprime7 seanprime7 requested a review from abellina May 13, 2022 18:02
@seanprime7 seanprime7 marked this pull request as draft May 13, 2022 18:03
@seanprime7
Copy link
Contributor Author

I have created it as a draft PR because we need to decide what to do about RaiseError.
The changes in the PR maps SparkException to RaiseError, which throws RuntimeException instead of SparkException
This affects users who expect SparkException to be thrown instead of RuntimeException

Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass thinking about the type of catalyst exception we should convert the exception to.

// Empty the stack and convert the internal representation of
// org.apache.spark.SparkException object to RaiseError, then push it to the
// stack.
State(locals, List(RaiseError(top.asInstanceOf[Repr.SparkExcept].message)), cond, expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe one approach here would be to code up a new catalyst node RaiseErrorV2 or RaiseSpecificError or something like this, where it has the CPU eval and doGenCode, as it would be very much like RaiseError, but it has an extra parameter which is the specific type of exception to throw. It would then instantiate this exception and throw the specific type.

For the gpu, we could replace this new node with the GPU specific type, and for the CPU it would work the way the caller intended. @jlowe @revans2 for some 👀

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @revans2 we were thinking it's OK if the exceptions are not the same for now. Could we make a note of this in: https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/docs/additional-functionality/udf-to-catalyst-expressions.md?

We may need to revisit this if someone needs SparkException thrown in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated udf-to-catalyst-expressions.md

Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@abellina
Copy link
Collaborator

@seanprime7 ok to take this PR off draft mode?

@abellina
Copy link
Collaborator

build

@seanprime7 seanprime7 marked this pull request as ready for review May 17, 2022 18:08
@seanprime7
Copy link
Contributor Author

took this PR off draft mode.

@abellina
Copy link
Collaborator

Thanks @seanprime7

@abellina abellina merged commit 5f449cd into branch-22.06 May 17, 2022
@abellina abellina deleted the seanprime7/udf-compiler-athrow branch May 17, 2022 19:40
anthony-chang pushed a commit to anthony-chang/spark-rapids that referenced this pull request May 17, 2022
* Add limited support for captured vars and athrow

Only the primitive type variables captured from a method are supported.
athrow is supported only with SparkException and is converted to raise_error.

Following additional changes have also been made:
* A check to reject lambdas with void return type has been added.
* An operand stack bug fix for consturctor calls has been added.

This commit also simplifies the code that handles method calls.

Signed-off-by: Sean Lee <selee@nvidia.com>

* Update copyright years

* Catch RuntimeException instead of Throwable

* Update udf-to-catalyst doc
@sameerz sameerz added the feature request New feature or request label May 19, 2022
@sameerz sameerz added this to the May 2 - May 20 milestone May 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants