Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GpuSequence refactor[databricks] #4520

Merged
merged 4 commits into from
Jan 20, 2022

Conversation

firestarman
Copy link
Collaborator

@firestarman firestarman commented Jan 13, 2022

This refactor mainly covers:

  • Merge the GpuSequence and GpuSequenceWithStep into one GpuSequence to avoid some duplicate code. And get ready for the shim layers that will come with time/date types support.
  • Let GpuSequence extends from GpuExpression instead of the original binary or ternary expression to handle the children evaluation directly.
  • Follow the Spark algorithm to compute the sequence lengths, supporting the same illegal inputs check.
  • Update the tests, and add the tests for illegal cases.

Fixes #4506
Fixes #4499

Signed-off-by: Firestarman firestarmanllc@gmail.com

And update the tests

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman requested review from wbo4958 and sperlingxx and removed request for wbo4958 January 13, 2022 08:46
@firestarman
Copy link
Collaborator Author

build

@sameerz sameerz added the bug Something isn't working label Jan 13, 2022
@sameerz sameerz added this to the Jan 10 - Jan 28 milestone Jan 13, 2022
withResource(stops) { _ =>
closeOnExcept(starts) { _ =>
closeOnExcept(nonNullOption) { _ =>
closeOnExcept(stepsOption.getOrElse(defaultSteps(starts, stops))) { steps =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to keep the defaultSteps in the memory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a must, I think.
If there is an improvement, I want to handle it in the following PR.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

build

@github-actions
Copy link

👎 Promotion blocked, new vulnerability found

Vulnerability report

Component Vulnerability Description Severity
Protocol Buffer Java API CVE-2021-22569 An issue in protobuf-java allowed the interleaving of com.google.protobuf.UnknownFieldSet fields in such a way that would be processed out of order. A small malicious payload can occupy the parser for several minutes by creating large numbers of short-lived objects that cause frequent, repeated pauses. We recommend upgrading libraries beyond the vulnerable versions. MEDIUM

@firestarman
Copy link
Collaborator Author

build

@firestarman firestarman changed the title GpuSequence refactor[databricks] GpuSequence refactor Jan 18, 2022
@firestarman
Copy link
Collaborator Author

build

@firestarman firestarman changed the title GpuSequence refactor GpuSequence refactor[databricks] Jan 18, 2022
@firestarman
Copy link
Collaborator Author

build

@firestarman
Copy link
Collaborator Author

Add [databricks] back which was removed before due to #4548

(IntegerGen(min_val=-10, max_val=20, special_cases=[]),
IntegerGen(min_val=20, max_val=50, special_cases=[]),
IntegerGen(min_val=1, max_val=5, special_cases=[])),
(LongGen(min_val=-10, max_val=20, special_cases=[None]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like your test case can't cover below scenario?

start, stop, step
2, 10, 1
10, 1, -1
2, 2, 0

Copy link
Collaborator Author

@firestarman firestarman Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does.
You can refer to the gens in sequence_normal_integral_gens. There are comments to describe each case indivdually.

Copy link
Collaborator Author

@firestarman firestarman Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you mean the 3 cases mixed in a single dataset. Added test_sequence_with_step_mixed_cases.

@pxLi
Copy link
Collaborator

pxLi commented Jan 18, 2022

failed blossom PVC lock. let me retrigger

@pxLi
Copy link
Collaborator

pxLi commented Jan 18, 2022

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default steps handling via a lazy val of a function is a bit odd, but I don't think it's a blocker for this to be merged.

@firestarman
Copy link
Collaborator Author

Default steps handling via a lazy val of a function is a bit odd, but I don't think it's a blocker for this to be merged.

The first thought of this is to avoid running the case match for each batch. Different element types may have different default steps. I will update this accordingly in the future when supporting date/timestamp.

@firestarman firestarman merged commit 3c59706 into NVIDIA:branch-22.02 Jan 20, 2022
@firestarman firestarman deleted the sequence-dt branch January 20, 2022 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA]Improve GpuSequence [BUG]GpuSequence blows up when nulls exist in any of the inputs (start, stop, step)
5 participants