-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Support for Lazy Quantifier [databricks] #10208
Added Support for Lazy Quantifier [databricks] #10208
Conversation
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
I will be adding more fuzzer tests |
build |
Databricks cluster failed to start. [2024-01-17T21:16:34.414Z] subprocess.CalledProcessError: Command 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p 2200 -i **** ubuntu@4.155.42.241 'SPARKSRCTGZ=/home/ubuntu/spark-rapids-ci.tgz BASE_SPARK_VERSION=3.2.1 BASE_SPARK_VERSION_TO_INSTALL_DATABRICKS_JARS=3.2.1 MVN_OPT= EXTRA_ENVS= bash /home/ubuntu/build.sh 2>&1 | tee buildout; if [ |
build |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RegexParser.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I ran this locally and manually verified that the results from the integration test looked correct. I also tried some different datagen patterns to see if I could break the test but did not find anything.
…scala Co-authored-by: Andy Grove <andygrove73@gmail.com>
build |
If Databricks keeps failing I will just run the test manually and skip Databricks from the premerge |
build |
2 similar comments
build |
build |
Looking at recent blossom failure:
It looks like cuDF doesn't support the |
Thanks for looking into this. I am struggling to see why the pattern is being generated |
It's because it's actually a valid pattern in Java (and Python fwiw). CUDF is the only thing throwing an invalid pattern here. Since it's valid in Java, and we have not detected to fallback to CPU, that's why you are getting the test error here. |
I was able to get past the failing test with this diff. Note that this is only checking the right side of the choice and we would need to look at the left as well.
|
build |
Thanks @andygrove |
There are more test failures, I am looking into them |
// rr = lazyQuantifier inside a choice | ||
(_, RegexSequence(ListBuffer(RegexRepetition( | ||
RegexRepetition(_, SimpleQuantifier('?')), SimpleQuantifier('?'))))) => | ||
throw new RegexUnsupportedException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are creating exceptions, we need to document these in https://github.com/NVIDIA/spark-rapids/blob/branch-24.02/docs/compatibility.md#regular-expressions
build |
(ll, rr) match { | ||
// ll = lazyQuantifier inside a choice | ||
case (RegexSequence(ListBuffer(RegexRepetition( | ||
RegexRepetition(_, SimpleQuantifier('?')), SimpleQuantifier('?')))), _) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent seems off here? I would expect this second line to be indented more that the case
in the prev line. Same comment for other uses of this pattern in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Lett a comment about formatting.
This PR adds the lazy quantifier to the RegexParser
Changes Made:
Tests:
Integration tests were run locally
fixes #8806