Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp_replace with back-references should fall back to CPU #4556

Merged

Conversation

andygrove
Copy link
Contributor

@andygrove andygrove commented Jan 18, 2022

Closes #4503 and #4559

Follow-on issue to support back-references on the GPU: #4557

@andygrove andygrove added the bug Something isn't working label Jan 18, 2022
@andygrove andygrove added this to the Jan 10 - Jan 28 milestone Jan 18, 2022
@andygrove andygrove self-assigned this Jan 18, 2022
@andygrove andygrove changed the title Regexp replace backref fallback regexp_replace with back-references should fall back to CPU Jan 18, 2022
revans2
revans2 previously approved these changes Jan 18, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not thrilled with using a regular expression to find the back references, but I could not figure out a better way to do it either.


expr.rep match {
case Literal(s: UTF8String, DataTypes.StringType) if s != null =>
val backrefPattern = Pattern.compile("\\$[0-9]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this can create false-positives. A Spark regex replace string of "\\$1" isn't asking for a group but this pattern matcher will think it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was working on a fix for this but that exposed another issue that I am going to have to fix first, so I will mark this PR as a draft for now.

@andygrove andygrove marked this pull request as draft January 18, 2022 23:36
@andygrove andygrove changed the title regexp_replace with back-references should fall back to CPU WIP: regexp_replace with back-references should fall back to CPU Jan 18, 2022
@andygrove andygrove changed the title WIP: regexp_replace with back-references should fall back to CPU regexp_replace with back-references should fall back to CPU Jan 19, 2022
@andygrove andygrove marked this pull request as ready for review January 19, 2022 21:55
@andygrove
Copy link
Contributor Author

@revans2 @jlowe This is ready for another review when you have time

@jlowe
Copy link
Member

jlowe commented Jan 19, 2022

build

@andygrove andygrove merged commit f89b19e into NVIDIA:branch-22.02 Jan 20, 2022
@andygrove andygrove deleted the regexp-replace-backref-fallback branch January 20, 2022 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants