[FEA] Allow } and }} to be transpiled to static strings #9224

Closed
revans2 opened this issue Sep 12, 2023 · 2 comments · Fixed by #9239
Labels
feature request New feature or request

Comments

@revans2
Collaborator

revans2 commented Sep 12, 2023

Is your feature request related to a problem? Please describe.
We recently had a customer who is using } and }} as delimiters to split very long strings (see rapidsai/cudf#14087). Our transpiler did not recognize these patterns as static strings because the characters are not escaped. However, because they have no matching { character, they are treated as if they were escaped when the regular expression is processed. We should handle this case if possible. It may also be worth looking at other special characters that normally need a matching partner, such as ] and ).
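A minimal sketch of the simpler fix, assuming the check runs on the raw Java regex string: treat a pattern as a static string when it contains no regex metacharacters, and deliberately leave ] and } out of the metacharacter set. StaticStringCheck and toStaticString are hypothetical names, not the spark-rapids transpiler API, and escape sequences are conservatively rejected rather than unescaped.

```java
/**
 * Minimal sketch: decide whether a Java regex is really just a static string.
 * StaticStringCheck and toStaticString are hypothetical names, not the
 * spark-rapids transpiler API.
 */
final class StaticStringCheck {
    // Characters that make a pattern non-literal. ']' and '}' are deliberately
    // absent: java.util.regex.Pattern treats them as literals when nothing
    // opened them, and any pattern containing '[' or '{' is rejected anyway.
    // A backslash is rejected too, so escapes are handled conservatively.
    private static final String META = ".$|()[{^?*+\\";

    private StaticStringCheck() {}

    /** Returns the literal text the pattern matches, or null if it needs a regex engine. */
    static String toStaticString(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            if (META.indexOf(pattern.charAt(i)) >= 0) {
                return null; // a real regex construct (or an escape) is present
            }
        }
        return pattern; // e.g. "}}" or "a]b": matched literally by java.util.regex
    }
}
```

With this check, toStaticString("}}") returns "}}", so a split on }} could take the static-string path, while toStaticString("a{2}") returns null and still goes through regex transpilation.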

@revans2 revans2 added the feature request and ? - Needs Triage labels Sep 12, 2023
@mattahrens mattahrens removed the ? - Needs Triage label Sep 12, 2023
@NVnavkumar
Collaborator

It should be mentioned that the related customer issue manifests as a hang, so this is a little more complicated than the description above suggests.

We should certainly update the simple-string transpilation optimization (the path that avoids regex entirely) to handle cases like these. However, dangling ), ] or } characters in otherwise valid regexes may need to be escaped by the transpiler when it detects an unmatched closing character, which will require tracking matching pairs in the transpiler.
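A sketch of the pair tracking this would need, under stated assumptions: the input already compiles with java.util.regex.Pattern, so every unescaped { outside a character class starts a quantifier. Because Pattern rejects an unmatched ) outright, the sketch only escapes ] and }. It ignores edge cases such as \Q...\E quoting or a ] placed first inside a character class, and DanglingCloserEscaper is a hypothetical name.

```java
/**
 * Sketch: escape ']' and '}' that have no matching opener, so the pattern
 * means the same thing to an engine that does not treat them as literals.
 * Hypothetical helper; assumes the input is a valid java.util.regex pattern.
 */
final class DanglingCloserEscaper {
    private DanglingCloserEscaper() {}

    static String escapeDanglingClosers(String pattern) {
        StringBuilder out = new StringBuilder(pattern.length() + 8);
        int classDepth = 0;           // nesting depth of [...] (Java allows nested classes)
        boolean inQuantifier = false; // inside a {n,m} quantifier
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Copy an escape sequence unchanged (e.g. "\\]" or "\\p").
                out.append(c).append(pattern.charAt(++i));
                continue;
            }
            if (c == '[') {
                classDepth++;
            } else if (c == ']') {
                if (classDepth > 0) {
                    classDepth--;
                } else {
                    out.append('\\'); // dangling ']' becomes "\\]"
                }
            } else if (classDepth == 0) {
                // Braces inside a character class are plain literals, so only
                // track them outside classes.
                if (c == '{') {
                    inQuantifier = true;
                } else if (c == '}') {
                    if (inQuantifier) {
                        inQuantifier = false;
                    } else {
                        out.append('\\'); // dangling '}' becomes "\\}"
                    }
                }
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

For example, the pattern a}}b becomes a\}\}b and a{2}} becomes a{2}\}; both still match the same strings under java.util.regex.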

@NVnavkumar
Collaborator

A couple of interesting notes. On the CPU, Spark uses java.util.regex.Pattern to perform regular expression operations, so these cases throw exceptions on the CPU:

  • ( - "Unclosed group"
  • ) - "Unmatched closing ')'"
  • { - "Illegal repetition"
  • [ - "Unclosed character class"

So we mainly need to handle ], ]], } and }}.
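The CPU behavior listed above can be checked with a small probe that compiles each fragment with java.util.regex.Pattern and reports PatternSyntaxException.getDescription() on failure. CpuRegexProbe is a hypothetical name used only for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical probe: which lone special characters does java.util.regex accept? */
final class CpuRegexProbe {
    private CpuRegexProbe() {}

    /** Returns null if the pattern compiles, otherwise the syntax error description. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String[] patterns = {"(", ")", "{", "[", "]", "]]", "}", "}}"};
        for (String p : patterns) {
            String err = compileError(p);
            System.out.println(p + " -> " + (err == null ? "compiles (treated as literal)" : err));
        }
    }
}
```

Running main should report a syntax error for (, ), { and [, while ], ]], } and }} compile and are matched literally, which is why only the latter need special handling.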


3 participants