Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for POSIX predefined character classes #5692

Merged

Conversation

anthony-chang
Copy link
Contributor

Closes #4413

This PR adds support for \p{...} for these character classes defined here under "POSIX character classes (US-ASCII only)"

\p{Lower} | A lower-case alphabetic character: [a-z]
\p{Upper} | An upper-case alphabetic character:[A-Z]
\p{ASCII} | All ASCII:[\x00-\x7F]
\p{Alpha} | An alphabetic character:[\p{Lower}\p{Upper}]
\p{Digit} | A decimal digit: [0-9]
\p{Alnum} | An alphanumeric character:[\p{Alpha}\p{Digit}]
\p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{\|}~
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}]
\p{Print} | A printable character: [\p{Graph}\x20]
\p{Blank} | A space or a tab: [ \t]
\p{Cntrl} | A control character: [\x00-\x1F\x7F]
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F]
\p{Space} | A whitespace character: [ \t\n\x0B\f\r]

Signed-off-by: Anthony Chang antchang@nvidia.com

Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

build

@anthony-chang anthony-chang self-assigned this May 31, 2022
@sameerz
Copy link
Collaborator

sameerz commented May 31, 2022

@anthony-chang @andygrove We are in burndown for 22.06. Is this meant to target 22.06, or should it be retargeted towards 22.08?

@sameerz sameerz added the feature request New feature or request label May 31, 2022
@sameerz sameerz added this to the May 23 - Jun 3 milestone May 31, 2022
@andygrove
Copy link
Contributor

@anthony-chang @andygrove We are in burndown for 22.06. Is this meant to target 22.06, or should it be retargeted towards 22.08?

Good point. @anthony-chang can you retarget for 22.08

Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please retarget for 22.08

@anthony-chang anthony-chang changed the base branch from branch-22.06 to branch-22.08 May 31, 2022 17:59
Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

This is blocked until the fix in #5610 is merged

@andygrove
Copy link
Contributor

This is blocked until the fix in #5610 is merged

@anthony-chang this has now been merged

@andygrove andygrove dismissed their stale review June 3, 2022 16:17

PR has been retargeted to 22.08 now

…six-character-classes

Signed-off-by: Anthony Chang <antchang@nvidia.com>
…six-character-classes

Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

build

…six-character-classes

Signed-off-by: Anthony Chang <antchang@nvidia.com>
doTranspileTest("\\p{Alnum}", "[a-zA-Z0-9]")
doTranspileTest("\\p{Punct}", "[!\"#$%&'()*+,\\-./:;<=>?@\\^_`{|}~\\[\\]]")
doTranspileTest("\\p{Print}", "[a-zA-Z0-9!\"#$%&'()*+,\\-./:;<=>?@\\^_`{|}~\\[\\]\u0020]")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that only a subset are tested here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this one is mostly a sanity check since the \p{Print} transpilation recursively uses the definition of the 4 \p{} classes above it.
I tested the full set in the integration tests.

Signed-off-by: Anthony Chang <antchang@nvidia.com>
andygrove
andygrove previously approved these changes Jun 6, 2022
@anthony-chang
Copy link
Contributor Author

build

Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

build

@anthony-chang
Copy link
Contributor Author

I've removed _ from the Scala fuzz tests until a fix for rapidsai/cudf#11062 is made.

NVnavkumar
NVnavkumar previously approved these changes Jun 7, 2022
Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Copy link
Contributor Author

build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add support for POSIX characters in regular expressions
4 participants