Commit

Add documentation for null character in \p{Cntrl}
Signed-off-by: Anthony Chang <antchang@nvidia.com>
anthony-chang committed Jun 7, 2022
1 parent 3fdf8fc commit fd8b186
Showing 2 changed files with 3 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/compatibility.md
@@ -586,7 +586,8 @@ These are the known edge cases where running on the GPU will produce different r
next to a newline or a repetition that produces zero or more results
([#5610](https://github.com/NVIDIA/spark-rapids/pull/5610))`
- The character class `\p{ASCII}` matches only `[\x01-\x7F]` as opposed to Java's definition which matches `[\x00-\x7F]`,
- since null characters are not currently supported
+ since null characters are not currently supported. Similarly, `\p{Cntrl}` matches only `[\x01-\x1F\x7F]` as
+ opposed to Java's `[\x00-\x1F\x7F]`

The following regular expression patterns are not yet supported on the GPU and will fall back to the CPU.

@@ -450,6 +450,7 @@ class RegexParser(pattern: String) {
case "Blank" =>
ListBuffer(RegexChar(' '), RegexEscaped('t'))
case "Cntrl" =>
+ // should be \u0000-\u001f but we do not support the null terminator \u0000
ListBuffer(RegexCharacterRange(RegexChar('\u0001'), RegexChar('\u001f')),
RegexChar('\u007f'))
case "XDigit" =>
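To make the documented difference concrete, here is a minimal CPU-only sketch that compares Java's `\p{Cntrl}` with the narrower range described in the docs change. It does not run anything on the GPU: the narrower range is emulated with an ordinary Java character class, and the object and method names (`CntrlNullCheck`, `matchesJava`, `matchesGpuRange`, `differences`) are hypothetical and not part of spark-rapids.

```scala
import java.util.regex.Pattern

// Hypothetical helper, for illustration only; not part of spark-rapids.
object CntrlNullCheck {
  // Java's POSIX class \p{Cntrl}, documented as [\x00-\x1F\x7F]
  private val javaCntrl = Pattern.compile("\\p{Cntrl}")

  // Assumption: the GPU behaviour is emulated here by the range quoted in
  // the docs, [\x01-\x1F\x7F], evaluated by the ordinary Java regex engine.
  private val gpuCntrl = Pattern.compile("[\\x01-\\x1F\\x7F]")

  def matchesJava(c: Char): Boolean =
    javaCntrl.matcher(c.toString).matches()

  def matchesGpuRange(c: Char): Boolean =
    gpuCntrl.matcher(c.toString).matches()

  // ASCII code points on which the two definitions disagree
  def differences: Seq[Int] =
    (0 to 0x7F).filter(i => matchesJava(i.toChar) != matchesGpuRange(i.toChar))

  def main(args: Array[String]): Unit = {
    // Print each differing code point in \xNN form
    println(differences.map(i => "\\x%02X".format(i)).mkString(", "))
  }
}
```

Under these assumptions the only code point on which the two definitions disagree is the null character, which is the edge case the documentation change describes.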
