Add tests and docs for regexp() and regexp_like()
Signed-off-by: Andy Grove <andygrove@nvidia.com>
andygrove committed Nov 12, 2021
1 parent 7d3629f commit 42ff718
Showing 2 changed files with 30 additions and 3 deletions.
11 changes: 8 additions & 3 deletions docs/compatibility.md
@@ -271,10 +271,15 @@ The RAPIDS Accelerator for Apache Spark currently supports string literal matche
matches for the `regexp_replace` function and will fall back to CPU if a regular expression pattern
is provided.

-### RLike
+### RLike, regexp(), and regexp_like()

-The GPU implementation of `RLike` has the following known issues where behavior is not consistent with Apache Spark and
-this expression is disabled by default. It can be enabled setting `spark.rapids.sql.expression.RLike=true`.
+The GPU implementation of `RLike` is disabled by default. It can be enabled by setting
+`spark.rapids.sql.expression.RLike=true`.
+
+Apache Spark 3.2.0 introduces the `regexp` and `regexp_like` functions which are equivalent to the `RLike`
+expression and are also supported on GPU when `RLike` is enabled on the GPU.
+
+`RLike` is disabled by default due to the following known issues where behavior is not consistent with Apache Spark.

- `$` does not match the end of string if the string ends with a line-terminator
([cuDF issue #9620](https://github.com/rapidsai/cudf/issues/9620))
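
The `$` item above concerns strings that end with a line terminator. As a rough illustration of the CPU-side expectation, the sketch below uses Python's `re` module, which is only a stand-in here and is not the engine used by Spark or cuDF. Like `java.util.regex` in its default mode, Python's `re` lets `$` match just before a trailing newline, although Python only treats `\n` this way while Java recognizes more line terminators. The sample strings are assumptions chosen for illustration.

```python
import re

# Stand-in for the CPU regex behavior; NOT the Spark or cuDF engine.
pattern = re.compile(r"abc$")

# Plain end of string: `$` matches.
ends_cleanly = pattern.search("abc") is not None

# Trailing newline: `$` still matches just before the final "\n".
ends_with_newline = pattern.search("abc\n") is not None

print(ends_cleanly, ends_with_newline)
```

According to the linked cuDF issue, the GPU implementation does not match in the second case, which is one of the reasons `RLike` stays disabled by default.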
22 changes: 22 additions & 0 deletions integration_tests/src/main/python/string_test.py
@@ -469,7 +469,29 @@ def test_like_complex_escape():
                'a like "\\%SystemDrive\\%\\\\\\\\Users%"',
                'a like "_oo"'),
            conf={'spark.sql.parser.escapedStringLiterals': 'true'})

+@pytest.mark.skipif(not is_before_spark_320() or is_databricks91_or_later(), reason='regexp is synonym for RLike starting in Spark 3.2.0')
+def test_regexp():
+    gen = mk_str_gen('[abcd]{1,3}')
+    assert_gpu_and_cpu_are_equal_collect(
+            lambda spark: unary_op_df(spark, gen).selectExpr(
+                'regexp(a, "a{2}")',
+                'regexp(a, "a{1,3}")',
+                'regexp(a, "a{1,}")',
+                'regexp(a, "a[bc]d")'),
+            conf={'spark.rapids.sql.expression.RLike': 'true'})
+
+@pytest.mark.skipif(not is_before_spark_320() or is_databricks91_or_later(), reason='regexp_like is synonym for RLike starting in Spark 3.2.0')
+def test_regexp_like():
+    gen = mk_str_gen('[abcd]{1,3}')
+    assert_gpu_and_cpu_are_equal_collect(
+            lambda spark: unary_op_df(spark, gen).selectExpr(
+                'regexp_like(a, "a{2}")',
+                'regexp_like(a, "a{1,3}")',
+                'regexp_like(a, "a{1,}")',
+                'regexp_like(a, "a[bc]d")'),
+            conf={'spark.rapids.sql.expression.RLike': 'true'})
+
def test_rlike():
    gen = mk_str_gen('[abcd]{1,3}')
    assert_gpu_and_cpu_are_equal_collect(
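
To make the four patterns in the new tests easier to read, here is a minimal sketch of what they match. It uses Python's `re.search` as an approximation of the unanchored "find" semantics of `RLike`; it does not run Spark or the RAPIDS plugin. The `SAMPLES` list and the `matches` helper are hypothetical, chosen to resemble values that the `'[abcd]{1,3}'` generator could produce.

```python
import re

# Patterns copied from test_regexp / test_regexp_like.
PATTERNS = ["a{2}", "a{1,3}", "a{1,}", "a[bc]d"]

# Hypothetical inputs of the kind '[abcd]{1,3}' could generate.
SAMPLES = ["a", "aa", "abd", "acd", "bcd", "dab"]


def matches(pattern, value):
    # re.search succeeds if the pattern occurs anywhere in the string,
    # which mirrors an unanchored regex match.
    return re.search(pattern, value) is not None


# Map each pattern to the samples it matches.
table = {p: [s for s in SAMPLES if matches(p, s)] for p in PATTERNS}

for pattern, hits in table.items():
    print(pattern, hits)
```

With these samples, `a{2}` needs two consecutive `a` characters, while `a{1,3}` and `a{1,}` match any string containing at least one `a`, so the latter two are indistinguishable under unanchored matching.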
