[BUG] regexp_extract doesn't work correctly with concat #5088

sperlingxx · 2022-03-30T08:38:18Z

Describe the bug
regexp_extract(concat(...), ...) produces incorrect results on the GPU when at least one child of concat is a column vector.

Steps/Code to reproduce bug

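// Run in spark-shell, where spark.implicits._ is already in scope for toDF.
import org.apache.spark.sql.functions.{col, concat, lit, regexp_extract}

val df = (1 to 10).toDF("a")
spark.conf.set("spark.rapids.sql.regexp.enabled", "true")
df.coalesce(1).select(regexp_extract(concat(col("a"), lit("a")), "(a)", 1)).collect()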

GPU results: Array([], [], [], [], [], [], [], [], [], [])
CPU results: Array([a], [a], [a], [a], [a], [a], [a], [a], [a], [a])

For the above query, the GPU produces the correct result only when column a yields an empty string.


revans2 · 2022-03-30T15:14:33Z

This has nothing to do with concat. It appears that regexp_extract on the CPU is doing a find, whereas on the GPU it is doing a full match.

val df = Seq("1a", "2a", "3a", "4a", "5a", "6a", "7a", "8a", "9a", "10a").toDF("c")
df.coalesce(1).select(regexp_extract(col("c"), "(a)", 1)).collect()

shows the same discrepancy, but changing the regular expression to ".*(a).*" produces the same result on both the CPU and the GPU.
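The difference between a find and a full match can be illustrated on the JVM alone with java.util.regex. This is only a sketch of the two semantics, not the plugin's or Spark's actual implementation; the object and method names (FindVsFullMatch, extractWithFind, extractWithFullMatch) are hypothetical and chosen for illustration.

```scala
import java.util.regex.Pattern

// Hypothetical helpers: they only contrast find vs. full-match semantics
// and are not taken from Spark or the RAPIDS plugin.
object FindVsFullMatch {
  // "find" semantics: the pattern may match anywhere inside the input.
  def extractWithFind(input: String, regex: String, group: Int): String = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.find()) Option(m.group(group)).getOrElse("") else ""
  }

  // "full match" semantics: the pattern must cover the entire input.
  def extractWithFullMatch(input: String, regex: String, group: Int): String = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.matches()) Option(m.group(group)).getOrElse("") else ""
  }
}
```

With the input "1a" and the pattern "(a)", the find variant returns "a" while the full-match variant returns an empty string, which lines up with the CPU and GPU outputs reported above. With ".*(a).*" both variants return "a", and with the input "a" (the case where the concat child is an empty string) both return "a" as well.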

sperlingxx · 2022-04-02T08:02:49Z

Closing this issue since it is covered by #5135.

sperlingxx added the "bug" and "? - Needs Triage" labels Mar 30, 2022

sperlingxx mentioned this issue Mar 30, 2022

[FEA] Enable regular expressions by default #4509

Open

61 tasks

sperlingxx mentioned this issue Apr 2, 2022

[BUG] GpuRegExExtract is not align with RegExExtract #5135

Closed

sperlingxx closed this as completed Apr 2, 2022

sameerz added the "duplicate" label and removed the "? - Needs Triage" label Apr 5, 2022
