Describe the bug
regexp_extract(concat(....), ) produces incorrect results in GPU runs when the children of concat contain at least one column vector.
Steps/Code to reproduce bug
import org.apache.spark.sql.functions.{col, concat, lit, regexp_extract}

val df = (1 to 10).toDF("a")
spark.conf.set("spark.rapids.sql.regexp.enabled", "true")
df.coalesce(1).select(regexp_extract(concat(col("a"), lit("a")), "(a)", 1)).collect()
GPU results:
Array([], [], [], [], [], [], [], [], [], [])
CPU results:
Array([a], [a], [a], [a], [a], [a], [a], [a], [a], [a])
For the query above, the GPU returns the correct result only when column(a) outputs an empty string.
This has nothing to do with concat. It appears that regexp_extract on the CPU does a find (the pattern may match anywhere in the string), whereas on the GPU it does a full match (the pattern must match the entire string).
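To make the find vs. full-match distinction concrete, here is a minimal sketch using `java.util.regex` directly. It is only an illustration of the two matching modes and makes no claim about how either the CPU or GPU implementation is written; the object and method names are hypothetical.

```scala
import java.util.regex.Pattern

// Hypothetical helpers contrasting the two matching modes described above.
object FindVsMatches {
  // Find semantics: succeeds if the pattern matches any substring of the input.
  def findGroup(regex: String, input: String, idx: Int): Option[String] = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.find()) Option(m.group(idx)) else None
  }

  // Full-match semantics: succeeds only if the pattern spans the entire input.
  def fullMatchGroup(regex: String, input: String, idx: Int): Option[String] = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.matches()) Option(m.group(idx)) else None
  }

  def main(args: Array[String]): Unit = {
    println(findGroup("(a)", "1a", 1))          // Some(a): "a" occurs inside "1a"
    println(fullMatchGroup("(a)", "1a", 1))     // None: "(a)" cannot cover "1a"
    println(fullMatchGroup(".*(a).*", "1a", 1)) // Some(a): ".*" absorbs the "1"
  }
}
```

With the pattern `(a)`, only find semantics produce a group for `"1a"`, which lines up with the CPU/GPU difference reported in this issue.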
Running

val df = Seq("1a", "2a", "3a", "4a", "5a", "6a", "7a", "8a", "9a", "10a").toDF("c")
df.coalesce(1).select(regexp_extract(col("c"), "(a)", 1)).collect()

shows the same discrepancy, but changing the regular expression to ".*(a).*" makes the CPU and the GPU produce the same result.
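The following sketch models why the `.*(a).*` workaround makes the two sides agree. It is a hypothetical emulation, not the Spark or spark-rapids code: it assumes a row with no match yields an empty string (consistent with the empty GPU results above) and applies both matching modes to the inputs from the simplified repro.

```scala
import java.util.regex.Pattern

// Hypothetical model of the two regexp_extract behaviours discussed above.
// Assumption: a row with no match yields "", like the empty GPU results.
object ExtractSemantics {
  def extract(regex: String, idx: Int, fullMatch: Boolean, s: String): String = {
    val m = Pattern.compile(regex).matcher(s)
    val matched = if (fullMatch) m.matches() else m.find()
    if (matched) Option(m.group(idx)).getOrElse("") else ""
  }

  // Same inputs as the simplified repro: "1a", "2a", ..., "10a".
  val inputs: Seq[String] = (1 to 10).map(i => s"${i}a")

  def run(regex: String, fullMatch: Boolean): Seq[String] =
    inputs.map(s => extract(regex, 1, fullMatch, s))

  def main(args: Array[String]): Unit = {
    for (regex <- Seq("(a)", ".*(a).*")) {
      val findResult = run(regex, fullMatch = false) // CPU-like, per the comment
      val fullResult = run(regex, fullMatch = true)  // GPU-like, per the comment
      println(s"$regex  find=$findResult  fullMatch=$fullResult  same=${findResult == fullResult}")
    }
  }
}
```

Because `.*(a).*` already spans the whole input, requiring a full match changes nothing, so both modes extract `"a"` from every row; with plain `(a)` only find semantics do.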
Closing this issue since it is covered by #5135.