Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar #30625

Closed
wants to merge 3 commits into from

Conversation

luluorta
Copy link
Contributor

@luluorta luluorta commented Dec 6, 2020

What changes were proposed in this pull request?

LikeSimplification rule does not work correctly for many cases that have patterns containing escape characters, for example:

SELECT s LIKE 'm%aca' ESCAPE '%' FROM t
SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t

For simpilicy, this PR makes this rule just be skipped if pattern contains any escapeChar.

Why are the changes needed?

Result corrupt.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added Unit test.

@github-actions github-actions bot added the SQL label Dec 6, 2020
@luluorta
Copy link
Contributor Author

luluorta commented Dec 6, 2020

cc @cloud-fan @leanken

@SparkQA
Copy link

SparkQA commented Dec 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36891/

@SparkQA
Copy link

SparkQA commented Dec 6, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36891/

@SparkQA
Copy link

SparkQA commented Dec 6, 2020

Test build #132290 has finished for PR 30625 at commit 95713ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (pattern == null) {
// If pattern is null, return null value directly, since "col like null" == null.
Literal(null, BooleanType)
} else {
val escapeStr = String.valueOf(escapeChar)
pattern.toString match {
case _ if escapeChar == '_' || escapeChar == '%' => l
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests in LikeSimplificationSuite, too? Btw, does this simplification work correctly if escapeChar is not a default one other than _ and %?

Copy link
Contributor Author

@luluorta luluorta Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I think this rule works correctly only if pattern is valid and escapeChar does not escape itself. For simplicity, maybe it's better to skip it if pattern constains any escapeChar. I will fix this and change the title/description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh... I see...


test("SPARK-33677: LikeSimplification should be skipped if escape is a wildcard character") {
val e = intercept[AnalysisException] {
sql("SELECT string(rand()) LIKE 'm%aca' ESCAPE '%'").collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the problem is, we should only apply like simplification if the pattern is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule may not work even for the case that has a valid pattern: SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t

@luluorta luluorta changed the title [SPARK-33677][SQL] Skip LikeSimplification rule if escape is a wildcard character [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar Dec 7, 2020
@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36933/

pattern.toString match {
case startsWith(prefix) if !prefix.endsWith(escapeStr) =>
case p if p.contains(escapeChar) => l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct me if I was wrong: we need to make sure this rule doesn't change result. If the pattern is invalid, Like should fail. However, it's expensive to validate the pattern (need to compile it). Here we use p.contains(escapeChar) as a shortcut to know if the pattern might be invalid.

Can we add some comments to explain it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36933/

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Test build #132333 has finished for PR 30625 at commit 65e0d63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36966/


test("SPARK-33677: LikeSimplification should be skipped if pattern contains any escapeChar") {
withTable("string_tbl") {
sql("CREATE TABLE string_tbl USING parquet SELECT 'm@ca' AS s")
Copy link
Member

@maropu maropu Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you use a temporary table view for better test performance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no temporary table in spark, you mean temporary view?

Copy link
Contributor Author

@luluorta luluorta Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately Like will be constant folded and not passed to LikeSimplification if using temp view here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yea, I meant a temporary view...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luluorta have you tried temp view? AFAIK ConvertToLocalRelation is disabled in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works if I use DataFrame to create temp view.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36966/

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Test build #132366 has finished for PR 30625 at commit 765591c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37001/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37001/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Test build #132401 has finished for PR 30625 at commit 58b8bd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37013/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37013/

@SparkQA
Copy link

SparkQA commented Dec 8, 2020

Test build #132413 has finished for PR 30625 at commit 58b8bd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu pushed a commit that referenced this pull request Dec 8, 2020
…ny escapeChar

### What changes were proposed in this pull request?
`LikeSimplification` rule does not work correctly for many cases that have patterns containing escape characters, for example:

`SELECT s LIKE 'm%aca' ESCAPE '%' FROM t`
`SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t`

For simpilicy, this PR makes this rule just be skipped if `pattern` contains any `escapeChar`.

### Why are the changes needed?
Result corrupt.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added Unit test.

Closes #30625 from luluorta/SPARK-33677.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 99613cd)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu maropu closed this in 99613cd Dec 8, 2020
maropu pushed a commit that referenced this pull request Dec 8, 2020
…ny escapeChar

### What changes were proposed in this pull request?
`LikeSimplification` rule does not work correctly for many cases that have patterns containing escape characters, for example:

`SELECT s LIKE 'm%aca' ESCAPE '%' FROM t`
`SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t`

For simpilicy, this PR makes this rule just be skipped if `pattern` contains any `escapeChar`.

### Why are the changes needed?
Result corrupt.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added Unit test.

Closes #30625 from luluorta/SPARK-33677.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 99613cd)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu
Copy link
Member

maropu commented Dec 8, 2020

Thanks! Merged to master/branch-3.1/branch-3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants