Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar #30625

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -543,27 +543,33 @@ object LikeSimplification extends Rule[LogicalPlan] {
private val equalTo = "([^_%]*)".r

def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
case Like(input, Literal(pattern, StringType), escapeChar) =>
case l @ Like(input, Literal(pattern, StringType), escapeChar) =>
if (pattern == null) {
// If pattern is null, return null value directly, since "col like null" == null.
Literal(null, BooleanType)
} else {
val escapeStr = String.valueOf(escapeChar)
pattern.toString match {
case startsWith(prefix) if !prefix.endsWith(escapeStr) =>
// There are three different situations when pattern containing escapeChar:
// 1. pattern contains invalid escape sequence, e.g. 'm\aca'
// 2. pattern contains escaped wildcard character, e.g. 'ma\%ca'
// 3. pattern contains escaped escape character, e.g. 'ma\\ca'
// Although there are patterns can be optimized if we handle the escape first, we just
// skip this rule if pattern contains any escapeChar for simplicity.
case p if p.contains(escapeChar) => l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct me if I was wrong: we need to make sure this rule doesn't change result. If the pattern is invalid, Like should fail. However, it's expensive to validate the pattern (need to compile it). Here we use p.contains(escapeChar) as a shortcut to know if the pattern might be invalid.

Can we add some comments to explain it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case startsWith(prefix) =>
StartsWith(input, Literal(prefix))
case endsWith(postfix) =>
EndsWith(input, Literal(postfix))
// 'a%a' pattern is basically same with 'a%' && '%a'.
// However, the additional `Length` condition is required to prevent 'a' match 'a%a'.
case startsAndEndsWith(prefix, postfix) if !prefix.endsWith(escapeStr) =>
case startsAndEndsWith(prefix, postfix) =>
And(GreaterThanOrEqual(Length(input), Literal(prefix.length + postfix.length)),
And(StartsWith(input, Literal(prefix)), EndsWith(input, Literal(postfix))))
case contains(infix) if !infix.endsWith(escapeStr) =>
case contains(infix) =>
Contains(input, Literal(infix))
case equalTo(str) =>
EqualTo(input, Literal(str))
case _ => Like(input, Literal.create(pattern, StringType), escapeChar)
case _ => l
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -116,4 +116,52 @@ class LikeSimplificationSuite extends PlanTest {
val optimized2 = Optimize.execute(originalQuery2.analyze)
comparePlans(optimized2, originalQuery2.analyze)
}

test("SPARK-33677: LikeSimplification should be skipped if pattern contains any escapeChar") {
val originalQuery1 =
testRelation
.where(('a like "abc%") || ('a like "\\abc%"))
val optimized1 = Optimize.execute(originalQuery1.analyze)
val correctAnswer1 = testRelation
.where(StartsWith('a, "abc") || ('a like "\\abc%"))
.analyze
comparePlans(optimized1, correctAnswer1)

val originalQuery2 =
testRelation
.where(('a like "%xyz") || ('a like "%xyz\\"))
val optimized2 = Optimize.execute(originalQuery2.analyze)
val correctAnswer2 = testRelation
.where(EndsWith('a, "xyz") || ('a like "%xyz\\"))
.analyze
comparePlans(optimized2, correctAnswer2)

val originalQuery3 =
testRelation
.where(('a like ("@bc%def", '@')) || ('a like "abc%def"))
val optimized3 = Optimize.execute(originalQuery3.analyze)
val correctAnswer3 = testRelation
.where(('a like ("@bc%def", '@')) ||
(Length('a) >= 6 && (StartsWith('a, "abc") && EndsWith('a, "def"))))
.analyze
comparePlans(optimized3, correctAnswer3)

val originalQuery4 =
testRelation
.where(('a like "%mn%") || ('a like ("%mn%", '%')))
val optimized4 = Optimize.execute(originalQuery4.analyze)
val correctAnswer4 = testRelation
.where(Contains('a, "mn") || ('a like ("%mn%", '%')))
.analyze
comparePlans(optimized4, correctAnswer4)

val originalQuery5 =
testRelation
.where(('a like "abc") || ('a like ("abbc", 'b')))
val optimized5 = Optimize.execute(originalQuery5.analyze)
val correctAnswer5 = testRelation
.where(('a === "abc") || ('a like ("abbc", 'b')))
.analyze
comparePlans(optimized5, correctAnswer5)
}
}
14 changes: 14 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -3718,6 +3718,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
}
}
}

test("SPARK-33677: LikeSimplification should be skipped if pattern contains any escapeChar") {
withTable("string_tbl") {
sql("CREATE TABLE string_tbl USING parquet SELECT 'm@ca' AS s")
Copy link
Member

@maropu maropu Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you use a temporary table view for better test performance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no temporary table in spark, you mean temporary view?

Copy link
Contributor Author

@luluorta luluorta Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately Like will be constant folded and not passed to LikeSimplification if using temp view here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yea, I meant a temporary view...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luluorta have you tried temp view? AFAIK ConvertToLocalRelation is disabled in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works if I use DataFrame to create temp view.


val e = intercept[AnalysisException] {
sql("SELECT s LIKE 'm%@ca' ESCAPE '%' FROM string_tbl").collect()
}
assert(e.message.contains("the pattern 'm%@ca' is invalid, " +
"the escape character is not allowed to precede '@'"))

checkAnswer(sql("SELECT s LIKE 'm@@ca' ESCAPE '@' FROM string_tbl"), Row(true))
}
}
}

case class Foo(bar: Option[String])