Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Disallow non-collapsable subselects with ORDER BY #72991

Merged
merged 8 commits into from
May 14, 2021

Conversation

matriv
Copy link
Contributor

@matriv matriv commented May 12, 2021

Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
the require an extra ordering step on top of the results returned
internally by the search/agg query.

Fixes: #71158

Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
the require an extra ordering step on top of the results returned
internally by the search/agg query.

Fixes: elastic#71158
@matriv matriv requested review from costin, astefan and bpintea May 12, 2021 16:51
@matriv matriv marked this pull request as ready for review May 12, 2021 16:51
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label May 12, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

import static org.elasticsearch.xpack.sql.SqlTestUtils.TEST_CFG;
import static org.elasticsearch.xpack.sql.types.SqlTypesTests.loadMapping;

public class VerifierErrorTests extends ESTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently errors are handled also inside VerifierTests through the error method (iirc); the reason being the test checks both things that pass and those that do not and it's easier to see both in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VerifierTests is in eql, should I just rename it VerifierTests also for the sql class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say so yes, easier to find it since the difference is just in the packages.

TEST_CFG,
new SqlFunctionRegistry(),
indexResolution,
new org.elasticsearch.xpack.sql.analysis.analyzer.Verifier(new Metrics())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the fqn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was automatic from IDE, since we are in the sql.planner package, to avoid mixing with that Verifier (which we actually test here), but will change.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Comment on lines 65 to 77
if (hasLimit.get() == false && p instanceof LimitExec) {
hasLimit.set(Boolean.TRUE);
return;
}
if (orderBy.get() == null && p instanceof OrderExec) {
orderBy.set(((OrderExec) p).order());
return;
}
if (hasLimit.get() && orderBy.get() != null && p instanceof OrderExec) {
if (((OrderExec) p).order().equals(orderBy.get()) == false) {
failures.add(fail(p, "Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT"));
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional compacting.

Suggested change
if (hasLimit.get() == false && p instanceof LimitExec) {
hasLimit.set(Boolean.TRUE);
return;
}
if (orderBy.get() == null && p instanceof OrderExec) {
orderBy.set(((OrderExec) p).order());
return;
}
if (hasLimit.get() && orderBy.get() != null && p instanceof OrderExec) {
if (((OrderExec) p).order().equals(orderBy.get()) == false) {
failures.add(fail(p, "Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT"));
}
}
if (p instanceof LimitExec) {
if (hasLimit.get() == false) {
hasLimit.set(true); // I'd use either objects or primitives for this and above statement
}
} else if (p instanceof OrderExec) {
if (orderBy.get() == null) {
orderBy.set(((OrderExec) p).order());
} else if (hasLimit.get()) {
if (((OrderExec) p).order().equals(orderBy.get()) == false) {
failures.add(fail(p, "Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT"));
}
}
}

assertEquals(
"1:96: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
error(
"SELECT * FROM (" + "SELECT max(int) AS max, bool FROM test GROUP BY bool ORDER BY max ASC LIMIT 10) " + "ORDER BY max DESC"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the string split meant to emphasise the subselect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but the formatter made it an one-liner, will fix.

}
if (hasLimit.get() && orderBy.get() != null && p instanceof OrderExec) {
if (((OrderExec) p).order().equals(orderBy.get()) == false) {
failures.add(fail(p, "Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the error message descriptive enough, tho? That is actually possible[*], just not in any (i.e. "conflicting") configuration, right? Subselects are a gray area anyways, so I guess this could also remain like this, but alternatively we could add some notes in the limitations docs or somehow rephrase this message -- up to you.

[*] Didn't run it on the new code, but I guess this should work: SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no LIMIT 10) ORDER BY emp_no LIMIT 5

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it passes, since it's the exact same order By so it can be executed once and then merge the limits to 5: https://github.com/elastic/elasticsearch/pull/72991/files#diff-147ac28e10bf6b2342a276203e3b250ccd593163c3aa087ee6d72ed19572c163R147

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the error message, imho we shouldn't add more details, about not supporting an ORDER BY on top of the result set returned internally, I think it will just complicate things, and regarding the limitations we already state about supporting queries that can be "flattened".

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that I fully understand the scenario that we are avoiding here...
You compare the ORDER BYs basically from two different plans that contain each other using if (((OrderExec) p).order().equals(orderBy.get()) == false). The equals method of Order covers the direction (ASC/DESC), on what the ordering is applied (the expression itself) and the nulls position.
Two comments:

  • NULLS are not covered by the tests. There is only one passing test using NULLS FIRST.
  • I don't understand why this test passes SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no DESC) ORDER BY emp_no ASC NULLS LAST) ORDER BY emp_no DESC NULLS LAST LIMIT 12) ORDER BY emp_no DESC NULLS FIRST if the nulls position is different in the outermost query and the first inner query. I must be missing the exact scenario that we are forbidding.

@matriv
Copy link
Contributor Author

matriv commented May 13, 2021

@astefan Thx a lot for catching this extra case, in fact when everything inside the hierarchy is flattened the actual ORDER BY applied, is the last one (top of hierarchy) before encountering the next ORDER BY that will break things (because a LIMIT is also already present).

Added more tests for verifying those un-supported queries and a couple more integ tests that also include nulls.

@matriv matriv requested a review from astefan May 13, 2021 13:43
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@matriv matriv merged commit a5a20ae into elastic:master May 14, 2021
@matriv matriv deleted the fix-71158 branch May 14, 2021 13:00
matriv added a commit to matriv/elasticsearch that referenced this pull request May 14, 2021
Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
the require an extra ordering step on top of the results returned
internally by the search/agg query.

Fixes: elastic#71158
(cherry picked from commit a5a20ae)
matriv added a commit to matriv/elasticsearch that referenced this pull request May 14, 2021
Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
the require an extra ordering step on top of the results returned
internally by the search/agg query.

Fixes: elastic#71158
(cherry picked from commit a5a20ae)
matriv added a commit that referenced this pull request May 14, 2021
Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
the require an extra ordering step on top of the results returned
internally by the search/agg query.

Fixes: #71158
(cherry picked from commit a5a20ae)
matriv added a commit that referenced this pull request May 14, 2021
Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
the require an extra ordering step on top of the results returned
internally by the search/agg query.

Fixes: #71158
(cherry picked from commit a5a20ae)
matriv added a commit that referenced this pull request May 14, 2021
Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
the require an extra ordering step on top of the results returned
internally by the search/agg query.

Fixes: #71158
(cherry picked from commit a5a20ae)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: Multiple ORDER BY and LIMIT on top of each other leads to incorrect results
7 participants