
Feature configurable calcite bloat #16248

Merged · 5 commits merged into apache:master on May 6, 2024

Conversation

@nozjkoitop (Contributor) commented Apr 9, 2024

Description

This pull request introduces a configurable druid.sql.planner.bloat parameter in CalciteRulesManager.

Fixes a problem where complicated nested SQL queries that worked before Druid 29.0.0 (Calcite version 1.21.0) are rejected by the Calcite planner.

The queries fail with the exception:
There are not enough rules to produce a node with desired properties: convention=DRUID, sort=[]

It appears that the exception is thrown when the root RelNode, which is initially an instance of AbstractConverter, is processed by ProjectMergeRule. The rule deems the result of the merge non-optimal and rejects it, so the AbstractConverter is never transformed. As a result, that AbstractConverter instance keeps skipping bestCost initialization: it returns the default infinite cost, which VolcanoPlanner ignores. bestCost is therefore never set on the root RelNode, which causes the exception.
This is probably caused by the many optimizations in VolcanoPlanner and ProjectMergeRule, and it looks like a Calcite bug.
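
For background on the knob this PR exposes: Calcite's ProjectMergeRule accepts a `bloat` limit that caps how much the combined expressions may grow when two Projects are merged; past that cap, the rule abandons the merge. Below is a minimal sketch of constructing the rule with a configurable cap, in the spirit of this PR's change (the class name and property handling here are illustrative, not the PR's exact code):

```java
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.rel.rules.ProjectMergeRule;

// Illustrative sketch: build a ProjectMergeRule whose bloat cap comes from a
// system property, as this PR initially does. If merging two Projects would
// grow the expression complexity by more than `bloat`, the rule leaves the
// Projects unmerged.
public final class BloatRuleExample
{
  public static RelOptRule projectMergeWithBloat(final int bloat)
  {
    return ProjectMergeRule.Config.DEFAULT.withBloat(bloat).toRule();
  }

  public static void main(String[] args)
  {
    final int bloat = Integer.parseInt(System.getProperty("druid.sql.planner.bloat", "100"));
    System.out.println(projectMergeWithBloat(bloat));
  }
}
```

Raising the cap lets deeply nested projections merge, at the cost of potentially much larger expression lists.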

Release note

Created a workaround: a configurable druid.sql.planner.bloat parameter, which should be set in the broker's jvm.config. It adjusts the cost limit applied to ProjectMergeRule results and can allow complex queries to run (default value 100; 1000 is suggested, but the right value depends on the data to be processed).
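
For example, assuming the system-property form described above, the broker's jvm.config would gain a JVM flag like the following (the value is workload-dependent):

```
-Ddruid.sql.planner.bloat=1000
```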


Key changed/added classes in this PR
  • CalciteRulesManager

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -82,6 +83,10 @@ public class CalciteRulesManager
   private static final int HEP_DEFAULT_MATCH_LIMIT = Integer.parseInt(
       System.getProperty(HEP_DEFAULT_MATCH_LIMIT_CONFIG_STRING, "1200")
   );
+  private static final String BLOAT_PROPERTY = "druid.sql.planner.bloat";
+  private static final int BLOAT = Integer.parseInt(
+      System.getProperty(BLOAT_PROPERTY, "100")
+  );
@kgyrtkirk (Member) commented Apr 9, 2024


I think the Druid planner will not like it at all if multiple Projects are kept on top of each other...

The default seems to be the same (100). What do you think about raising it to a bigger value, like a few million, so that it's less likely to hit this issue out of the box?

@nozjkoitop (Contributor, Author):

Hi, considering that this is primarily a workaround for a potential Calcite problem, I'm hesitant to increase the default value that much. However, I agree that raising the default to at least 1000 would be a good compromise.

@kgyrtkirk (Member):

Yeah, I agree... an off switch would be better than tweaking an integer.
How a set of expressions that may share common subexpressions gets evaluated depends on the execution engine; if it can avoid recomputation, the duplication might not cause much of a problem.

@@ -82,6 +83,10 @@ public class CalciteRulesManager
   private static final int HEP_DEFAULT_MATCH_LIMIT = Integer.parseInt(
       System.getProperty(HEP_DEFAULT_MATCH_LIMIT_CONFIG_STRING, "1200")
   );
+  private static final String BLOAT_PROPERTY = "druid.sql.planner.bloat";
+  private static final int BLOAT = Integer.parseInt(
+      System.getProperty(BLOAT_PROPERTY, "100")
+  );
A Contributor commented:

It should be passed via a context parameter or through a runtime property. A context param seems a better fit, as it would allow you to run those complex queries without having any impact on the rest of the queries.

@nozjkoitop (Contributor, Author):

Good point, thanks! I will have a look.

@nozjkoitop (Contributor, Author):

Updated to use a context parameter.
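
A minimal sketch of the context-parameter approach (identifiers here are illustrative, not necessarily the PR's final code): read `bloat` from the query context, falling back to the rule's default of 100, then build the rule per query.

```java
import java.util.Map;

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.rel.rules.ProjectMergeRule;

// Illustrative sketch only: a per-query bloat value taken from the query
// context map. The actual CalciteRulesManager wiring may differ.
public final class ContextBloatExample
{
  private static final String CTX_BLOAT = "bloat"; // hypothetical context key
  private static final int DEFAULT_BLOAT = 100;    // ProjectMergeRule's default

  public static RelOptRule projectMergeRule(final Map<String, Object> queryContext)
  {
    final Object raw = queryContext.get(CTX_BLOAT);
    final int bloat = raw == null ? DEFAULT_BLOAT : Integer.parseInt(raw.toString());
    return ProjectMergeRule.Config.DEFAULT.withBloat(bloat).toRule();
  }
}
```

A query could then opt in via its own context, e.g. `{"bloat": 1000}`, without affecting any other queries.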

  */
  private static final List<RelOptRule> BASE_RULES =
      ImmutableList.of(
          CoreRules.AGGREGATE_STAR_TABLE,
          CoreRules.AGGREGATE_PROJECT_STAR_TABLE,
-         CoreRules.PROJECT_MERGE,
+         ProjectMergeRule.Config.DEFAULT.withBloat(BLOAT).toRule(),
A Contributor commented:

How do users know when to set this context parameter?
I think we should change the error message so that it can be a bit more helpful for the end users.

@nozjkoitop (Contributor, Author):

Honestly, these are Calcite internals, and all we get on the Druid side is RelOptPlanner.CannotPlanException, which could be thrown for tons of reasons. We can add a suggestion to try setting this context parameter at QueryHandler, but IMO it's a real shot in the dark there.
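
If the message were improved anyway, one option (purely a sketch; the wrapper type and wording are illustrative, and Druid's actual exception handling lives elsewhere) is to append a hint wherever Druid catches Calcite's CannotPlanException:

```java
import org.apache.calcite.plan.RelOptPlanner;

// Sketch only: wrap Calcite's opaque CannotPlanException with a hint about
// the bloat context parameter.
public final class PlanErrorHint
{
  public static RuntimeException withBloatHint(final RelOptPlanner.CannotPlanException e)
  {
    final String base = e.getMessage() == null ? "Cannot plan query." : e.getMessage();
    return new RuntimeException(
        base
        + " If the query nests many projections, retrying with a larger"
        + " 'bloat' query context value may help.",
        e
    );
  }
}
```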

@cryptoe merged commit b5958b6 into apache:master on May 6, 2024. 85 checks passed.