Add syntax support for PARTITIONED BY/CLUSTERED BY in INSERT queries #12163

LakshSingla · 2022-01-17T11:25:45Z

NOTE: This PR and the description was originally written while keeping PARTITION BY and CLUSTER BY clauses in mind. The changes might not be reflected in the description and the comments.

This PR aims to add parser changes for supporting PARTITIONED BY and CLUSTERED BY as proposed in the issue #11929.

Description

This PR adds the ability to execute the queries of the form

INSERT INTO druid.dst SELECT * FROM druid.foo PARTITION BY 'day' CLUSTER BY 1, 2

Since PARTITION BY and CLUSTER BY are non-standard SQL keywords whose support is not provided in Apache Calcite (which is used for parsing the SQL queries), some changes are made into the pom.xml of druid-sql to accommodate for the changes made to the parser.

Proposal

Calcite is a general-purpose library, and while it does support parsing of SQL that should cover most of the dialects of SQL, one can also provide one's own SQL parser to it, to parse custom SQL statements.

Pre Read

(This is my current understanding and might have some flaws in it, please feel free to point out where I am wrong)

Provision in Calcite for extending the grammar
Calcite provides the option of setting the parser factory in SqlParser.Config which could be anything that implements SqlParserImplFactory. Therefore as far as Calcite is concerned one can rewrite the whole SQL parser from scratch in Java and can pass it to the SqlParser.Config (which is not ideal and tedious). Following is the minimalistic code snippet which achieves the same:

  static final SqlParser.Config PARSER_CONFIG = SqlParser
      .configBuilder()
      .setParserFactory(new CustomImplementationExtendingSqlParserImplFactory()) // Custom sql parser factory
      .build();

For providing the default parser, Calcite uses JavaCC, which is a parser generator (Parser.jj) to specify the grammar in EBNF, which gets compiled to a Java class. Ideally, we would like to add or extend our custom grammar production rules in this Parser.jj. Luckily, Calcite does support extending this Parser.jj via the use of Apache FreeMarker (FTL) templates.

Therefore, in a nutshell, we would like to recompile the default parser by providing our custom rules (through template), and then pass it to the SqlParser.Config.

Requirements

We don’t want to copy-paste Calcite’s Parser.jj manually and modify it directly. This is because a lot of changes do happen in Calcite's main branch, and we would ideally want to keep in line with those changes without having to copy and merge the Parser.jj from upstream every time. Therefore we want to reuse templates as much as possible.
Automate the whole process of templating and compiling, so that it gets incorporated into the build process.
Original SQL functionality shouldn't be affected.

Modifications to the POM

The stages added to the build process look a lot similar to the one used by the project: dask-sql which also extends the grammar for its own purpose. Therefore, we can use other "druidesque" methods/plugins to achieve the following, but the overall steps remain the same:

Copy the resources to target/codegen directory.
a. Copy the default Parser.jj from Calcite’s jar, to the build directory
b. Copy our FTL files, which contains our custom rules, and config.fmpp files, which contain the information to be plugged in to the parser.
Generate consolidated Parser.jj which contains the default Calcite's Parser.jj along with our custom rules, using fmpp plugin
With the above file as input, and using JavaCC plugin create the custom parser implementation DruidSqlParserImpl.java. This implementation would be passed to the SqlConfig for parsing the SQL strings.

Production Grammar added

/*
customInsert() -> insert() [PARTITION BY stringLiteral()] [CLUSTER BY clusterItems()]
clusterItems() -> (similar to what can be added in front of an ORDER BY clause in a SELECT statement)
*/
SqlNode DruidSqlInsert() :
{
    insertNode = SqlInsert()
    [
      <PARTITION> <BY>
      partitionBy = StringLiteral()
    ]
    [
      <CLUSTER> <BY>
      clusterBy = ClusterItems()
    ]
    {
  if(partitionBy == null && clusterBy == null) {
    return insertNode;
  }
  if(!(insertNode instanceof SqlInsert)) {
    return insertNode;
  }
  SqlInsert sqlInsert = (SqlInsert) insertNode;
  return new DruidSqlInsert(sqlInsert, partitionBy, clusterBy);
}
}

Class Design

For the custom insert statement, we are creating a class DruidSqlInsert, which inherits from the SqlInsert. The custom class contains the extra information (like the clause attached to the PartitionBy) while passing the information from the main INSERT node to the parent.
Therefore from the custom class, we can extract the extra information as and when required, while passing the same object down for the further conversion to Rel/DruidRel objects. (NOTE: Those objects however won't have any knowledge of these extra clauses added, that would be manually plugged).

Integration with the DruidPlanner

Once the String gets parsed as a SqlNode, we get an instance of DruidSqlInsert (if the statement is an INSERT statement with either of the PARTITION BY or CLUSTER BY clauses.
After performing some validation, (e.g. SELECT query inside the INSERT (with a CLUSTER BY) should not have an ORDER BY clause, we extract the information passed to the CLUSTER BY.
In the SELECT query, we add the information passed to the CLUSTER BY, as its ORDER BY.

Caveats to the Design

(Most of the following arise due to how we are extending Apache Calcite's grammar, and the limited number of regions we can actually make the changes).

The biggest caveat to the approach is that, in the Calcite's grammar: LINK we are NOT extending the INSERT production rule. Instead, we are extending the top-level statement production rule to look like:

statement:
      customInsertStatement
  |   resetStatement
  ....
  |   insert
  |   update

This is because in the original Parser.jj, there is a limited number of places wherein we can insert our custom statements.
The direct implication of this is that EXPLAIN PLAN FOR SELECT * FROM druid.foo CLUSTER BY dim1 doesn't work, since there is no customExplainStatement or its equivalent. To make this work, we would have to add our own custom EXPLAIN rule as well.

As a direct consequence of the above, customInsertStatement actually overshadows the insert statement. That is any statement that begins with INSERT is going to be interpreted as customInsertStatement, irrespective of whether it uses custom clauses or not. Therefore, we have to account for simple insert statements with our custom rule.
There is no support for our custom syntax at the RelNode level. Conversion from SQL string to Druid Native queries look like this: String -> SqlNode -> RelNode -> DruidRel. The custom syntax is only present till the conversion from String to SqlNode. After that, the custom parameters are extracted in the ParsedNodes present in DruidPlanner, and not sent further down to the RelNode/DruidRel's. This is because there is no rule for parsing the custom nodes in the SqlToRelConverter.class. This possibly implies that once the native query is created, parameters passed in say PARTITION BY would need to be manually injected before executing the query.
The CLUSTER BY and PARTITION BY clauses are linked to the INSERT statement, as opposed to the SELECT statements used inside them. Therefore the query looks like INSERT INTO dst (SELECT * FROM foo) PARTITION BY 'day' CLUSTER BY dim1 rather than INSERT INTO dst (SELECT * FROM foo PARTITION BY 'day' CLUSTER BY dim1).

Alternatives design ideas

The most 'ideal' however unmaintainable way to achieve the same would be to copy the Parser.jj from Calcite project directly, and make the changes there. This would allow us to directly make the changes in the SqlInsert's production rule (and mitigate caveats 1 & 2). However, doing so is not recommended.
Instead of DruidSqlInsert inheriting from the SqlInsert, it can not do so. To make the query work, we would need to replace the query node (in ParsedNodes). The current class design ensures that no replacement of nodes is required.

Note

In the newer version of Calcite, there is a default_config.fmpp which can be used to supply the boilerplate code in config.fmpp ([CALCITE-4238] Create a default parser configuration, to reduce redundant information in sub-parsers calcite#2167). On migrating to newer version, that file can be used to reduce the code in config.fmpp significantly.

PS: Currently the build process is failing due to spotbugs/forbiddenapis being run on the generated parser files, and due to failing mvn dependency:analyze check.

Key changed/added classes in this PR

This PR has:

been self-reviewed.
- using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
added documentation for new or modified features or behaviors.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.
been tested in a test Druid cluster.

LakshSingla · 2022-01-17T14:50:14Z

LGTM failed due to forbiddenapis failing on the generated parser file.

sql/src/main/codegen/includes/insert.ftl

cryptoe

LGTM post review comments.

cryptoe · 2022-01-25T10:20:35Z

sql/pom.xml

+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+      <version>1.7.25</version>


Can this come from the parent pom ?

Hi, I remembered why I didn't use the version from the parent pom. Their is a parent class variable <slf4j.version>1.7.12</slf4j.version>, however the sl4j-api is using a different version in the parent pom, which is hardcoded.

We shouldn't use different versions of slf4j-api in the sql module vs. the rest of the modules: it should be <scope>provided</scope> with no version specified. That way, we'll only ship one copy of the jar.

If you do that change, does it work out OK or does something bad happen?

cryptoe · 2022-01-25T10:21:35Z

sql/pom.xml

+      <plugin>
+        <groupId>com.googlecode.fmpp-maven-plugin</groupId>
+        <artifactId>fmpp-maven-plugin</artifactId>
+        <version>1.0</version>


Push to parent pom would be helpful

Hey, I searched the usages and it is a one-off plugin only used here. That's why refraining from pushing to parent pom. WDYT?

For hygiene I think it's good to define all the plugin version numbers in pluginManagement of the main pom. The actual configuration should stay in this module, since it's specific to this particular module.

PTAL at the poms once again, updated with the relevant changes.

sql/src/main/codegen/config.fmpp

cryptoe · 2022-01-25T10:27:45Z

sql/src/main/codegen/config.fmpp

+
+    # List of new keywords. Example: "DATABASES", "TABLES". If the keyword is not a reserved
+    # keyword add it to 'nonReservedKeywords' section.
+    keywords: [


Make it non reserved ?

Yeah, probably makes sense for this to be a non-reserved keyword.

It would be ideal that "CLUSTER" is non-reserved since it won't break people's query's with CLUSTER/cluster being an identifier (datasource name, column name, etc). But on testing it out, the test case INSERT INTO dst SELECT * FROM foo CLUSTER BY dim1 breaks. This is because the parser is assuming it is written like INSERT INTO dst SELECT * FROM foo AS CLUSTER ..., since CLUSTER can be used to name stuff. On the other hand INSERT INTO dst SELECT * FROM foo PARTITION BY 'day' CLUSTER BY dim1 won't fail.
Fwiw, PARTITION is also a reserved keyword in druid.

Reverting the change for now so that build passes.

Got it. In that case let's keep it as a reserved keyword.

One thing that @abhishekagarwal87 suggested is to have a flag (in PlannerContext or PlannerConfig) that can be used to switch between the custom and the default parser. That would make sense in case we don't want existing queries with cluster identifiers to fail.

IMO, we should always use our own version of the parser. It gives us more control over our dialect, which we'll find useful for other things in the future.

I'm not sure if parser flags are a good idea or not, but if we do have them, IMO the best approach is to have "versions" of the entire parser. i.e. the flag would be druid.sql.parser.version=2 or druid.sql.parser.version=latest rather than controlling specific features.

But, again, I'm not sure if we need this or not.

New keywords are added from time to time in other databases' SQL dialects. How do they typically handle this?

Having two parsers creates great usability and testing issues: All tests would have to be run in both. Users would need a per-query way to switch parsers, which means the poor user has to understand which is in each one, which is a hard task even for developers. Automated tools that generate Druid SQL would not know what to do.

So, better is to get the solution right within a single parser.

Yeah, that makes sense. I agree it makes sense to go with a single dialect; and in that context I like going with PARTITIONED BY and CLUSTERED BY instead of PARTITION BY and CLUSTER BY, like you mentioned in #12163 (comment), because it will be less likely to collide with existing column names.

We'd need to add something to the release notes about the new keywords.

The parser flag is supposed to be at a druid cluster level. It's definitely not great but a workaround if we don't want to use a new reserved keyword. But if we do, using CLUSTERED BY instead of CLUSTER BY will definitely make the chance of affecting queries very low so +1 to that.

Yep, I might have been unclear in the original comment. Its general usage is not to be overridden per query but acts as a fallback in case there is some constraint due to which these changes would break existing queries. But going through the PARTITIONED BY, CLUSTERED BY route, it makes sense to not consider that case since maintaining and deprecating the older parser would be unnecessary overhead.

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

sql/src/test/java/org/apache/druid/sql/calcite/CalciteInsertDmlTest.java

sql/pom.xml

sql/src/main/codegen/config.fmpp

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java

abhishekagarwal87 · 2022-01-26T07:55:53Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

+              }
+            }
+            // Creates a new SqlOrderBy query, which may have our CLUSTER BY overwritten
+            query = new SqlOrderBy(


I am wondering if the SQL layer is the right place to do this transformation or should we just leave it to the native layer to use orderBy and clusterBy together. My assumption for this transformation is that we would like the query results to be ordered on same dimensions that we want to use to arrange data in segments. Maybe it's something that QueryMaker should decide?

Currently, whatever you do enter in the "OrderBy" is getting converted as an ordering spec. I think the aim of the clause is to add syntactic sugar rather than add a different semantic.
(I am unfamiliar with Hive) I looked up Hive, which implements both of these clauses stackoverflow, and it seems that both of them try and achieve similar functionality albeit with different results and different semantics.
But yeah then it makes sense to implement both the CLUSTER BY and ORDER BY clauses and punt the actual logic to QueryMaker or a DruidRel.
The trouble I see implementing it is
a) We will need to convert it to a RelNode in SqlToRelConverter, and then pass it down the query stack. (This is because CLUSTER BY can be a reference to a column in the statement, and we probably need to use it). I am unsure if this is possible without making changes to the SqlToRelConverter itself.
b) A new Query of type Query.INSERT would need to be created to accommodate "ClusteringSpec" in addition to the "OrderingSpec".

I think it's OK for now to continue using the native ordering specs to represent cluster by, and to disallow having ORDER BY and CLUSTER BY at the same time. Also, IMO, we should not allow ORDER BY on the SELECT for an INSERT. The error could be something like "Cannot have ORDER BY on an INSERT query; use CLUSTER BY instead."

That way, we can keep discussing if we should allow ORDER BY and CLUSTER BY to be used together, and if so, what should happen. Without blocking this PR.

LakshSingla · 2022-02-03T20:37:51Z

Thanks everyone for reviewing the PR. The changes have been made per the newer proposal for the supported clauses in PARTITIONED BY and making it compulsory in INSERT queries. Comments from the last round of reviews have also been addressed

gianm

Looking mostly good to me; a few remaining comments about unparse and also about the error messages.

gianm · 2022-02-04T22:11:05Z

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java

+  {
+    super.unparse(writer, leftPrec, rightPrec);
+    writer.keyword("PARTITIONED BY");
+    writer.keyword(getPartitionedBy().toString()); // TODO: Make it cleaner by directly unparsing the SqlNode


Does this unparse method get used anywhere? If so, we need to fix this because it's going to be incorrect in a bunch of cases. Or if it doesn't get used anywhere, IMO it's better to throw UnsupportedOperationException rather than have a broken implementation. At least that way it's clear that the method isn't reliable.

Updated the method, with the proper implementation. Here are a few unparse examples after the change:

INSERT INTO `dst` (SELECT * FROM `foo`) PARTITIONED BY ALL TIME INSERT INTO `dst` (SELECT * FROM `view`.`aview`) PARTITIONED BY `TIME_FLOOR`(`__TIME`, 'PT2H') INSERT INTO `druid`.`dst` (SELECT `__time`, FLOOR(`m1`) AS `floor_m1`, `dim1`, CEIL(`m2`) FROM `foo`) PARTITIONED BY FLOOR(`__TIME` TO DAY) CLUSTERED BY 2 `dim1` DESC CEIL(`m2`)

unparse method is important since this is being used in SqlNode's toString method. So it gets used while debugging to view the original node as a SQL string. (Also, it gets used in other unparse methods, so not relevant to DruidSqlInsert currently since it's a top-level node, but it might cause say Explain on a DriudSqlInsert to fail, which is going to be implemented). Therefore I would say it is important to get it right.

gianm · 2022-02-04T22:13:30Z

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java

+    writer.keyword("PARTITIONED BY");
+    writer.keyword(getPartitionedBy().toString()); // TODO: Make it cleaner by directly unparsing the SqlNode
+    if (getClusteredBy() != null) {
+      writer.sep("CLUSTERED BY");


Why sep rather than keyword? sep is supposed to be for list separators, which doesn't seem right here.

Fixed, oversight on my part

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

gianm · 2022-02-04T22:48:04Z

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

+  public static Granularity convertSqlNodeToGranularity(SqlNode sqlNode) throws ParseException
+  {
+
+    final String genericParseFailedMessageFormatString = "Unable to parse the granularity from %s. Expected HOUR, DAY, MONTH, YEAR, ALL TIME, FLOOR function or %s function";


"parse the granularity" sounds strange to me, grammatically. How about this:

Encountered <%s> after PARTITIONED BY. Expected HOUR, DAY, MONTH, YEAR, ALL TIME, FLOOR(__time TO <timeUnit>) or %s(__time, '<period>').

gianm · 2022-02-04T22:51:35Z

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

+        "FLOOR".equalsIgnoreCase(operatorName)
+        || TimeFloorOperatorConversion.SQL_FUNCTION_NAME.equalsIgnoreCase(operatorName),
+        StringUtils.format(
+            "PARTITIONED BY clause can only parse FLOOR and %s functions.",


The verb "parse" here seems out of place. The clause doesn't do the parsing; the parser does the parsing.

How about including the same error message as above? I don't think this message needs to be different; the earlier one makes total sense here.

gianm · 2022-02-04T22:55:38Z

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

+    List<SqlNode> operandList = sqlCall.getOperandList();
+    Preconditions.checkArgument(
+        operandList.size() == 2,
+        StringUtils.format("Invalid number of arguments passed to %s in PARTIITONED BY clause", operatorName)


How about:

%s in PARTITIONED BY must have two arguments.

(Even if you don't change this, PARTIITONED is misspelled, so please fix that at least.)

gianm · 2022-02-04T23:09:17Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

+
+    private ParsedNodes(
+        @Nullable SqlExplain explain,
+        @Nullable SqlInsert insert,


I see. But we'll need EXPLAIN on INSERT to work eventually, so we'll need to clean this up at some point.

gianm · 2022-02-04T23:09:45Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

@@ -765,13 +786,65 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
      if (query.getKind() == SqlKind.INSERT) {
        insert = (SqlInsert) query;
        query = insert.getSource();
+
+        // Processing to be done when the original query has either of the PARTITION BY or CLUSTER BY clause
+        if (insert instanceof DruidSqlInsert) {


Makes sense to me to do it in a follow up. We do need to do it, though, so EXPLAIN on INSERT works properly.

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

abhishekagarwal87 · 2022-02-07T16:07:58Z

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

+        period = new Period(granularityString);
+      }
+      catch (IllegalArgumentException e) {
+        throw new ParseException(StringUtils.format("Unable to create period from %s", granularitySqlNode.toString()));


Suggested change

throw new ParseException(StringUtils.format("Unable to create period from %s", granularitySqlNode.toString()));

throw new ParseException(StringUtils.format("Unable to create period from %s", granularityString));

I used the other form because then I won't need to explicitly put the string in quotes. Plus it represents exactly what the user typed, not the one curated for our use.

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

abhishekagarwal87 · 2022-02-07T16:13:04Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

+      }
+    }
+    catch (JsonProcessingException e) {
+      throw new ValidationException("Unable to serialize partition granularity.");


we should log the exception stack trace.

abhishekagarwal87 · 2022-02-07T16:17:57Z

Great work @LakshSingla. Almost there :) I had some comments on the error messages. My reasoning for the rephrasing is that users don't know about the conversion or translation of parameters they are passing. So if a parameter X cannot be converted to Y, the error message can simply be that "X is an invalid input. Please refer to documentation" instead of "Unable to convert X to Y".

LakshSingla · 2022-02-07T19:35:48Z

Thanks @abhishekagarwal87 @gianm for another pass at the PR. I have updated the clumsy error messages in the DruidSqlParserUtils to be more affirmative, and hide the 'parse' that happens behind the scenes. Also, DruidSqlInsert.unparse now works as expected, updated the comment above with some examples for the same.

Regarding EXPLAIN PLAN FOR druidSqlInsert, that would for sure be added in a separate PR, and shouldn't be much work once this gets merged. One of the other reasons I wanted to raise a separate PR was because this patch contains a lot of boilerplate code to make the extended syntax work and I didn't want to incorporate those changes with this patch.

I hope I have addressed the latest and the unresolved comments. PTAL at the updated PR. Thanks.

gianm · 2022-02-07T20:34:18Z

sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlParserUtils.java

    }
    catch (Exception e) {
+      log.error(e, StringUtils.format("Unable to convert %s to a valid granularity.", sqlNode.toString()));


If this error occurs, it's due to user error on one particular query, so it shouldn't be an ERROR that appears in the server logs. That should be reserved for really important stuff that cluster operators need to worry about.

However, there should be some way to get the stack trace, and we generally don't want to return it directly to users of the SQL API.

So in a situation like this I'd do log.debug instead of log.error, so server admins can see the stack trace if they really need to, but won't see it by default. That balances all of the relevant desires: server admins shouldn't see "user error" stuff by default, and users shouldn't see stack traces as part of normal error messages.

Updated with the correct level, thanks for the explanation.

gianm

LGTM - thanks @LakshSingla!

gianm · 2022-02-08T02:13:15Z

@LakshSingla Could you please take a look at the "intellij inspections" check, and the CalciteInsertDmlTest failure in the "other modules test"? The issues there look legit.

#12163 makes PARTITIONED BY a required clause in INSERT queries. While this is required, if a user accidentally omits the clause, it emits a JavaCC/Calcite error, since it's syntactically incorrect. The error message is cryptic. Since it's a custom clause, this PR aims to make the clause optional on the syntactic side, but move the validation to DruidSqlInsert where we can surface a friendlier error.

FrankChen021 · 2022-04-27T14:56:15Z

@LakshSingla The INSERT-SELECT query is only supported in the tests, right? I tried to execute a INSERT-SELECT query on the lastest master branch, it said: Cannot execute INSERT queries.

When I checked the code, I found the message was thrown from NativeQueryMakerFactory which implements QueryMakerFactory. And only the class TestQueryMakerFactory in the tests implements the method.

LakshSingla · 2022-04-28T05:51:49Z

@FrankChen021 This PR aims to add only the support for the INSERT statements with the custom PARTITIONED/CLUSTERED syntax. To actually run an INSERT query, a supported QueryMaker/QueryMakerFactory is required, which is only supported in the tests (TestQueryMakerFactory).

FrankChen021 · 2022-04-28T06:24:20Z

@LakshSingla Thanks for the clarification. I thought the INSERT query was supported, looks like my memory is wrong.

LakshSingla added 13 commits January 4, 2022 16:44

Initial commit, build working

29969da

Merge branch 'master' into calcite-ext-prototype

6d3382f

Add for SqlSelect and OrderBy

f70aad9

Integrate the changes such that set context works

7b2c884

Merge branch 'master' into calcite-ext-prototype

f406334

Fix nullptr in DruidPlanner

c9d9bc9

Add files for SqlInsert

ab57bc0

Revert setcontext changes, only insert changes being exposed

7a3e479

Cleanup code, ClusterBy will create a new SqlNode of type OrderBy

d791669

Cleanup code, revert unnesecary changes

5b741a6

Merge branch 'master' into calcite-ext-insert

2ddad66

Remove contextMap from ParsedNodes

1845ddf

Add getter for partitionBy

2817a85

LakshSingla marked this pull request as draft January 17, 2022 11:25

Checkstyle fix

7d63e44

LakshSingla added 2 commits January 17, 2022 22:02

Add generated files to ignorelist of forbiddenapis

a22eb0e

indentation fix, add license to insert.ftl

1119dd0

abhishekagarwal87 reviewed Jan 18, 2022

View reviewed changes

sql/src/main/codegen/includes/insert.ftl Outdated Show resolved Hide resolved

LakshSingla added 2 commits January 18, 2022 17:32

Add testcases which plan using modified syntax

50ce3fb

Handle LIMIT/OFFSET clauses, add more testcases

472291c

LakshSingla marked this pull request as ready for review January 18, 2022 14:17

LakshSingla added 3 commits January 19, 2022 13:10

Exclude generated classes from spotbug/forbiddenapis

6c2fcf2

Merge branch 'master' into calcite-ext-insert

e31ebcc

Remove version from mvn-dep plugin, add sl4j dep

0b7dec6

LakshSingla changed the title ~~[WIP] Add syntax support for PARTITION BY/CLUSTER BY in INSERT queries~~ Add syntax support for PARTITION BY/CLUSTER BY in INSERT queries Jan 20, 2022

Pass the partition by granularity in the querycontext

f78606c

abhishekagarwal87 added the Area - SQL label Jan 25, 2022

cryptoe reviewed Jan 25, 2022

View reviewed changes

abhishekagarwal87 reviewed Jan 26, 2022

View reviewed changes

minor fixes

b90ab16

LakshSingla requested review from abhishekagarwal87 and gianm February 3, 2022 20:38

LakshSingla added 3 commits February 4, 2022 09:40

intellij inspections

e335214

Exclude druidsqlinsert from codecov

2cef674

Exclude druidsqlinsert from codecov2

8d58136

gianm reviewed Feb 4, 2022

View reviewed changes

Missed testcase

973d3fa

abhishekagarwal87 reviewed Feb 7, 2022

View reviewed changes

LakshSingla added 2 commits February 7, 2022 23:04

DruidSqlInsert.unparse working

dc2b45e

Fix error messages

32244a6

Revert codecov exclude

5017257

gianm reviewed Feb 7, 2022

View reviewed changes

reduce severity for parsing error log

5b79694

gianm approved these changes Feb 7, 2022

View reviewed changes

LakshSingla added 2 commits February 8, 2022 09:52

intellij, test errors

43b7388

unused import

ad78bba

abhishekagarwal87 approved these changes Feb 8, 2022

View reviewed changes

abhishekagarwal87 merged commit 4add251 into apache:master Feb 8, 2022

This was referenced Feb 9, 2022

Explain plan for custom insert syntax #12243

Merged

Surface a user friendly error when PARTITIONED BY is omitted #12246

Merged

abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022

	throw new ParseException(StringUtils.format("Unable to create period from %s", granularitySqlNode.toString()));
	throw new ParseException(StringUtils.format("Unable to create period from %s", granularityString));

Add syntax support for PARTITIONED BY/CLUSTERED BY in INSERT queries #12163

Add syntax support for PARTITIONED BY/CLUSTERED BY in INSERT queries #12163

Conversation

LakshSingla commented Jan 17, 2022 • edited Loading

Description

Proposal

Pre Read

Requirements

Modifications to the POM

Production Grammar added

Class Design

Integration with the DruidPlanner

Caveats to the Design

Alternatives design ideas

Note

Key changed/added classes in this PR

LakshSingla commented Jan 17, 2022

cryptoe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LakshSingla Jan 26, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LakshSingla Jan 27, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gianm Jan 27, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LakshSingla commented Feb 3, 2022

gianm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LakshSingla Feb 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abhishekagarwal87 commented Feb 7, 2022

LakshSingla commented Feb 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gianm left a comment

Choose a reason for hiding this comment

gianm commented Feb 8, 2022

FrankChen021 commented Apr 27, 2022

LakshSingla commented Apr 28, 2022 • edited Loading

FrankChen021 commented Apr 28, 2022

LakshSingla commented Jan 17, 2022 •

edited

Loading

LakshSingla Jan 26, 2022 •

edited

Loading

LakshSingla Jan 27, 2022 •

edited

Loading

gianm Jan 27, 2022 •

edited

Loading

LakshSingla Feb 7, 2022 •

edited

Loading

LakshSingla commented Feb 7, 2022 •

edited

Loading

LakshSingla commented Apr 28, 2022 •

edited

Loading