
Add syntax support for PARTITIONED BY/CLUSTERED BY in INSERT queries #12163

Merged
merged 54 commits
Feb 8, 2022

Conversation

LakshSingla
Contributor

@LakshSingla LakshSingla commented Jan 17, 2022

NOTE: This PR and its description were originally written with the PARTITION BY and CLUSTER BY clauses in mind. The later change to PARTITIONED BY and CLUSTERED BY might not be reflected in the description and the comments.

This PR aims to add parser changes for supporting PARTITIONED BY and CLUSTERED BY as proposed in the issue #11929.

Description

This PR adds the ability to execute queries of the form

INSERT INTO druid.dst SELECT * FROM druid.foo PARTITION BY 'day' CLUSTER BY 1, 2

Since PARTITION BY and CLUSTER BY are non-standard SQL clauses that Apache Calcite (which Druid uses to parse SQL queries) does not support, some changes are made to the pom.xml of druid-sql to accommodate the changes made to the parser.

Proposal

Calcite is a general-purpose library, and while its built-in parser covers most SQL dialects, one can also provide one's own SQL parser to it in order to parse custom SQL statements.

Pre Read

(This is my current understanding and might have some flaws in it; please feel free to point out where I am wrong.)

Provision in Calcite for extending the grammar
Calcite provides the option of setting the parser factory in SqlParser.Config, which can be anything that implements SqlParserImplFactory. Therefore, as far as Calcite is concerned, one could rewrite the whole SQL parser from scratch in Java and pass it to the SqlParser.Config (which is tedious and not ideal). The following minimal code snippet achieves this:

  static final SqlParser.Config PARSER_CONFIG = SqlParser
      .configBuilder()
      .setParserFactory(new CustomImplementationExtendingSqlParserImplFactory()) // Custom sql parser factory
      .build();

For its default parser, Calcite uses JavaCC, a parser generator: the grammar is specified in EBNF in Parser.jj, which gets compiled into a Java class. Ideally, we would like to add our custom grammar production rules to this Parser.jj. Luckily, Calcite does support extending Parser.jj via Apache FreeMarker (FTL) templates.

Therefore, in a nutshell, we recompile the default parser with our custom rules supplied through templates, and then pass the resulting parser to the SqlParser.Config.

Requirements

  1. We don’t want to copy-paste Calcite’s Parser.jj manually and modify it directly. This is because a lot of changes do happen in Calcite's main branch, and we would ideally want to keep in line with those changes without having to copy and merge the Parser.jj from upstream every time. Therefore we want to reuse templates as much as possible.
  2. Automate the whole process of templating and compiling, so that it gets incorporated into the build process.
  3. Original SQL functionality shouldn't be affected.

Modifications to the POM

The stages added to the build process look a lot like the ones used by the dask-sql project, which also extends Calcite's grammar for its own purposes. We can use other "druidesque" methods/plugins to achieve the following, but the overall steps remain the same:

  1. Copy the resources to the target/codegen directory.
    a. Copy the default Parser.jj from Calcite’s jar to the build directory.
    b. Copy our FTL files, which contain our custom rules, and the config.fmpp files, which contain the information to be plugged into the parser.
  2. Generate a consolidated Parser.jj, which contains Calcite's default Parser.jj along with our custom rules, using the fmpp plugin.
  3. With the above file as input, use the JavaCC plugin to create the custom parser implementation DruidSqlParserImpl.java. This implementation is then passed to the SqlParser.Config for parsing SQL strings (a wiring sketch follows this list).
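A minimal wiring sketch for step 3 (hedged: this assumes the generated class follows Calcite's convention of exposing a static FACTORY field of type SqlParserImplFactory; the exact wiring in this PR may differ):

  import org.apache.calcite.sql.parser.SqlParser;

  // Sketch: plug the JavaCC-generated parser into Calcite's parser configuration.
  static final SqlParser.Config DRUID_PARSER_CONFIG = SqlParser
      .configBuilder()
      .setParserFactory(DruidSqlParserImpl.FACTORY) // generated from the consolidated Parser.jj
      .build();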

Production Grammar added

/*
customInsert() -> insert() [PARTITION BY stringLiteral()] [CLUSTER BY clusterItems()]
clusterItems() -> (similar to the list that can follow an ORDER BY clause in a SELECT statement)
*/
SqlNode DruidSqlInsert() :
{
    SqlNode insertNode;
    SqlNode partitionBy = null;
    SqlNodeList clusterBy = null;
}
{
    insertNode = SqlInsert()
    [
      <PARTITION> <BY>
      partitionBy = StringLiteral()
    ]
    [
      <CLUSTER> <BY>
      clusterBy = ClusterItems()
    ]
    {
      // Plain INSERT without the custom clauses: return the standard node unchanged
      if (partitionBy == null && clusterBy == null) {
        return insertNode;
      }
      if (!(insertNode instanceof SqlInsert)) {
        return insertNode;
      }
      // Wrap the standard SqlInsert together with the extra clause information
      SqlInsert sqlInsert = (SqlInsert) insertNode;
      return new DruidSqlInsert(sqlInsert, partitionBy, clusterBy);
    }
}

Class Design

For the custom insert statement, we create a class DruidSqlInsert, which inherits from SqlInsert. The custom class holds the extra information (such as the clause attached to the PARTITION BY) while forwarding the information from the main INSERT node to the parent.
Therefore, from the custom class we can extract the extra information as and when required, while passing the same object down for further conversion to Rel/DruidRel objects. (NOTE: Those objects won't have any knowledge of the extra clauses; that information has to be plugged in manually.)
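A minimal sketch of this class design (hedged: the actual class in the PR may carry additional operands, validation, and unparse logic), assuming the constructor simply forwards the wrapped SqlInsert's operands to the parent:

  import org.apache.calcite.sql.SqlInsert;
  import org.apache.calcite.sql.SqlNode;
  import org.apache.calcite.sql.SqlNodeList;

  // Sketch of DruidSqlInsert: wraps an existing SqlInsert, forwards its operands to the parent
  // class, and keeps the extra PARTITION BY / CLUSTER BY information for later extraction.
  public class DruidSqlInsert extends SqlInsert
  {
    private final SqlNode partitionedBy;   // expression following PARTITION(ED) BY
    private final SqlNodeList clusteredBy; // items following CLUSTER(ED) BY, may be null

    public DruidSqlInsert(SqlInsert insertNode, SqlNode partitionedBy, SqlNodeList clusteredBy)
    {
      super(
          insertNode.getParserPosition(),
          (SqlNodeList) insertNode.getOperandList().get(0), // INSERT keywords
          insertNode.getTargetTable(),
          insertNode.getSource(),
          insertNode.getTargetColumnList()
      );
      this.partitionedBy = partitionedBy;
      this.clusteredBy = clusteredBy;
    }

    public SqlNode getPartitionedBy()
    {
      return partitionedBy;
    }

    public SqlNodeList getClusteredBy()
    {
      return clusteredBy;
    }
  }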

Integration with the DruidPlanner

  1. Once the string gets parsed into a SqlNode, we get an instance of DruidSqlInsert (if the statement is an INSERT statement with either the PARTITION BY or the CLUSTER BY clause).
  2. After performing some validation (e.g. the SELECT query inside an INSERT with a CLUSTER BY should not have an ORDER BY clause), we extract the information passed to the CLUSTER BY.
  3. In the SELECT query, we add the information passed to the CLUSTER BY as its ORDER BY (see the sketch below).
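A hedged sketch of step 3 (names are illustrative and `druidSqlInsert` is assumed to be the node obtained in step 1; the actual code in DruidPlanner differs in detail): the CLUSTER BY items become the inner query's ORDER BY by wrapping it in a SqlOrderBy node.

  import org.apache.calcite.sql.SqlNode;
  import org.apache.calcite.sql.SqlNodeList;
  import org.apache.calcite.sql.SqlOrderBy;

  // Sketch: rewrite the CLUSTER BY of a DruidSqlInsert as the ORDER BY of its source query.
  SqlNodeList clusteredBy = druidSqlInsert.getClusteredBy();
  SqlNode query = druidSqlInsert.getSource();
  if (clusteredBy != null) {
    query = new SqlOrderBy(
        query.getParserPosition(),
        query,        // the SELECT feeding the INSERT
        clusteredBy,  // CLUSTER BY items become the ORDER BY list
        null,         // no OFFSET
        null          // no FETCH
    );
  }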

Caveats to the Design

(Most of the following arise from how we are extending Apache Calcite's grammar and the limited number of regions in which we can actually make changes.)

  1. The biggest caveat to the approach is that, in Calcite's grammar (LINK), we are NOT extending the INSERT production rule. Instead, we are extending the top-level statement production rule to look like:
statement:
      customInsertStatement
  |   resetStatement
  ....
  |   insert
  |   update

This is because the original Parser.jj exposes only a limited number of places where we can insert our custom statements.
The direct implication is that EXPLAIN PLAN FOR SELECT * FROM druid.foo CLUSTER BY dim1 doesn't work, since there is no customExplainStatement or equivalent. To make this work, we would have to add our own custom EXPLAIN rule as well.

  2. As a direct consequence of the above, customInsertStatement actually shadows the insert statement: any statement that begins with INSERT is interpreted as customInsertStatement, irrespective of whether it uses the custom clauses or not. Therefore, our custom rule has to account for plain insert statements as well.

  3. There is no support for our custom syntax at the RelNode level. Conversion from a SQL string to Druid native queries looks like this: String -> SqlNode -> RelNode -> DruidRel. The custom syntax is only present up to the conversion from String to SqlNode. After that, the custom parameters are extracted into the ParsedNodes present in DruidPlanner and are not sent further down to the RelNode/DruidRels. This is because there is no rule for converting the custom nodes in SqlToRelConverter. This possibly implies that once the native query is created, parameters passed in, say, PARTITION BY would need to be manually injected before executing the query.

  4. The CLUSTER BY and PARTITION BY clauses are attached to the INSERT statement, as opposed to the SELECT statement used inside it. Therefore the query looks like INSERT INTO dst (SELECT * FROM foo) PARTITION BY 'day' CLUSTER BY dim1 rather than INSERT INTO dst (SELECT * FROM foo PARTITION BY 'day' CLUSTER BY dim1).

Alternative design ideas

  1. The most 'ideal', yet unmaintainable, way to achieve this would be to copy the Parser.jj from the Calcite project directly and make the changes there. This would allow us to modify the SqlInsert production rule directly (and mitigate caveats 1 & 2). However, doing so is not recommended.
  2. DruidSqlInsert could avoid inheriting from SqlInsert altogether. In that case, to make the query work, we would need to replace the query node (in ParsedNodes). The current class design ensures that no replacement of nodes is required.

Note

  1. In newer versions of Calcite, there is a default_config.fmpp which can be used to supply the boilerplate in config.fmpp ([CALCITE-4238] Create a default parser configuration, to reduce redundant information in sub-parsers, calcite#2167). On migrating to a newer version, that file can be used to reduce the code in config.fmpp significantly.

PS: Currently the build process is failing due to spotbugs/forbiddenapis being run on the generated parser files, and due to a failing mvn dependency:analyze check.


Key changed/added classes in this PR

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@LakshSingla LakshSingla marked this pull request as draft January 17, 2022 11:25
@LakshSingla
Contributor Author

LGTM failed due to forbiddenapis failing on the generated parser file.

@LakshSingla LakshSingla marked this pull request as ready for review January 18, 2022 14:17
@LakshSingla LakshSingla changed the title [WIP] Add syntax support for PARTITION BY/CLUSTER BY in INSERT queries Add syntax support for PARTITION BY/CLUSTER BY in INSERT queries Jan 20, 2022
Contributor

@cryptoe cryptoe left a comment

LGTM post review comments.

sql/pom.xml Outdated
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.25</version>
Contributor

Can this come from the parent pom ?

Contributor Author

@LakshSingla LakshSingla Jan 26, 2022

Hi, I remembered why I didn't use the version from the parent pom. There is a property <slf4j.version>1.7.12</slf4j.version> in the parent pom; however, slf4j-api uses a different version in the parent pom, which is hardcoded.

Contributor

We shouldn't use different versions of slf4j-api in the sql module vs. the rest of the modules: it should be <scope>provided</scope> with no version specified. That way, we'll only ship one copy of the jar.

If you do that change, does it work out OK or does something bad happen?

sql/pom.xml Outdated
<plugin>
<groupId>com.googlecode.fmpp-maven-plugin</groupId>
<artifactId>fmpp-maven-plugin</artifactId>
<version>1.0</version>
Contributor

Push to parent pom would be helpful

Contributor Author

Hey, I searched the usages and it is a one-off plugin only used here. That's why I'm refraining from pushing it to the parent pom. WDYT?

Contributor

For hygiene I think it's good to define all the plugin version numbers in pluginManagement of the main pom. The actual configuration should stay in this module, since it's specific to this particular module.

Contributor Author

PTAL at the poms once again, updated with the relevant changes.

sql/src/main/codegen/config.fmpp

# List of new keywords. Example: "DATABASES", "TABLES". If the keyword is not a reserved
# keyword add it to 'nonReservedKeywords' section.
keywords: [
Contributor

Make it non reserved ?

Contributor

Yeah, probably makes sense for this to be a non-reserved keyword.

Contributor Author

@LakshSingla LakshSingla Jan 27, 2022

It would be ideal for "CLUSTER" to be non-reserved, since that won't break people's queries that use CLUSTER/cluster as an identifier (datasource name, column name, etc.). But on testing it out, the test case INSERT INTO dst SELECT * FROM foo CLUSTER BY dim1 breaks. This is because the parser assumes it is written like INSERT INTO dst SELECT * FROM foo AS CLUSTER ..., since CLUSTER can be used as an identifier. On the other hand, INSERT INTO dst SELECT * FROM foo PARTITION BY 'day' CLUSTER BY dim1 won't fail.
Fwiw, PARTITION is also a reserved keyword in Druid.

Reverting the change for now so that the build passes.

Contributor

Got it. In that case let's keep it as a reserved keyword.

Contributor Author

One thing that @abhishekagarwal87 suggested is to have a flag (in PlannerContext or PlannerConfig) that can be used to switch between the custom and the default parser. That would make sense in case we don't want existing queries with cluster identifiers to fail.

Contributor

@gianm gianm Jan 27, 2022

IMO, we should always use our own version of the parser. It gives us more control over our dialect, which we'll find useful for other things in the future.

I'm not sure if parser flags are a good idea or not, but if we do have them, IMO the best approach is to have "versions" of the entire parser. i.e. the flag would be druid.sql.parser.version=2 or druid.sql.parser.version=latest rather than controlling specific features.

But, again, I'm not sure if we need this or not.

New keywords are added from time to time in other databases' SQL dialects. How do they typically handle this?

Contributor

Having two parsers creates serious usability and testing issues: all tests would have to be run against both. Users would need a per-query way to switch parsers, which means the poor user has to understand what is in each one, which is a hard task even for developers. Automated tools that generate Druid SQL would not know what to do.

So it is better to get the solution right within a single parser.

Contributor

Yeah, that makes sense. I agree it makes sense to go with a single dialect; and in that context I like going with PARTITIONED BY and CLUSTERED BY instead of PARTITION BY and CLUSTER BY, like you mentioned in #12163 (comment), because it will be less likely to collide with existing column names.

We'd need to add something to the release notes about the new keywords.

Contributor

The parser flag is supposed to be at the Druid cluster level. It's definitely not great, but it is a workaround if we don't want to use a new reserved keyword. But if we do, using CLUSTERED BY instead of CLUSTER BY will definitely make the chance of affecting existing queries very low, so +1 to that.

Contributor Author

Yep, I might have been unclear in the original comment. The flag is not meant to be overridden per query but to act as a fallback in case some constraint means these changes would break existing queries. But going the PARTITIONED BY / CLUSTERED BY route, it makes sense not to consider that case, since maintaining and later deprecating the older parser would be unnecessary overhead.

sql/pom.xml (Outdated)
sql/pom.xml
sql/src/main/codegen/config.fmpp
}
}
// Creates a new SqlOrderBy query, which may have our CLUSTER BY overwritten
query = new SqlOrderBy(
Contributor

I am wondering if the SQL layer is the right place to do this transformation, or whether we should just leave it to the native layer to use orderBy and clusterBy together. My assumption for this transformation is that we would like the query results to be ordered on the same dimensions that we want to use to arrange data in segments. Maybe it's something that QueryMaker should decide?

Contributor Author

Currently, whatever you enter in the "OrderBy" gets converted into an ordering spec. I think the aim of the clause is to add syntactic sugar rather than a different semantic.
(I am unfamiliar with Hive.) I looked up Hive, which implements both of these clauses (stackoverflow), and it seems that both of them try to achieve similar functionality, albeit with different results and different semantics.
But yeah, then it makes sense to implement both the CLUSTER BY and ORDER BY clauses and punt the actual logic to QueryMaker or a DruidRel.
The trouble I see implementing it is:
a) We will need to convert it to a RelNode in SqlToRelConverter and then pass it down the query stack (this is because CLUSTER BY can be a reference to a column in the statement, and we probably need to use it). I am unsure if this is possible without making changes to the SqlToRelConverter itself.
b) A new Query of type Query.INSERT would need to be created to accommodate a "ClusteringSpec" in addition to the "OrderingSpec".

Contributor

I think it's OK for now to continue using the native ordering specs to represent cluster by, and to disallow having ORDER BY and CLUSTER BY at the same time. Also, IMO, we should not allow ORDER BY on the SELECT for an INSERT. The error could be something like "Cannot have ORDER BY on an INSERT query; use CLUSTER BY instead."

That way, we can keep discussing if we should allow ORDER BY and CLUSTER BY to be used together, and if so, what should happen. Without blocking this PR.

@LakshSingla
Contributor Author

Thanks everyone for reviewing the PR. The changes have been made per the newer proposal for the supported clauses in PARTITIONED BY and for making it compulsory in INSERT queries. Comments from the last round of reviews have also been addressed.

Contributor

@gianm gianm left a comment

Looking mostly good to me; a few remaining comments about unparse and also about the error messages.

{
super.unparse(writer, leftPrec, rightPrec);
writer.keyword("PARTITIONED BY");
writer.keyword(getPartitionedBy().toString()); // TODO: Make it cleaner by directly unparsing the SqlNode
Contributor

Does this unparse method get used anywhere? If so, we need to fix this because it's going to be incorrect in a bunch of cases. Or if it doesn't get used anywhere, IMO it's better to throw UnsupportedOperationException rather than have a broken implementation. At least that way it's clear that the method isn't reliable.

Contributor Author

@LakshSingla LakshSingla Feb 7, 2022

Updated the method with the proper implementation. Here are a few unparse examples after the change:

INSERT INTO `dst`
(SELECT *
FROM `foo`) PARTITIONED BY ALL TIME

INSERT INTO `dst`
(SELECT *
FROM `view`.`aview`) PARTITIONED BY `TIME_FLOOR`(`__TIME`, 'PT2H')

INSERT INTO `druid`.`dst`
(SELECT `__time`, FLOOR(`m1`) AS `floor_m1`, `dim1`, CEIL(`m2`)
FROM `foo`) PARTITIONED BY FLOOR(`__TIME` TO DAY) CLUSTERED BY 2 `dim1` DESC CEIL(`m2`)

The unparse method is important since it is used in SqlNode's toString method, so it gets used while debugging to view the original node as a SQL string. (Also, it gets used by other unparse methods; not relevant to DruidSqlInsert currently since it's a top-level node, but it might cause, say, EXPLAIN on a DruidSqlInsert to fail, which is going to be implemented.) Therefore I would say it is important to get it right. A rough sketch of the fixed method is below.
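A hedged sketch of what the fixed unparse could look like (illustrative only; the merged implementation may handle spacing and list separators differently):

  // Sketch: unparse the PARTITIONED BY / CLUSTERED BY operands as SqlNodes instead of relying on
  // toString(), so that quoting and precedence are handled by the SqlWriter.
  @Override
  public void unparse(SqlWriter writer, int leftPrec, int rightPrec)
  {
    super.unparse(writer, leftPrec, rightPrec);
    writer.keyword("PARTITIONED BY");
    getPartitionedBy().unparse(writer, leftPrec, rightPrec);
    if (getClusteredBy() != null) {
      writer.keyword("CLUSTERED BY");
      getClusteredBy().unparse(writer, leftPrec, rightPrec);
    }
  }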

writer.keyword("PARTITIONED BY");
writer.keyword(getPartitionedBy().toString()); // TODO: Make it cleaner by directly unparsing the SqlNode
if (getClusteredBy() != null) {
writer.sep("CLUSTERED BY");
Contributor

Why sep rather than keyword? sep is supposed to be for list separators, which doesn't seem right here.

Contributor Author

Fixed, oversight on my part

public static Granularity convertSqlNodeToGranularity(SqlNode sqlNode) throws ParseException
{

final String genericParseFailedMessageFormatString = "Unable to parse the granularity from %s. Expected HOUR, DAY, MONTH, YEAR, ALL TIME, FLOOR function or %s function";
Contributor

"parse the granularity" sounds strange to me, grammatically. How about this:

Encountered <%s> after PARTITIONED BY. Expected HOUR, DAY, MONTH, YEAR, ALL TIME, FLOOR(__time TO <timeUnit>) or %s(__time, '<period>').

"FLOOR".equalsIgnoreCase(operatorName)
|| TimeFloorOperatorConversion.SQL_FUNCTION_NAME.equalsIgnoreCase(operatorName),
StringUtils.format(
"PARTITIONED BY clause can only parse FLOOR and %s functions.",
Contributor

The verb "parse" here seems out of place. The clause doesn't do the parsing; the parser does the parsing.

How about including the same error message as above? I don't think this message needs to be different; the earlier one makes total sense here.

List<SqlNode> operandList = sqlCall.getOperandList();
Preconditions.checkArgument(
operandList.size() == 2,
StringUtils.format("Invalid number of arguments passed to %s in PARTIITONED BY clause", operatorName)
Contributor

How about:

%s in PARTITIONED BY must have two arguments.

(Even if you don't change this, PARTIITONED is misspelled, so please fix that at least.)


private ParsedNodes(
@Nullable SqlExplain explain,
@Nullable SqlInsert insert,
Contributor

I see. But we'll need EXPLAIN on INSERT to work eventually, so we'll need to clean this up at some point.

@@ -765,13 +786,65 @@ static ParsedNodes create(final SqlNode node) throws ValidationException
if (query.getKind() == SqlKind.INSERT) {
insert = (SqlInsert) query;
query = insert.getSource();

// Processing to be done when the original query has either of the PARTITION BY or CLUSTER BY clause
if (insert instanceof DruidSqlInsert) {
Contributor

Makes sense to me to do it in a follow up. We do need to do it, though, so EXPLAIN on INSERT works properly.

period = new Period(granularityString);
}
catch (IllegalArgumentException e) {
throw new ParseException(StringUtils.format("Unable to create period from %s", granularitySqlNode.toString()));
Contributor

Suggested change
throw new ParseException(StringUtils.format("Unable to create period from %s", granularitySqlNode.toString()));
throw new ParseException(StringUtils.format("Unable to create period from %s", granularityString));

Contributor Author

I used the other form because then I don't need to explicitly put the string in quotes. Plus, it represents exactly what the user typed, not the version curated for our use.

}
}
catch (JsonProcessingException e) {
throw new ValidationException("Unable to serialize partition granularity.");
Contributor

we should log the exception stack trace.

@abhishekagarwal87
Contributor

Great work @LakshSingla. Almost there :) I had some comments on the error messages. My reasoning for the rephrasing is that users don't know about the conversion or translation of the parameters they are passing. So if a parameter X cannot be converted to Y, the error message can simply be "X is an invalid input. Please refer to the documentation" instead of "Unable to convert X to Y".

@LakshSingla
Contributor Author

LakshSingla commented Feb 7, 2022

Thanks @abhishekagarwal87 @gianm for another pass at the PR. I have updated the clumsy error messages in DruidSqlParserUtils to be more affirmative and to hide the 'parse' that happens behind the scenes. Also, DruidSqlInsert.unparse now works as expected; I have updated the comment above with some examples of the same.

Regarding EXPLAIN PLAN FOR druidSqlInsert, that will for sure be added in a separate PR and shouldn't be much work once this gets merged. One of the other reasons I wanted to raise a separate PR is that this patch already contains a lot of boilerplate code to make the extended syntax work, and I didn't want to mix those changes into this patch.

I hope I have addressed the latest and the unresolved comments. PTAL at the updated PR. Thanks.

}
catch (Exception e) {
log.error(e, StringUtils.format("Unable to convert %s to a valid granularity.", sqlNode.toString()));
Contributor

If this error occurs, it's due to user error on one particular query, so it shouldn't be an ERROR that appears in the server logs. That should be reserved for really important stuff that cluster operators need to worry about.

However, there should be some way to get the stack trace, and we generally don't want to return it directly to users of the SQL API.

So in a situation like this I'd do log.debug instead of log.error, so server admins can see the stack trace if they really need to, but won't see it by default. That balances all of the relevant desires: server admins shouldn't see "user error" stuff by default, and users shouldn't see stack traces as part of normal error messages.

Contributor Author

Updated with the correct level, thanks for the explanation.

Contributor

@gianm gianm left a comment

LGTM - thanks @LakshSingla!

@gianm
Contributor

gianm commented Feb 8, 2022

@LakshSingla Could you please take a look at the "intellij inspections" check, and the CalciteInsertDmlTest failure in the "other modules test"? The issues there look legit.

@abhishekagarwal87 abhishekagarwal87 merged commit 4add251 into apache:master Feb 8, 2022
abhishekagarwal87 pushed a commit that referenced this pull request Feb 11, 2022
#12163 makes PARTITIONED BY a required clause in INSERT queries. While this is required, if a user accidentally omits the clause, it emits a JavaCC/Calcite error, since it's syntactically incorrect. The error message is cryptic. Since it's a custom clause, this PR aims to make the clause optional on the syntactic side, but move the validation to DruidSqlInsert where we can surface a friendlier error.
@FrankChen021
Member

@LakshSingla The INSERT-SELECT query is only supported in the tests, right? I tried to execute an INSERT-SELECT query on the latest master branch, and it said: Cannot execute INSERT queries.

When I checked the code, I found the message was thrown from NativeQueryMakerFactory, which implements QueryMakerFactory, and only the class TestQueryMakerFactory in the tests implements the method.

@LakshSingla
Contributor Author

LakshSingla commented Apr 28, 2022

@FrankChen021 This PR aims to add only the syntax support for INSERT statements with the custom PARTITIONED/CLUSTERED clauses. To actually run an INSERT query, a supporting QueryMaker/QueryMakerFactory is required, which is currently only available in the tests (TestQueryMakerFactory).

@FrankChen021
Member

@LakshSingla Thanks for the clarification. I thought the INSERT query was supported, looks like my memory is wrong.

@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022