-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-7832] Add ZetaSQL as a dialect in BeamSQL #9210
Conversation
Run Java PreCommit |
Run RAT PreCommit |
Run SQL PostCommit |
2 similar comments
Run SQL PostCommit |
Run SQL PostCommit |
Run Java PreCommit |
Run SQL PostCommit |
2 similar comments
Run SQL PostCommit |
Run SQL PostCommit |
@@ -309,6 +312,10 @@ private QueryPlanner instantiatePlanner(JdbcConnection jdbcConnection, RuleSet[] | |||
return new CalciteQueryPlanner(jdbcConnection, ruleSets); | |||
} | |||
|
|||
if (queryPlannerClassName.equals(ZETASQL_PLANNER)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace this with a reflection version and move it into its own PR:
try {
return Class.forName(queryPlannerClassName)
.getConstructor(QueryPlanner.class)
.newInstance(jdbcConnection, ruleSets);
} catch (Exception e) {
throw new RuntimeException(
String.format("Cannot construct query planner %s", queryPlannerClassName), e);
}
Please move the ZetaSQL code into its own package. "sdks/java/extensions/sql/zetasql" would be a good choice. Once that is done, you can send an email to dev@ proposing this as a code drop. If that proposal is accepted I'll give it a quick code review. If not, we can discuss breaking this up into smaller pieces. |
@apilloud BeamSQL depends on Calcite and does relocation on it. The problem is, in order to initialize an instance of ZetaSQL planner, we expect BeamSQL pass in a config to ZetaSQL planner, which unfortunately is CalciteConnectionConfig. Clearly, it causes a conflict and code fails to compile. The right solution is vendored Calcite, which is in progress: #9189 So I suggest we still keep all code in BeamSQL, but in the dedicated What do you think? Do you have a better idea? |
I tried to run our internal test suite against this branch. There is a problem with gRPC dependency from jni_channel. It doesn't seem to be using vendored gRPC, or relocating gRPC classes properly. |
It was causing:
It seems there is an issue with Netty as well:
|
During the process of moving zetasql planner to another package, I also experienced missing grpc dependency. |
The grpc issue might be a ZetaSQL bug, I'll work with you to get an isolated repro on Monday. As for moving this to a separate package, I would consider that a blocker to this merging. ZetaSQL currently only works on a limited subset of linux machines, we don't want to break other users and developers. I see #9189 is waiting for my review, so I'm going to do that now to help move this forward. |
5adf530
to
e2fd047
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too large to review, but it is a code drop so we are OK with that. I reviewed the changes outside of zetasql then skimmed the rest.
LGTM
PrecommitJobBuilder builder = new PrecommitJobBuilder( | ||
scope: this, | ||
nameBase: 'JavaBeamZetaSQL', | ||
gradleTask: ':javaPreCommitBeamZetaSQL', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this justifies its own jenkins precommit. Should this just be added to the existing SQL precommit?
If you do plan to keep, please test that this works in the PR prior to merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no SQL precommit. There is a SQL postcommit. I am actually happy to add to an existing precommit if there is any suitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the SQL unit tests run as part of the Java precommit.
@@ -35,6 +35,8 @@ applyJavaNature( | |||
include(dependency("org.apache.calcite:.*")) | |||
include(dependency("org.apache.calcite.avatica:.*")) | |||
include(dependency("org.codehaus.janino:.*")) | |||
include(dependency("com.google.api.grpc:.*")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole block of things goes away in #9189. We will probably need to vendor ZetaSQL. (No action required now, just want to make sure you are aware of the conflict.) cc: @vectorijk
@@ -82,6 +92,10 @@ dependencies { | |||
compile "org.apache.calcite:calcite-core:$calcite_version" | |||
compile "org.apache.calcite:calcite-linq4j:$calcite_version" | |||
compile "org.apache.calcite.avatica:avatica-core:$avatica_version" | |||
compile "com.google.api.grpc:proto-google-common-protos:1.12.0" // Interfaces with ZetaSQL use this | |||
compile "com.google.zetasql:zetasql-jni-channel:2019.07.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make the version 2019.07.1
a variable.
// Disable Gradle cache (it should not be used because the IT's won't run). | ||
outputs.upToDateWhen { false } | ||
|
||
include '**/*ZetaSQL.class' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make this *TestZetaSQL.class
instead?
@@ -132,6 +132,8 @@ public String explain(String sqlString) throws ParseException { | |||
public static class BeamSqlEnvBuilder { | |||
private static final String CALCITE_PLANNER = | |||
"org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner"; | |||
private static final String ZETASQL_PLANNER = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't belong here. Can you just delete it?
FunctionSignatureId.FN_CASE_NO_VALUE, | ||
FunctionSignatureId.FN_CASE_WITH_VALUE, | ||
FunctionSignatureId.FN_TIMESTAMP_ADD, | ||
// FunctionSignatureId.FN_TIMESTAMP_SUB, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a whole bunch of these commented out values, should they just be deleted? (I'm guessing they are effectively TODOs, but want to confirm.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are effectively TODO or partially supported. I will add comment to mark it.
Run javaPreCommitBeamZetaSQL |
Run Seed Job |
Run JavaBeamZetaSQL Precommit |
Run Python PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left some comments. Since this is so huge, it is best to merge first and then fix the issues after, I think. The main thing that probably is new to this PR is the integration with Beam's gradle and Jenkins configurations. I think you should not have to do any new configuration to run your tests. Maybe I missed something special about them?
PrecommitJobBuilder builder = new PrecommitJobBuilder( | ||
scope: this, | ||
nameBase: 'JavaBeamZetaSQL', | ||
gradleTask: ':javaPreCommitBeamZetaSQL', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the SQL unit tests run as part of the Java precommit.
@@ -187,6 +208,19 @@ task runPojoExample(type: JavaExec) { | |||
args = ["--runner=DirectRunner"] | |||
} | |||
|
|||
task runZetaSQLTest(type: Test) { | |||
// Disable Gradle cache (it should not be used because the IT's won't run). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are these ITs? I was scanning the files but did not see them.
|
||
/** ZetaSQLDialectSpecTest. */ | ||
@RunWith(JUnit4.class) | ||
public class ZetaSQLDialectSpecTestZetaSQL { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally this class would end with Test
. It has a strange name. But more important is that people may not notice it if they are reading quickly for test classes, or using find
command line tool, etc.
It is also far too big. Should be broken up into areas of the dialect to make it easier to navigate and control which tests execute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Tests should be split.
For naming, ending Test
will be matched as normal SQL tests (which we should not allow it happen due to less platform support of ZetaSQL). It is why IT
is the end for integration tests. I followed the same convention to add ZetaSQL
after Test
to indicate what kind of tests it is.
Looks like the SQL post commit didn't get run on this. It started hard failing immediately after this went in: https://issues.apache.org/jira/browse/BEAM-8036 |
Support ZetaSQL(https://github.com/google/zetasql) as a dialect in BeamSQL.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.