Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the druid json_object issue #16592

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

AlbericByte
Copy link
Contributor

@AlbericByte AlbericByte commented Jun 12, 2024

Fixes #16356.

Description

the dependence calcite cr need to be review
the root case is that the operator is SqlFunction, but in the map, the key instance is SqlJsonObjectFunction, the type is not equals, so we can naver get from the map in the following code:

 public @Nullable RexCallImplementor get(final SqlOperator operator) {
    if (operator instanceof SqlUserDefinedFunction) {
      org.apache.calcite.schema.Function udf =
          ((SqlUserDefinedFunction) operator).getFunction();
      if (!(udf instanceof ImplementableFunction)) {
        throw new IllegalStateException("User defined function " + operator
            + " must implement ImplementableFunction");
      }
      CallImplementor implementor =
          ((ImplementableFunction) udf).getImplementor();
      return wrapAsRexCallImplementor(implementor);
    } else if (operator instanceof SqlTypeConstructorFunction) {
      return map.get(SqlStdOperatorTable.ROW);
    }
    return map.get(operator);
  }

Fixed the bug ...

Release note

Fix the SQL JSON_OBJECT() function results in RUNTIME_FAILURE when querying INFORMATION_SCHEMA.COLUMNS


Key changed/added classes in this PR
  • NestedDataOperatorConversions.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@FrankChen021
Copy link
Member

So to resolve this problem, we need to update the calcite to a release that includes your fix?

@FrankChen021
Copy link
Member

can we add a test case?

@AlbericByte
Copy link
Contributor Author

@FrankChen021, @kgyrtkirk and @cryptoe
i am creating the test case in calcite first, still working on this.
After, i will create the test case in druid

@cryptoe
Copy link
Contributor

cryptoe commented Jun 24, 2024

So you want all functions to work with equalsIgnoreCase ?

select fruit, sum(price) from foo group by 1 = select fruit, sUm(price) from foo group by 1 ?

@kgyrtkirk
Copy link
Member

I wonder why can't we simply use the SqlStdOperatorTable.JSON_OBJECT instead of declaring a new sqlfunction in NestedDataOperatorConversions.

  public static class JsonObjectOperatorConversion implements SqlOperatorConversion
  {
    private static final SqlFunction SQL_FUNCTION = SqlStdOperatorTable.JSON_OBJECT;
  [...]

I've done a quick check with a testcase in CalciteSysQueryTest

  @Test
  public void testJsonObject()
  {
    testBuilder()
        .sql(
            "select JSON_OBJECT('name': COLUMN_NAME, 'type': \"DATA_TYPE\")"
                + "from INFORMATION_SCHEMA.COLUMNS order by COLUMN_NAME limit 1"
        )
        .expectedResults(
            ImmutableList.of(
                new Object[] {"{\"name\":\"CATALOG_NAME\",\"type\":\"VARCHAR\"}"}
            )
        )
        .run();
  }

I'm not sure if other tests will pass with this change...but SqlToRel was using a direct comparision check on the builtin operator; we have a few of those for the JSON_ functions...maybe the best would be to use the SqlStdOperator versions for the ones SqlToRel does special handling?

...other way around could be to somehow get rid of the Bindables mode - so that this query gets executed the same way as regular Druid queries

@AlbericByte
Copy link
Contributor Author

AlbericByte commented Jun 27, 2024

Bindables mode
hI @kgyrtkirk Thanks for your suggestion, I have switch to SqlStdOperatorTable.JSON_OBJECT

May i know what is your idea of SqlStdOperator versions solution?

);
}
})
.returnTypeInference(NESTED_RETURN_TYPE_INFERENCE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the are some differences between these functions....most importantly the return type is different; the SqlStdOp returns VARCHAR ; meanwhile this function was returning NESTED_DATA

there are also some test failures; one is CalciteNestedDataQueryTest.testJsonQueryAndJsonObject

Copy link
Contributor Author

@AlbericByte AlbericByte Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgyrtkirk @cryptoe and @FrankChen021 i try to use the original OperatorConversions(not SqlStdOperatorTable.JSON_OBJECT)

and during plan process, original JsonObjectOperatorConversion will add new virtualColumn base on the Expression.

but for the BindablesQuery, we have different plan process and not add virtualColumn base on the Expression, so it will go to calcite json_object, then we have this runtime error for any BindablesQuery which use json_object.

Please let me know your idea, i need your help to fix this issue.

@AlbericByte
Copy link
Contributor Author

AlbericByte commented Jul 5, 2024

@kgyrtkirk i have try to add some calcite rule for plan, but it still need to get sqlfunction from rextableimp, i do not have any other work around now, i am open if someone can provide idea or continue to work on this, i can monitor and learn.

@kgyrtkirk
Copy link
Member

I think that getting rid of the bindables mode might just make this and possibly other issues go away...

I was looking around a bit and I think you will get similar issues for all the functions inside NestedDataOperatorConversions

to address that I think the following might be considered:

  • fix these SqlFunction-s in NestedDataOperatorConversions ; to be usable bindable functions
  • make the bindables go away
    • the reason these are around is due to the fact that some tables are only provided in a way the bindable wants them here
    • fixing that would be to make sure that all tables can be unwrapped to DruidTable
    • easiest option would be to provide them as an InlineTable ; however that would mean all those values will appear in the plans - so a new subclass of DruidTable would be needed - which could access information_schema stuff

I would preffer (2) as I have a feeling that (1) might be quite tricky....
Removing bindables mode have further benefits; it will make those queries work the same as regular tables with joins/etc; and it would reduce the complexity a bit.

I'll try to think about further alternatives/ideas/etc

cc: @gianm

@AlbericByte
Copy link
Contributor Author

@kgyrtkirk thanks let me know your solution, i will try this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL JSON_OBJECT() function results in RUNTIME_FAILURE when querying INFORMATION_SCHEMA.COLUMNS
4 participants