Incorrect results (including nulls) when querying string column with col <> '' and col is not null #10525

Open
shashanksinghal opened this issue Oct 21, 2020 · 3 comments

@shashanksinghal

Affected Version

0.18.0 and 0.20.0

Description

On Druid 0.18.0 with useDefaultValueForNull set to false, querying a table with a combined condition on a string column (say col), such as col <> '' and col is not null, returns rows where that column is null. Each condition behaves correctly when used on its own, but when they are combined, nulls are not filtered out at all.
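For reference, here is a minimal sketch of the results I would expect from these predicates under standard SQL null semantics. It uses an in-memory SQLite database purely as a stand-in (an assumption for illustration, not Druid), and the table name t is hypothetical.

```python
import sqlite3

# SQLite is only a stand-in to show standard SQL null semantics; it is not Druid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")  # hypothetical table for illustration
conn.executemany(
    "INSERT INTO t (col) VALUES (?)",
    [("Paris",), ("",), (None,)],  # a normal value, an empty string, and a null
)


def select(where):
    """Return the col values of the rows matching the given WHERE clause."""
    sql = f"SELECT col FROM t WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql)]


not_empty = select("col <> ''")        # NULL <> '' is unknown, so the null row is dropped
not_null = select("col IS NOT NULL")   # keeps the empty string, drops the null
combined = select("col <> '' AND col IS NOT NULL")

print(not_empty, not_null, combined)
```

In this stand-in, the combined predicate keeps only the row with a real value, which is the behavior the Druid query should match.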

  • Cluster size
    Local docker setup
  • Configurations in use
    useDefaultValueForNull is false
  • Steps to reproduce the problem
    1. Set up a local Druid 0.18.0 cluster using the docker setup
    2. Load the example data, i.e. wikipedia
    3. Run the query: select * from wikipedia where cityName is not null and cityName <> '' limit 100
  • The error message or stack traces encountered
    There is no error; the results include rows where cityName is null
  • Any debugging that you have already done
    I tested with versions 0.18.0 and 0.20.0, and both have this issue
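
The reproduction steps above could be checked from a script along these lines. This is a rough sketch, assuming the router of the local docker setup is reachable at localhost:8888 and that SQL results come back as a JSON array of objects with nulls as JSON null; the URL and the helper names are assumptions, not part of the original report.

```python
import json
import urllib.request

# Assumption: the Druid router from the local docker setup listens here.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

QUERY = (
    "select * from wikipedia "
    "where cityName is not null and cityName <> '' limit 100"
)


def run_sql(query, url=DRUID_SQL_URL):
    """POST a SQL query to Druid and return the result rows as a list of dicts."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows_that_should_be_filtered(rows, column):
    """Return rows whose column is null or empty; the WHERE clause should exclude them."""
    return [row for row in rows if row.get(column) in (None, "")]


# Offline example with made-up rows, so the helper can be tried without a cluster.
sample = [{"cityName": "Paris"}, {"cityName": None}, {"cityName": ""}]
print(rows_that_should_be_filtered(sample, "cityName"))
```

Against a running cluster, calling rows_that_should_be_filtered(run_sql(QUERY), "cityName") should return an empty list; any rows it returns are the ones that slipped past the filter.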
@Gahen

Gahen commented Oct 29, 2020

We also noticed this issue when filtering with a SQL query like SELECT * FROM some_table WHERE some_string_field IS NOT NULL AND (NOT some_string_field = 'some value') GROUP BY some_string_field.

The EXPLAIN PLAN FOR output suggests that Druid discards the "not null" clause when planning the SQL, since the plan is exactly the same after removing the some_string_field IS NOT NULL AND part.
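
A small sketch of how that plan comparison could be scripted. The run_sql argument is a hypothetical callable that sends a SQL string to Druid and returns the result rows; it is passed in so the comparison logic can be tried without a cluster. Whether the two plans match on a given Druid version is something to verify, not something this sketch asserts.

```python
WITH_NOT_NULL = (
    "SELECT * FROM some_table "
    "WHERE some_string_field IS NOT NULL AND (NOT some_string_field = 'some value') "
    "GROUP BY some_string_field"
)

WITHOUT_NOT_NULL = (
    "SELECT * FROM some_table "
    "WHERE (NOT some_string_field = 'some value') "
    "GROUP BY some_string_field"
)


def explain(query):
    """Wrap a query so that Druid returns its plan instead of executing it."""
    return "EXPLAIN PLAN FOR " + query


def plans_identical(run_sql, query_a, query_b):
    """Return True if both queries produce the same EXPLAIN output.

    run_sql is a hypothetical callable: it takes a SQL string and returns
    the rows Druid sends back.
    """
    return run_sql(explain(query_a)) == run_sql(explain(query_b))
```

If plans_identical(run_sql, WITH_NOT_NULL, WITHOUT_NOT_NULL) returns True, the IS NOT NULL clause is not making it into the plan.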

One workaround for us was to use LIKE instead of IS

@shashanksinghal
Author

Thanks @Gahen. Another possible workaround I found is to send, instead of
select * from receiver where some_string_field is not null and some_string_field != ''
the following:
select * from receiver where NVL(some_string_field, 'nullVal') != 'nullVal' and some_string_field != ''

Note that NVL works around the issue, but COALESCE does not.

@chenyuzhi459
Contributor

chenyuzhi459 commented Feb 24, 2021

Hey, I ran into the same problem. In my case, is not null had no effect in my SQL query on a string column (strictly speaking, a string-array type), and I fixed the problem in PR #10921. Hope it helps.
