Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query planner is failing to optimize intervals in some cases #16527

Open
sergei3000 opened this issue May 31, 2024 · 0 comments
Open

Query planner is failing to optimize intervals in some cases #16527

sergei3000 opened this issue May 31, 2024 · 0 comments

Comments

@sergei3000
Copy link

Query plan in web ui shows the intervals are fully scanned when there's

  1. a BETWEEN clause on DATE_TRUNC('DAY, __time)`
  2. a NOT IN clause containing a list of dates with some adjacent days that are inside the between interval
  3. some other filter on some other column
    in the query.

Affected Versi

29.0.1

Description

I believe there's some problem/bug with the query planner.

For example, here's a query that works in my setup (it's a small cluster of several servers):

SELECT *
FROM mytable
WHERE __time BETWEEN '2024-03-01' and '2024-03-10' and mycol = 'ABC'
and DATE_TRUNC('day', __time) not in (TIMESTAMP '2024-03-05',  '2024-03-07')

(I changed the names of columns/tables except for __time)
By "works" I mean it works fast and the intervals are optimal in the plan:

  "intervals": {
    "type": "intervals",
    "intervals": [
      "2024-03-01T00:00:00.000Z/2024-03-05T00:00:00.000Z",
      "2024-03-06T00:00:00.000Z/2024-03-07T00:00:00.000Z",
      "2024-03-08T00:00:00.000Z/2024-03-10T00:00:00.001Z"
    ]
  },

But once I change the dates list in the not in part to have two adjacent dates:

SELECT *
FROM mytable
WHERE __time BETWEEN '2024-03-01' and '2024-03-10' and mycol = 'ABC'
and DATE_TRUNC('day', __time) not in (TIMESTAMP '2024-03-05', '2024-03-06')

intervals get a full scan (and the query hangs forever):

  "intervals": {
    "type": "intervals",
    "intervals": [
      "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
    ]
  },

Another interesting case is when I take the problematic query and delete the additional filter by another column (note the commented out and mycol = 'ABC' in this one):

SELECT *
FROM mytable
WHERE __time BETWEEN '2024-03-01' and '2024-03-10' --and symbol = 'SPY'
and DATE_TRUNC('day', __time) not in (TIMESTAMP '2024-03-05', '2024-03-06')

This time the intervals are just as expected and the query is fast again:

  "intervals": {
    "type": "intervals",
    "intervals": [
      "2024-03-01T00:00:00.000Z/2024-03-05T00:00:00.000Z",
      "2024-03-07T00:00:00.000Z/2024-03-10T00:00:00.001Z"
    ]
  },

Anyways, I expected the query planner/optimizer should always understand the constraints set by the dates in my between clause in all cases.

Let me know if you want any other info, I'll try to provide whatever I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant