Unsatisfiable WHERE clause leads to exception instead of empty result #13152

hosswald · 2022-09-28T14:10:33Z

Affected Version

0.23.0

Description

The following query

SELECT *
FROM "druid"."mydata"
WHERE false
ORDER BY "__time" DESC

leads to the following error:
druid error: Unsupported operation (org.apache.druid.java.util.common.UOE): Time-ordering on scan queries is only supported for queries with segment specs of type MultipleSpecificSegmentSpec or SpecificSegmentSpec...a [MultipleIntervalSegmentSpec] was received instead.

false can be an arbitrary unsatisfiable boolean expression. I came across it when trying to filter for a specific exact timestamp in Superset, which is translated into the unsatisfiable expression

WHERE MILLIS_TO_TIMESTAMP(recordTimeParsed) >= TIME_PARSE('2022-09-25T00:00:00')
  AND MILLIS_TO_TIMESTAMP(recordTimeParsed) <  TIME_PARSE('2022-09-25T00:00:00')

Since it seems to be valid vor any unsatisfiable WHERE clause, I'm omitting details about my specific dataset.

The text was updated successfully, but these errors were encountered:

gianm · 2022-09-28T18:07:47Z

I tried this locally and what happens here is:

The unsatisfiable filter leads to the table being swapped out with an inline empty recordset. This is good, it means the planner realizes no data can possibly match the filter.
The query runs as normal on the inline empty recordset. Here it fails because scan query does not support ordering on inline data.

We can fix it by allowing the scan query to order inline data.

abhishekagarwal87 · 2022-09-29T06:49:01Z

Yeah. I was thinking if we can just make a special case of empty inline data. Since we would be returning an empty sequence anyway. @599166320 was also making some changes in this area recently. that could be relevant.

599166320 · 2022-09-30T07:42:17Z

@abhishekagarwal87 I recently improved the sorting function of common fields, which should solve the sorting problem of inline data together. However, I have recently closed this PR for some reasons. If you are interested, I will reopen this pr and let you review it.

abhishekagarwal87 · 2022-09-30T09:52:52Z

@paul-rogers - since you were reviewing that PR, can you take that PR to completion?

paul-rogers · 2022-10-03T23:04:55Z

In the DB community, there is often a need to get the schema of a query without running the query. Typically a UI wants to work out the table or chart structure it will use to process data. Or, some bit of code generation wants to know how to create code to handle this particular query structure.

The fancy DBs provides a PREPARE statement to get the schema. The crude-but-effective approach is to use WHERE 0=1 or WHERE FALSE as in this example.

Each of these is a message to the planner that the client only needs the result set schema (signature), not the data. Given this, the proper approach is to plan the query, gather the signature, then go down a "no data" path. In Druid, that just means returning an empty Sequence with no execution behind it.

Given this, the need to handle sorting is moot on this code path. Because the "no data" path is a well-known idiom, I'd guess that Calcite has a way to tell us that the result set will always be empty. We can use that to trigger our no-data path.

599166320 · 2022-10-04T01:32:21Z

Here, I want to talk about an issue related to inline data sources.

At present, Druid can use the orderBy query of scanquery for inlineDataSources with data, but the query result set is not sorted, which I think is a problem.

For example, the following query results are not sorted:

{
  "queryType": "scan",
  "dataSource": {
    "type": "inline",
    "columnNames": ["__time","country", "city"],
    "rows": [
      [1664555457209,"United States", "San Francisco"],
      [1664555457219,"Canada", "Calgary"],
      [1664555457229,"Canada", "Calgary"],
      [1664555457239,"Canada", "Calgary"],
      [1664555457249,"Canada", "Calgary"]
    ]
  },
  "columns": ["__time","country", "city"],
  "intervals": ["0000/3000"],
    "orderBy": [
    {
      "columnName": "__time",
      "order": "descending"
    },
    {
      "columnName": "city",
      "order": "descending"
    }
  ],
    "limit": 1001
}

I think that for the result set of an inlineDataSource, whether there is data or no data, as long as it is sorted, it can run the same path to simplify processing.

gianm · 2022-10-04T04:46:17Z

I sketched out an idea for this issue. Here it is: master...gianm:druid:scan-inline-sort. I haven't made a PR yet, since I'd need to add some tests.

The idea in the patch goes beyond the empty-data case. It also handles the case where we really do have inline data, and want to actually sort it. By skipping getIntervalsFromSpecificQuerySpec, and using Intervals.ONLY_ETERNITY instead, we are basically saying we're going to sort all data instead of doing it interval-by-interval. That's fine when we have only one query runner. (Which is the case for inline data.)

abhishekagarwal87 · 2022-10-04T05:49:27Z

@gianm - but what piece of code does the sorting of inline data itself by the time column? I was thinking of limiting the size of inline data and then feeding it to stableLimitingSort (that uses a priority queue for ordering).

gianm · 2022-10-06T19:18:11Z

Good question. That's another reason that little sketch isn't ready to be a PR 🙂

I think the answer is, if there's a limit then it's done by stableLimitingSort. If there's no limit then I don't see that it actually does get sorted. That'd need to change.

Your idea sounds good. What would we do if there's no limit? Ideally, since it's inline data and it's not likely to be very big, we sort it anyway. It could be done either in the mergeRunners call, or in the runner created by createRunner (since there's only one runner for inline data). Wondering what you think.

gianm · 2022-10-06T19:23:16Z

Thinking about it a bit more. I think it makes the most sense to do the sorting in the runner from createRunner. I actually think that is already happening: check out the makeCursors call in ScanQueryEngine. It is either ascending or descending based on the time order of the query.

So mergeRunners just needs to merge runners that are already sorted. In the inline case, there is only one runner, so there's nothing to do. We can return it as-is.

What do you think?

abhishekagarwal87 · 2022-10-07T11:18:28Z

I can't spot the sorting operation in the runner. The data from physical segments are already sorted by time. That ordering is used to decide if the data needs to be reversed. That's why we are only passing the type of ordering (Asc/desc) down the makeCursors call and not the columns on which to order by. Are you suggesting passing the columns to makeCursors and then doing the sorting in RowBasedStorageAdapter?
Or you are suggesting doing the sorting in the createRunner method itself? The former seems a simpler approach to implement. We will just materialize the rows into a list and sort the list.

hosswald added the Uncategorized problem report label Sep 28, 2022

abhishekagarwal87 added Bug Area - SQL and removed Uncategorized problem report labels Sep 28, 2022

hosswald changed the title ~~Unsatisfyable WHERE clause leads to exception instead of empty result~~ Unsatisfiable WHERE clause leads to exception instead of empty result Sep 29, 2022

599166320 mentioned this issue Oct 2, 2022

ScanQuery supports multi column orderBy queries #13168

Closed

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unsatisfiable WHERE clause leads to exception instead of empty result #13152

Unsatisfiable WHERE clause leads to exception instead of empty result #13152

hosswald commented Sep 28, 2022

gianm commented Sep 28, 2022

abhishekagarwal87 commented Sep 29, 2022

599166320 commented Sep 30, 2022

abhishekagarwal87 commented Sep 30, 2022

paul-rogers commented Oct 3, 2022 •

edited

Loading

599166320 commented Oct 4, 2022

gianm commented Oct 4, 2022

abhishekagarwal87 commented Oct 4, 2022

gianm commented Oct 6, 2022

gianm commented Oct 6, 2022

abhishekagarwal87 commented Oct 7, 2022

Unsatisfiable WHERE clause leads to exception instead of empty result #13152

Unsatisfiable WHERE clause leads to exception instead of empty result #13152

Comments

hosswald commented Sep 28, 2022

Affected Version

Description

gianm commented Sep 28, 2022

abhishekagarwal87 commented Sep 29, 2022

599166320 commented Sep 30, 2022

abhishekagarwal87 commented Sep 30, 2022

paul-rogers commented Oct 3, 2022 • edited Loading

599166320 commented Oct 4, 2022

gianm commented Oct 4, 2022

abhishekagarwal87 commented Oct 4, 2022

gianm commented Oct 6, 2022

gianm commented Oct 6, 2022

abhishekagarwal87 commented Oct 7, 2022

paul-rogers commented Oct 3, 2022 •

edited

Loading