query slow about near real time data #9918

Open
cxlRay opened this issue May 25, 2020 · 15 comments

@cxlRay

cxlRay commented May 25, 2020

When sending batch queries to realtime tasks, the performance is very poor: TPS is only about 80, the average response time is more than one second, the max response time can reach 30 seconds, and the 99th-percentile response time can reach 15 seconds.
What confuses me is that the data on the realtime tasks is in memory, so why is the query response time so long?

Affected Version

druid-0.16.1-incubating

Description

  • Cluster size
    coordinator and overlord: 2
    historical: 7
    middleManager: 7
    broker: 5

  • The testing tool is JMeter

  • testing result
    thread num: 500
    TPS: 107.80
    average response time: 4372ms
    99% response time: 15559ms
    max response time: 30150ms
    min response time: 373ms

  • the flame chart of the realtime task code during the test
    [image: apm-flame]

  • the configuration of middleManager
    [image: middleManager configuration]

  • the configuration of Realtime tasks

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "xxxx",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "posix"
        },
        "dimensionsSpec": {
          "dimensions": ["tag1","tag2","tag3","tag4","tag5","tag6","tag7"],
          "dimensionExclusions": [
            "timestamp",
            "value"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "value",
        "fieldName": "value",
        "type": "doubleSum"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE",
      "rollup" : false
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "intermediatePersistPeriod": "PT1H",
    "maxTotalRows": "245000000",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "xxxxx",
    "consumerProperties": {
      "bootstrap.servers": "xxxx:9092"
    },
    "taskCount": 16,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
  • about segments
    [images: segment-1, segments-2]

  • my query

{
  "queryType": "timeseries",
  "dataSource": "xxxx",
  "granularity": "second",
  "context": {
    "skipEmptyBuckets": true,
    "vectorize": "true"
  },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "endpoint", "value": "host" },
      { "type": "selector", "dimension": "metric", "value": "cpu.busy" }
    ]
  },
  "aggregations": [
    { "type": "count", "name": "count" },
    { "type": "stringLast", "name": "dsType", "fieldName": "counterType" },
    { "type": "doubleMax", "name": "max_value", "fieldName": "value" },
    { "type": "doubleMin", "name": "min_value", "fieldName": "value" },
    { "type": "doubleSum", "name": "sum_value", "fieldName": "value" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avg",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "sum_value", "fieldName": "sum_value" },
        { "type": "fieldAccess", "name": "count", "fieldName": "count" }
      ]
    }
  ],
  "intervals": [ "2020-05-25T15:35:00+08:00/2020-05-25T15:52:00+08:00" ]
}
@yuanlihan
Contributor

Hi @cxlRay,
You can try reducing intermediatePersistPeriod, which is PT10M by default; the persisted intermediate data can then take advantage of the "vectorize": "true" option. Hope this helps to some extent.
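
For reference, a minimal sketch of this change against the tuningConfig posted above (PT5M is just an illustrative value, not a recommendation; pick a period that balances query speed against the number of intermediate persist files):

{
  "tuningConfig": {
    "type": "kafka",
    "intermediatePersistPeriod": "PT5M",
    "maxTotalRows": 245000000,
    "maxRowsPerSegment": 5000000
  }
}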

@exherb
Contributor

exherb commented Jun 17, 2020

Same issue here.

@cxlRay
Author

cxlRay commented Jun 18, 2020

@yuanlihan thanks for your answer, but minimising intermediatePersistPeriod will create many more small files

@yuanlihan
Contributor

@yuanlihan thanks for your answer, but minimising intermediatePersistPeriod will create many more small files

@cxlRay that's true. But the temporary persist files are cleaned up when the hourly tasks finish. And the persisted incremental files/indexes (with extra indexes to speed up query processing) are more efficient to query than the in-memory incrementalIndex. As far as I know, when a query scans the latest in-memory incrementalIndex (e.g. the latest 10 minutes of data), Druid processes the in-memory fact table held by the incrementalIndex row by row.
Actually, I have tried minimising the value of maxRowsInMemory to reduce the number of in-memory rows, but this also introduces some overhead from frequent persisting.
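
To illustrate the trade-off, a sketch of a tuningConfig with a lower maxRowsInMemory (150000 is purely an illustrative value; fewer in-memory rows mean less row-by-row scanning at query time, at the cost of more frequent persists):

{
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 150000,
    "intermediatePersistPeriod": "PT10M"
  }
}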

@exherb
Contributor

exherb commented Jun 22, 2020

timeseries queries are very slow with a filter (10-20s).

@cxlRay
Author

cxlRay commented Jun 22, 2020

@exherb can you say more about your problem? If your cluster runs on SSDs, it will be better.

@exherb
Contributor

exherb commented Jun 22, 2020

@exherb can you say more about your problem? If your cluster runs on SSDs, it will be better.

  • historical & middleManager run on SSDs
  • 5 million rows per segment / Kafka ingestion task
  • we use Turnilo to query
  • historical query/time is acceptable (around 100-500ms)
  • peon query: timeseries(lookup) / topN query/time is acceptable
  • peon query: topN(lookup) / timeseries is very slow (timeseries query/time: 16s-40s)

@cxlRay
Author

cxlRay commented Jun 23, 2020

Just like yuanlihan says: minimise intermediatePersistPeriod or maxRowsInMemory, and add the query parameter vectorize.

@exherb
Contributor

exherb commented Jun 23, 2020

Just like yuanlihan says: minimise intermediatePersistPeriod or maxRowsInMemory, and add the query parameter vectorize.

intermediatePersistPeriod: P10M
maxRowsInMemory: 1000000

context: { vectorize: true }

still slow.

@navis
Contributor

navis commented Jun 23, 2020

Rows in a no-rollup incremental index are stored as

ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<IncrementalIndexRow>>

which doesn't seem easy to make fast, imho.

Anyway, can you try a coarser granularity, like 15 minutes or something?
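
For example, the timeseries query from the issue description with a coarser query granularity might look like the sketch below, using Druid's built-in fifteen_minute simple granularity (aggregations trimmed for brevity; everything else as in the original query):

{
  "queryType": "timeseries",
  "dataSource": "xxxx",
  "granularity": "fifteen_minute",
  "context": {
    "skipEmptyBuckets": true,
    "vectorize": "true"
  },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "endpoint", "value": "host" },
      { "type": "selector", "dimension": "metric", "value": "cpu.busy" }
    ]
  },
  "aggregations": [
    { "type": "count", "name": "count" },
    { "type": "doubleSum", "name": "sum_value", "fieldName": "value" }
  ],
  "intervals": [ "2020-05-25T15:35:00+08:00/2020-05-25T15:52:00+08:00" ]
}

This cuts the result from one bucket per second to one per 15 minutes, which should reduce per-bucket aggregation and merge overhead.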

@exherb
Contributor

exherb commented Jun 23, 2020

Rows in a no-rollup incremental index are stored as

ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<IncrementalIndexRow>>

which doesn't seem easy to make fast, imho.

Anyway, can you try a coarser granularity, like 15 minutes or something?

We enabled rollup with 1-minute query granularity and hourly segment granularity. Are you suggesting we change the segment granularity to 15 minutes?

@cxlRay
Author

cxlRay commented Jun 23, 2020

@exherb how do you know the slow part is the peon query? The query path is client -> broker -> indexing service (peon). Did you use monitoring metrics or analyse the code?

@exherb
Contributor

exherb commented Jun 23, 2020

@exherb how do you know the slow part is the peon query? The query path is client -> broker -> indexing service (peon). Did you use monitoring metrics or analyse the code?

By Druid metrics: query/segment/time and query/time (datasource~=.+middlemanager.+).

@navis
Contributor

navis commented Jun 23, 2020

@exherb I mean the granularity of the timeseries query.

@louisliu318

Same issue.
