query slow about near real time data #9918

Open
cxlRay opened this issue May 25, 2020 · 15 comments

@cxlRay

cxlRay commented May 25, 2020

When sending batch queries to realtime tasks, the performance is very poor: TPS is only about 80, the average response time is more than one second, the max response time can reach 30 seconds, and the 99th-percentile response time can reach 15 seconds.
What confuses me is that the data on the realtime tasks is in memory, so why is the query response time so long?

Affected Version

druid-0.16.1-incubating

Description

  • Cluster size
    coordinator and overlord: 2
    historical: 7
    middleManager: 7
    broker: 5

  • The testing tool is JMeter

  • testing result
    thread num: 500
    TPS: 107.80
    average response time: 4372ms
    99% response time: 15559ms
    max response time: 30150ms
    min response time: 373ms

  • the flame chart of the realtime task code during the test
    [image: apm-flame]

  • the configuration of middleManager
    [image: middleManager configuration]

  • the configuration of Realtime tasks

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "xxxx",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "posix"
        },
        "dimensionsSpec": {
          "dimensions": ["tag1","tag2","tag3","tag4","tag5","tag6","tag7"],
          "dimensionExclusions": [
            "timestamp",
            "value"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "value",
        "fieldName": "value",
        "type": "doubleSum"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE",
      "rollup" : false
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "intermediatePersistPeriod": "PT1H",
    "maxTotalRows": "245000000",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "xxxxx",
    "consumerProperties": {
      "bootstrap.servers": "xxxx:9092"
    },
    "taskCount": 16,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
  • about segments
    [images: segment-1, segments-2]

  • my query

{
  "queryType": "timeseries",
  "dataSource": "xxxx",
  "granularity": "second",
  "context": {
    "skipEmptyBuckets": true,
    "vectorize": "true"
  },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "endpoint", "value": "host" },
      { "type": "selector", "dimension": "metric", "value": "cpu.busy" }
    ]
  },
  "aggregations": [
    { "type": "count", "name": "count" },
    { "type": "stringLast", "name": "dsType", "fieldName": "counterType" },
    { "type": "doubleMax", "name": "max_value", "fieldName": "value" },
    { "type": "doubleMin", "name": "min_value", "fieldName": "value" },
    { "type": "doubleSum", "name": "sum_value", "fieldName": "value" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avg",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "sum_value", "fieldName": "sum_value" },
        { "type": "fieldAccess", "name": "count", "fieldName": "count" }
      ]
    }
  ],
  "intervals": [ "2020-05-25T15:35:00+08:00/2020-05-25T15:52:00+08:00" ]
}
@yuanlihan
Contributor

Hi @cxlRay,
You can try reducing intermediatePersistPeriod, which is PT10M by default; the persisted intermediate data can then take advantage of the "vectorize": "true" option. Hope this helps to some extent.
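
For reference, a minimal sketch of this change against the tuningConfig posted above (PT5M is just an illustrative value, not a recommendation; pick a period that balances query speed against the number of intermediate persist files):

{
  "tuningConfig": {
    "type": "kafka",
    "intermediatePersistPeriod": "PT5M",
    "maxTotalRows": 245000000,
    "maxRowsPerSegment": 5000000
  }
}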

@exherb
Contributor

exherb commented Jun 17, 2020

Same issue here.

@cxlRay
Author

cxlRay commented Jun 18, 2020

@yuanlihan thanks for your answer, but minimising intermediatePersistPeriod will create many more small files

@yuanlihan
Contributor

@yuanlihan thanks for your answer, but minimising intermediatePersistPeriod will create many more small files

@cxlRay that's true. But the temporary persist files are cleaned up when the hourly tasks finish. And the persisted incremental files/indexes (with extra indexes to speed up query processing) are more efficient to query than the in-memory incrementalIndex. As far as I know, when a query scans the latest in-memory incrementalIndex (e.g. the latest 10 minutes of data), Druid processes the in-memory fact table held by the incrementalIndex row by row.
Actually, I have tried minimising the value of maxRowsInMemory to reduce the number of in-memory rows, but this also introduces some overhead from frequent persisting.
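
To illustrate the trade-off, a sketch of a tuningConfig with a lower maxRowsInMemory (150000 is purely an illustrative value; fewer in-memory rows mean less row-by-row scanning at query time, at the cost of more frequent persists):

{
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 150000,
    "intermediatePersistPeriod": "PT10M"
  }
}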

@exherb
Contributor

exherb commented Jun 22, 2020

timeseries queries are very slow with a filter (10-20s).

@cxlRay
Author

cxlRay commented Jun 22, 2020

@exherb can you say more about your problem? If your cluster runs on SSDs, it will be better.

@exherb
Contributor

exherb commented Jun 22, 2020

@exherb can you say more about your problem? If your cluster runs on SSDs, it will be better.

  • historical & middleManager run on SSDs
  • 5 million rows per segment / Kafka ingestion task
  • we use Turnilo to query
  • historical query/time is acceptable (around 100-500ms)
  • peon query: timeseries(lookup) / topN query/time is acceptable
  • peon query: topN(lookup) / timeseries is very slow (timeseries query/time: 16s-40s)

@cxlRay
Author

cxlRay commented Jun 23, 2020

Just like yuanlihan says: minimise intermediatePersistPeriod or maxRowsInMemory, and add the query parameter vectorize.

@exherb
Contributor

exherb commented Jun 23, 2020

Just like yuanlihan says: minimise intermediatePersistPeriod or maxRowsInMemory, and add the query parameter vectorize.

intermediatePersistPeriod: P10M
maxRowsInMemory: 1000000

context: { vectorize: true }

still slow.

@navis
Contributor

navis commented Jun 23, 2020

Rows in a no-rollup incremental index are stored as

ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<IncrementalIndexRow>>

which doesn't seem easy to make fast, imho.

Anyway, can you try a coarser granularity, like 15 minutes or something?
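
For example, the timeseries query from the issue description with a coarser query granularity might look like the sketch below, using Druid's built-in fifteen_minute simple granularity (aggregations trimmed for brevity; everything else as in the original query):

{
  "queryType": "timeseries",
  "dataSource": "xxxx",
  "granularity": "fifteen_minute",
  "context": {
    "skipEmptyBuckets": true,
    "vectorize": "true"
  },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "endpoint", "value": "host" },
      { "type": "selector", "dimension": "metric", "value": "cpu.busy" }
    ]
  },
  "aggregations": [
    { "type": "count", "name": "count" },
    { "type": "doubleSum", "name": "sum_value", "fieldName": "value" }
  ],
  "intervals": [ "2020-05-25T15:35:00+08:00/2020-05-25T15:52:00+08:00" ]
}

This cuts the result from one bucket per second to one per 15 minutes, which should reduce per-bucket aggregation and merge overhead.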

@exherb
Contributor

exherb commented Jun 23, 2020

Rows in a no-rollup incremental index are stored as

ConcurrentSkipListMap<Long, ConcurrentLinkedDeque<IncrementalIndexRow>>

which doesn't seem easy to make fast, imho.

Anyway, can you try a coarser granularity, like 15 minutes or something?

We enabled rollup with 1-minute query granularity and hourly segment granularity. Are you suggesting we change the segment granularity to 15 minutes?

@cxlRay
Author

cxlRay commented Jun 23, 2020

@exherb how do you know the slow part is the peon query? The query path is client -> broker -> indexing service (peon). Did you use monitoring metrics or analyse the code?

@exherb
Contributor

exherb commented Jun 23, 2020

@exherb how do you know the slow part is the peon query? The query path is client -> broker -> indexing service (peon). Did you use monitoring metrics or analyse the code?

By Druid metrics: query/segment/time and query/time (datasource~=.+middlemanager.+).

@navis
Contributor

navis commented Jun 23, 2020

@exherb I mean the granularity of the timeseries query.

@louisliu318

Same issue.
