This repository has been archived by the owner on Jun 24, 2021. It is now read-only.
I've recently been hitting sporadic disk latency jitter, which triggers the TiKV_async_request_write_duration_seconds alert. The alert itself is a great idea, but debugging and analyzing what happened isn't straightforward with the out-of-the-box dashboards. I have a couple of proposals:
Add a panel showing sum(rate(tikv_storage_engine_async_request_duration_seconds_bucket{type="write"}[1m])) by (le, instance, type) under the Storage row
Improvements in the RaftIO row
Add P99.9 and P99.99 to Apply log duration and Append log duration panels
Add Apply log duration per server (P99.99) and Append log duration per server (P99.99) to show P99.99 durations per server in addition to P99 durations per server that we currently have
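For illustration, the proposed panels could be backed by queries along these lines. This is only a sketch: wrapping the bucket rate in histogram_quantile is my assumption about how the new Storage panel would be rendered (the raw bucket rate could equally feed a heatmap), and the two raftstore metric names are assumed rather than copied from tikv-pull.json, so they should be checked against the existing Apply log duration and Append log duration panels. Each expression is a separate panel query.

```promql
# Storage row: per-instance P99.99 of async write request duration
histogram_quantile(0.9999, sum(rate(tikv_storage_engine_async_request_duration_seconds_bucket{type="write"}[1m])) by (le, instance))

# RaftIO row: Apply log duration per server (P99.99)
# (metric name assumed; verify against the existing P99 panel)
histogram_quantile(0.9999, sum(rate(tikv_raftstore_apply_log_duration_seconds_bucket[1m])) by (le, instance))

# RaftIO row: Append log duration per server (P99.99)
# (metric name assumed; verify against the existing P99 panel)
histogram_quantile(0.9999, sum(rate(tikv_raftstore_append_log_duration_seconds_bucket[1m])) by (le, instance))
```

For the cluster-wide P99.9 and P99.99 series, the same expressions would drop instance from the by clause and use 0.999 or 0.9999 as the quantile.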
I think this aligns well with the official recommendations, which are based on P99.99 metrics (e.g. https://pingcap.com/docs/v3.0/reference/key-monitoring-metrics/tikv-dashboard/#key-metrics-description).
FYI: these proposals are based on tikv-pull.json from v2.1.16.