Showing 2 changed files with 70 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
title: TiFlash Alert Rules | ||
summary: Learn the alert rules of the TiFlash cluster. | ||
category: reference | ||
--- | ||
|
||
# TiFlash Alert Rules | ||
|
||
This document introduces the alert rules of the TiFlash cluster. | ||
|
||
## `TiFlash_schema_error` | ||
|
||
- Alert rule: | ||
|
||
`increase(tiflash_schema_apply_count{type="failed"}[15m]) > 0` | ||
|
||
- Description: | ||
|
||
This alert is triggered when at least one schema apply error occurs within the last 15 minutes. | ||
|
||
- Solution: | ||
|
||
The error might be caused by a logic error in TiFlash. Contact [TiFlash R&D](mailto:support@pingcap.com) for support. | ||
|
||
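The rule above fires as soon as the failure counter grows inside a 15-minute window. The following Python sketch illustrates that idea on raw counter samples. It is a simplified approximation, not how Prometheus evaluates `increase()`: Prometheus also extrapolates to the window boundaries, which is omitted here. The names `simple_increase` and `schema_error_fires` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value) -- hypothetical sample layout
Sample = Tuple[float, float]


def simple_increase(samples: List[Sample], now: float,
                    window_s: float = 15 * 60) -> float:
    """Approximate how much a counter grew inside [now - window_s, now].

    Simplification: sums the deltas between consecutive samples and treats
    a drop as a counter reset. Prometheus additionally extrapolates to the
    window boundaries; that step is left out.
    """
    in_window = [v for t, v in samples if now - window_s <= t <= now]
    total = 0.0
    for prev, cur in zip(in_window, in_window[1:]):
        # After a reset the counter restarts from 0, so the new value
        # itself is the growth since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def schema_error_fires(samples: List[Sample], now: float) -> bool:
    """Mirror of the `> 0` condition in the alert rule."""
    return simple_increase(samples, now) > 0
```

For example, a counter that stays at 5 for the whole window does not fire, while a counter that moves from 5 to 6 at any point inside the window does.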
## `TiFlash_schema_apply_duration` | ||
|
||
- Alert rule: | ||
|
||
`histogram_quantile(0.99, sum(rate(tiflash_schema_apply_duration_seconds_bucket[1m])) BY (le, instance)) > 20` | ||
|
||
- Description: | ||
|
||
This alert is triggered when the 99th percentile of the schema apply duration on an instance exceeds 20 seconds, that is, when more than 1% of schema applies take longer than 20 seconds. | ||
|
||
- Solution: | ||
|
||
This might be caused by internal problems in the TiFlash TMT engine. Contact [TiFlash R&D](mailto:support@pingcap.com) for support. | ||
|
||
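The duration rules in this document compare a 99th percentile, estimated from histogram buckets, against a threshold. The Python sketch below shows one way such an estimate can be computed from cumulative `le` buckets with linear interpolation, which is the approach documented for `histogram_quantile` on classic histograms. It is a minimal illustration that assumes `0 < q < 1`, at least two buckets, and a final `+Inf` bucket; the bucket boundaries and counts in the usage example are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (le, count) buckets.

    Assumptions for this sketch: 0 < q < 1, at least two buckets, and the
    largest bucket has le = +Inf and holds the total observation count.
    """
    if len(buckets) < 2:
        raise ValueError("need at least two buckets")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # The quantile falls in the +Inf bucket: report the
                # upper bound of the highest finite bucket instead.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            in_bucket = count - prev_count
            return prev_le + (le - prev_le) * (rank - prev_count) / in_bucket
        prev_le, prev_count = le, count
    return math.nan


# Hypothetical per-instance bucket rates: 2 of 100 applies took 20-40 s.
slow = [(1, 90), (5, 95), (10, 97), (20, 98), (40, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, slow)
alert_fires = p99 > 20
```

With these made-up buckets the estimate lands between 20 and 40 seconds, so the `> 20` condition holds. The same reading applies to the `read index` and `wait index` rules, with thresholds of 3 and 2 seconds.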
## `TiFlash_raft_read_index_duration` | ||
|
||
- Alert rule: | ||
|
||
`histogram_quantile(0.99, sum(rate(tiflash_raft_read_index_duration_seconds_bucket[1m])) BY (le, instance)) > 3` | ||
|
||
- Description: | ||
|
||
This alert is triggered when the 99th percentile of the read index duration on an instance exceeds 3 seconds, that is, when more than 1% of read index requests take longer than 3 seconds. | ||
|
||
> **Note:** | ||
> | ||
> `read index` is a kvproto request sent to the TiKV leader. Region retries in TiKV, a busy store, or network problems might make `read index` requests take a long time. | ||
|
||
- Solution: | ||
|
||
Frequent retries might be caused by frequent Region splitting or migration in the TiKV cluster. Check the TiKV cluster status to identify the cause of the retries. | ||
|
||
## `TiFlash_raft_wait_index_duration` | ||
|
||
- Alert rule: | ||
|
||
`histogram_quantile(0.99, sum(rate(tiflash_raft_wait_index_duration_seconds_bucket[1m])) BY (le, instance)) > 2` | ||
|
||
- Description: | ||
|
||
This alert is triggered when the 99th percentile of the time that TiFlash waits for the Region Raft index on an instance exceeds 2 seconds, that is, when more than 1% of waits take longer than 2 seconds. | ||
|
||
- Solution: | ||
|
||
It might be caused by a communication error between TiKV and the proxy. Contact [TiFlash R&D](mailto:support@pingcap.com) for support. |