tiflash: add the maintain.md doc #2189
Conversation
reference/tiflash/maintain.md
Outdated
|
||
## Logout a TiFlash node | ||
|
||
Logouting a TiFlash node differs from [Scaling in the TiFlash node](/reference/tiflash/scale.md#scale-in-tiflash-node) in that the logout doesn't remove the node from TiDB Ansible; instead, it just safely shutdown the process. |
The reference anchor is not finalized because `scale.md` is not merged.
Please take a look at L90. The anchor is decided: `#scale-in-a-tiflash-node`.
LGTM
reference/tiflash/maintain.md
Outdated
|
||
# Maintain a TiFlash Cluster | ||
|
||
This document describes common operations when you maintain a TiFlash cluster, including checking the version, node logout, troubleshooting, critical logs, and a system table. |
This document describes common operations when you maintain a TiFlash cluster, including checking the version, node logout, troubleshooting, critical logs, and a system table. | |
This document describes how to perform common operations when you maintain a TiFlash cluster, including checking the TiFlash version, taking TiFlash nodes down, and troubleshooting TiFlash. This document also introduces critical logs and system tables of TiFlash. |
reference/tiflash/maintain.md
Outdated
LD_LIBRARY_PATH=./ ./tiflash version | ||
``` | ||
|
||
- Check the TiFlash version by referring to the TiFlash log. For the log path, see the [logger] part in [Configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-`tiflash.toml`-file). For example: |
- Check the TiFlash version by referring to the TiFlash log. For the log path, see the [logger] part in [Configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-`tiflash.toml`-file). For example: | |
- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` part in [the `tiflash.toml` configuration file](/reference/tiflash/configuration.md#configure-the-`tiflash.toml`-file). For example: |
The anchor link is wrong. Please correct it by referring to the PingCAP documentation style guide. @toutdesuite
reference/tiflash/maintain.md
Outdated
<information>: TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2 | ||
``` | ||
|
||
## Logout a TiFlash node |
## Logout a TiFlash node | |
## Take a TiFlash node down |
`take nodes down` is an expression used more commonly in our technical documents and it has more search results in Google. Please update this expression in other places (if any). @toutdesuite
reference/tiflash/maintain.md
Outdated
|
||
## Take a TiFlash node down | ||
|
||
Taking a TiFlash node down differs from [Scaling in a TiFlash node](/reference/tiflash/scale.md#scale-in-a-tiflash-node) in that the former doesn't remove the node from TiDB Ansible; instead, it just safely shutdown the process. |
Taking a TiFlash node down differs from [Scaling in a TiFlash node](/reference/tiflash/scale.md#scale-in-a-tiflash-node) in that the former doesn't remove the node from TiDB Ansible; instead, it just safely shutdown the process. | |
Taking a TiFlash node down differs from [Scaling in a TiFlash node](/reference/tiflash/scale.md#scale-in-a-tiflash-node) in that the former doesn't remove the node in TiDB Ansible; instead, it just safely shuts down the TiFlash process. |
reference/tiflash/maintain.md
Outdated
> | ||
> After you take the TiFlash node down, if the number of the remaining nodes in the TiFlash cluster is greater than or equal to the maximum replicas of all data tables, you can go directly to step 3. | ||
|
||
1. For a TiDB server, if the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command: |
1. For a TiDB server, if the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command: | |
1. If the number of replicas of tables is greater than or equal to that of the remaining TiFlash nodes in the cluster, execute the following command on those tables in the TiDB client: |
reference/tiflash/maintain.md
Outdated
alter table <db-name>.<table-name> set tiflash replica 0; | ||
``` | ||
|
||
2. To ensure TiFlash replicas of related tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed. |
2. To ensure TiFlash replicas of related tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed. | |
2. To ensure that the TiFlash replicas of these tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the tables, it means that the replicas are removed. |
reference/tiflash/maintain.md
Outdated
|
||
2. To ensure TiFlash replicas of related tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed. | ||
|
||
3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file in `resources/bin` in the tidb-ansible directory) to view the `store id` of the TiFlash node. |
3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file in `resources/bin` in the tidb-ansible directory) to view the `store id` of the TiFlash node. | |
3. Input the `store` command into [pd-ctl](/reference/tools/pd-control.md) (the binary file is in `resources/bin` of the tidb-ansible directory) to view the `store id` of the TiFlash node. |
reference/tiflash/maintain.md
Outdated
|
||
4. Input `store delete <store_id>` into `pd-ctl`. Here `<store_id>` refers to the `store id` in step 3. | ||
|
||
5. When the corresponding `store` of the node disappeared, or when `state_name` is changed to `Tomestone`, shutdown the TiFlash process. |
5. When the corresponding `store` of the node disappeared, or when `state_name` is changed to `Tomestone`, shutdown the TiFlash process. | |
5. When the corresponding `store` of the node disappears, or when `state_name` is changed to `Tombstone`, stop the TiFlash process. |
We rarely use the past tense in our technical documents. Please read the Google Developer Style Guide for reference.
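The condition in step 5 can be sketched in Python. This is a minimal sketch, assuming the JSON printed by the pd-ctl `store` command has a top-level `stores` list whose entries carry `store.id` and `store.state_name`, and that PD reports the final state as `Tombstone`. The helper name `can_stop_tiflash` is hypothetical and is not part of this PR.

```python
import json


def can_stop_tiflash(store_output: str, store_id: int) -> bool:
    """Return True when it looks safe to stop the TiFlash process.

    store_output is assumed to be the JSON text printed by the pd-ctl
    `store` command (assumed layout: {"stores": [{"store": {...}}]}).
    """
    data = json.loads(store_output)
    for entry in data.get("stores", []):
        store = entry.get("store", {})
        if store.get("id") == store_id:
            # The store is still listed: only a Tombstone state qualifies.
            return store.get("state_name") == "Tombstone"
    # The store no longer appears in the output at all.
    return True
```

The function only inspects output that you have already collected; it does not call pd-ctl itself.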
reference/tiflash/maintain.md
Outdated
|
||
> **Note:** | ||
> | ||
> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes in a cluster stop running, you need to manually delete the replication rule in PD. Or you cannot successfully take the TiFlash node down. |
> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes in a cluster stop running, you need to manually delete the replication rule in PD. Or you cannot successfully take the TiFlash node down. | |
> If you don't cancel all tables replicated to TiFlash before all TiFlash nodes stop running, you need to manually delete the replication rule in PD. Otherwise, you cannot successfully take the TiFlash node down. |
reference/tiflash/maintain.md
Outdated
|
||
## TiFlash troubleshooting | ||
|
||
This section describes some common questions of TiFlash, the reasons, and the solutions. |
This section describes some common questions of TiFlash, the reasons, and the solutions. | |
This section describes some commonly encountered issues when using TiFlash, their causes, and the solutions. |
reference/tiflash/maintain.md
Outdated
|
||
This section describes some common questions of TiFlash, the reasons, and the solutions. | ||
|
||
### TiFlash replica is always in an unusable state |
### TiFlash replica is always in an unusable state | |
### TiFlash replica is always unavailable |
reference/tiflash/maintain.md
Outdated
|
||
### TiFlash replica is always in an unusable state | ||
|
||
This is because TiFlash is in the exception status caused by the configuration error or the environment problems. You can take the following steps to identify the problem component: |
This is because TiFlash is in the exception status caused by the configuration error or the environment problems. You can take the following steps to identify the problem component: | |
This is because TiFlash is in an abnormal state caused by configuration errors or environment issues. Take the following steps to identify the faulty component: |
reference/tiflash/maintain.md
Outdated
|
||
The expected result is `"enable-placement-rules": "true"`. | ||
|
||
2. Check whether the TiFlash process in the operation system is working correctly using `UpTime` of the TiFlash-Summary monitor panel. |
2. Check whether the TiFlash process in the operation system is working correctly using `UpTime` of the TiFlash-Summary monitor panel. | |
2. Check whether the TiFlash process is working correctly by viewing `UpTime` on the TiFlash-Summary monitoring panel. |
reference/tiflash/maintain.md
Outdated
echo "store" | /path/to/pd-ctl -u http://<pd-ip>:<pd-port> | ||
``` | ||
|
||
If `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`, it refers to the TiFlash proxy. |
If `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`, it refers to the TiFlash proxy. | |
The TiFlash proxy's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to identify the TiFlash proxy. |
Please review my suggestion. The Chinese source reads: "The store whose `store.labels` contains `{"key": "engine", "value": "tiflash"}` is the TiFlash proxy." @ilovesoup
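The label check discussed here can be sketched in Python. This is a minimal sketch, assuming the pd-ctl `store` output is JSON with a top-level `stores` list and that each entry keeps its labels under `store.labels`; the helper name `tiflash_store_ids` is hypothetical.

```python
import json
from typing import List


def tiflash_store_ids(store_output: str) -> List[int]:
    """Return the IDs of stores labeled {"key": "engine", "value": "tiflash"}.

    store_output is assumed to be the JSON text printed by the pd-ctl
    `store` command (assumed layout: {"stores": [{"store": {...}}]}).
    """
    data = json.loads(store_output)
    ids = []
    for entry in data.get("stores", []):
        store = entry.get("store", {})
        labels = store.get("labels") or []  # stores without labels are skipped
        if any(
            label.get("key") == "engine" and label.get("value") == "tiflash"
            for label in labels
        ):
            ids.append(store["id"])
    return ids
```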
reference/tiflash/maintain.md
Outdated
|
||
If `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`, it refers to the TiFlash proxy. | ||
|
||
4. Check whether `pd buddy` can print the logs correctly (the value of `log` in the [flash.flash_cluster] configuration item of the log path, is by default the `tmp` directory configured by the TiFlash configuration file). |
4. Check whether `pd buddy` can print the logs correctly (the value of `log` in the [flash.flash_cluster] configuration item of the log path, is by default the `tmp` directory configured by the TiFlash configuration file). | |
4. Check whether `pd buddy` can correctly print the logs (the log path is the value of `log` in the [flash.flash_cluster] configuration item; the default log path is under the `tmp` directory configured in the TiFlash configuration file). |
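The original Chinese reads: "Check whether pd buddy prints logs normally (the value set by the configuration item [flash.flash_cluster] log corresponding to the log path; by default, under the tmp directory configured in the TiFlash configuration file)". Should the complete statement be: "Check whether pd buddy prints logs normally (the log path is the value set for `log` in the corresponding configuration item [flash.flash_cluster]; by default, the log path is under the tmp directory configured in the TiFlash configuration file)"? @ilovesoup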
reference/tiflash/maintain.md
Outdated
|
||
Reconfirm the value of `max-replicas`. | ||
|
||
6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to TiFlash. |
6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to TiFlash. | |
6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node. |
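The 20% rule in step 6 comes down to simple arithmetic, sketched below in Python. This is a minimal sketch: the function name and the `min_free_ratio` parameter are hypothetical, and the default of 0.20 is taken from the 20% figure quoted in the step, not from PD's source code.

```python
def below_free_space_threshold(
    capacity_bytes: int,
    available_bytes: int,
    min_free_ratio: float = 0.20,  # assumed from the "20%" in step 6
) -> bool:
    """Return True when the free share of the store capacity is below the threshold."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return available_bytes / capacity_bytes < min_free_ratio
```

For example, a store with 1000 GiB of capacity and 150 GiB available has 15% free, which is below the threshold described in the step.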
reference/tiflash/maintain.md
Outdated
|
||
### TiFlash query time is unstable, and error log prints many `Lock Exception` messages | ||
|
||
This is because large amounts of data are written to the cluster, which leads to the situation that the TiFlash query encounters a lock and requires query retry. |
This is because large amounts of data are written to the cluster, which leads to the situation that the TiFlash query encounters a lock and requires query retry. | |
This is because large amounts of data are written to the cluster, which causes TiFlash queries to encounter locks and retry. |
reference/tiflash/maintain.md
Outdated
|
||
This is because large amounts of data are written to the cluster, which leads to the situation that the TiFlash query encounters a lock and requires query retry. | ||
|
||
You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`), to reduce the possibility that TiFlash query encounters a lock; thereby mitigating the risk of unstable query time. |
You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`), to reduce the possibility that TiFlash query encounters a lock; thereby mitigating the risk of unstable query time. | |
You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`). This makes fewer TiFlash queries encounter a lock and mitigates the risk of unstable query time. |
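One way to derive a timestamp that is one second earlier can be sketched in Python. This is a minimal sketch, assuming the timestamp is a TSO whose upper bits hold the physical time in milliseconds and whose lowest 18 bits hold a logical counter; the names `PHYSICAL_SHIFT`, `tso_physical_ms`, and `tso_shift_back` are hypothetical.

```python
PHYSICAL_SHIFT = 18  # assumption: the low 18 bits are the logical counter


def tso_physical_ms(tso: int) -> int:
    """Extract the physical part (milliseconds) of a TSO."""
    return tso >> PHYSICAL_SHIFT


def tso_shift_back(tso: int, milliseconds: int = 1000) -> int:
    """Return a TSO whose physical part is `milliseconds` earlier.

    The logical counter in the low bits is left unchanged.
    """
    return tso - (milliseconds << PHYSICAL_SHIFT)
```

The returned integer could then be used as the value in a `set @@tidb_snapshot=...;` statement like the one quoted above.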
reference/tiflash/maintain.md
Outdated
|
||
You can set the query timestamp to one second earlier in TiDB (for example, `set @@tidb_snapshot=412881237115666555;`), to reduce the possibility that TiFlash query encounters a lock; thereby mitigating the risk of unstable query time. | ||
|
||
### Partial queries return `Region Unavailable` |
### Partial queries return `Region Unavailable` | |
### Some queries return the `Region Unavailable` error |
reference/tiflash/maintain.md
Outdated
|
||
### Partial queries return `Region Unavailable` | ||
|
||
If the load pressure in TiFlash is so heavy that TiFlash data replication falls behind. Some queries might return error message `Region Unavailable`. |
If the load pressure in TiFlash is so heavy that TiFlash data replication falls behind. Some queries might return error message `Region Unavailable`. | |
If the load pressure on TiFlash is so heavy that TiFlash data replication falls behind, some queries might return the `Region Unavailable` error. |
reference/tiflash/maintain.md
Outdated
|
||
If the load pressure in TiFlash is so heavy that TiFlash data replication falls behind. Some queries might return error message `Region Unavailable`. | ||
|
||
In this case, you can share the pressure by adding TiFlash nodes. |
In this case, you can share the pressure by adding TiFlash nodes. | |
In this case, you can balance the load pressure by adding more TiFlash nodes. |
reference/tiflash/maintain.md
Outdated
| Log Information | Log Description | | ||
|---------------|-------------------| | ||
| [ 23 ] <Information> KVStore: Start to persist [region 47, applied: term 6 index 10] | Data starts to be replicated (the number in the square brackets at the start of the log refers to the thread ID | | ||
| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute() | `Handling DAG request` refers to that TiFlash starts to handle a Coprocessor request | |
| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute() | `Handling DAG request` refers to that TiFlash starts to handle a Coprocessor request | | |
| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute() | Handling DAG request, that is, TiFlash starts to handle a Coprocessor request | |
reference/tiflash/maintain.md
Outdated
|---------------|-------------------| | ||
| [ 23 ] <Information> KVStore: Start to persist [region 47, applied: term 6 index 10] | Data starts to be replicated (the number in the square brackets at the start of the log refers to the thread ID | | ||
| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute() | `Handling DAG request` refers to that TiFlash starts to handle a Coprocessor request | | ||
| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute() | `Handle DAG request done` refers to that TiFlash finishes a Coprocessor request | |
| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute() | `Handle DAG request done` refers to that TiFlash finishes a Coprocessor request | | |
| [ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute() | Handling DAG request done, that is, TiFlash finishes handling a Coprocessor request | |
reference/tiflash/maintain.md
Outdated
LD_LIBRARY_PATH=./ ./tiflash version | ||
``` | ||
|
||
- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` part in [configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-tiflashtoml-file). For example: |
- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` part in [configure the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-tiflashtoml-file). For example: | |
- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` part in [the `tiflash.toml` file](/reference/tiflash/configuration.md#configure-the-tiflashtoml-file). For example: |
Delete `configure` so the sentence is complete.
reference/tiflash/maintain.md
Outdated
alter table <db-name>.<table-name> set tiflash replica 0; | ||
``` | ||
|
||
2. To ensure that the TiFlash replicas of these tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed. |
2. To ensure that the TiFlash replicas of these tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed. | |
2. To ensure that the TiFlash replicas of these tables are removed, see [View the Table Replication Progress](/reference/tiflash/use-tiflash.md#view-the-table-replication-progress). If you cannot view the replication progress of the related tables, it means that the replicas are removed. |
The anchor is invalid. Please confirm again. (Be careful of the links!) @toutdesuite
reference/tiflash/maintain.md
Outdated
|
||
This is because TiFlash is in an abnormal state caused by configuration errors or environment issues. Take the following steps to identify the faulty component: | ||
|
||
1. Check whether PD enables the `Placement Rules` feature (to enable the feature, see the step 2 of [Add a TiFlash component in an existing TiDB Cluster](/reference/tiflash/deploy.md#add-a-TiFlash-component-in-an-existing-TiDB-cluster): |
1. Check whether PD enables the `Placement Rules` feature (to enable the feature, see the step 2 of [Add a TiFlash component in an existing TiDB Cluster](/reference/tiflash/deploy.md#add-a-TiFlash-component-in-an-existing-TiDB-cluster): | |
1. Check whether PD enables the `Placement Rules` feature (to enable the feature, see the step 2 of [Add a TiFlash component in an existing TiDB Cluster](/reference/tiflash/deploy.md#add-a-TiFlash-component-in-an-existing-TiDB-cluster): |
This anchor is also wrong. Please read our link style guide carefully! @toutdesuite
@yikeke PTAL again |
/merge |
/run-all-tests |
* tiflash: add the maintain.md doc * remove dead link * modify anchor * modify anchor * modify anchor * address comments * address comments * address comments, esp anchor links * Update reference/tiflash/maintain.md * minor edits Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com> Co-authored-by: yikeke <yikeke@pingcap.com>
cherry pick to release-3.1 in PR #2213 |
* tiflash: add the maintain.md doc * remove dead link * modify anchor * modify anchor * modify anchor * address comments * address comments * address comments, esp anchor links * Update reference/tiflash/maintain.md * minor edits Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com> Co-authored-by: yikeke <yikeke@pingcap.com>
cherry pick to release-4.0 in PR #2214 |
* tiflash: add the maintain.md doc * remove dead link * modify anchor * modify anchor * modify anchor * address comments * address comments * address comments, esp anchor links * Update reference/tiflash/maintain.md * minor edits Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com> Co-authored-by: yikeke <yikeke@pingcap.com> Co-authored-by: toutdesuite <guizhiluo2014@163.com> Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com> Co-authored-by: yikeke <yikeke@pingcap.com>
What is changed, added or deleted? (Required)
add the maintain.md doc
Which TiDB version(s) do your changes apply to? (Required)
If you select two or more versions from above, to trigger the bot to cherry-pick this PR to your desired release version branch(es), you must add corresponding labels such as needs-cherry-pick-4.0, needs-cherry-pick-3.1, needs-cherry-pick-3.0, and needs-cherry-pick-2.1.
What is the related PR or file link(s)?