title: Backup & Restore FAQ
summary: Learn about Frequently Asked Questions (FAQ) and the solutions of BR.
aliases: /docs/dev/br/backup-and-restore-faq/

Backup & Restore FAQ

This document lists frequently asked questions (FAQs) about Backup & Restore (BR) and their solutions.

What should I do if the error message could not read local://...:download sst failed is returned during data restoration?

When you restore data, each node must have access to all backup files (SST files). If local storage is used, the backup files are scattered among different nodes, so by default you cannot restore data directly. You have to copy the backup files of each TiKV node to all the other TiKV nodes first.

It is recommended to mount an NFS disk as a backup disk during backup. For details, see Back up a single table to a network disk.
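If NFS is not an option, the copy step described above can be planned before any file is moved. The following is a minimal sketch, not part of BR: the node names and file names are hypothetical, and the function only computes which copies are needed so that every node ends up with every SST file.

```python
from typing import Dict, List, Set, Tuple


def plan_copies(files_by_node: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (source_node, target_node, file_name) copies so that
    every node ends up holding every backup file."""
    # Remember one node that already holds each file.
    owner: Dict[str, str] = {}
    for node in sorted(files_by_node):
        for name in sorted(files_by_node[node]):
            owner.setdefault(name, node)

    plan: List[Tuple[str, str, str]] = []
    for target in sorted(files_by_node):
        # Files known to the cluster that this node does not have yet.
        missing = sorted(set(owner) - files_by_node[target])
        for name in missing:
            plan.append((owner[name], target, name))
    return plan


if __name__ == "__main__":
    # Hypothetical node names and SST file names, for illustration only.
    layout = {
        "tikv-1": {"a.sst"},
        "tikv-2": {"b.sst"},
        "tikv-3": {"c.sst"},
    }
    for source, target, name in plan_copies(layout):
        print(f"copy {name}: {source} -> {target}")
```

With N nodes that each hold distinct files, the plan contains N - 1 copies of every file, which is why shared storage is usually simpler.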

How much does it affect the cluster during backup using BR?

In a test that runs the oltp_read_only scenario of sysbench while BR backs up to a disk at full rate (with the backup disk separate from the service disk), the cluster QPS decreases by 15%-25%. The actual impact on the cluster depends on the table schema.

To reduce the impact on the cluster, you can use the --ratelimit parameter to limit the backup rate.
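As an illustration, the sketch below assembles a rate-limited full backup command as an argument list. The PD address and storage path are placeholders, and the unit of --ratelimit is assumed here to be MiB/s per TiKV node; check the BR command-line documentation for your version before relying on it.

```python
import shlex
from typing import List


def build_backup_command(pd_addr: str, storage: str, ratelimit_mib: int) -> List[str]:
    """Assemble a `br backup full` command with a rate limit.

    Assumption: --ratelimit is expressed in MiB/s per TiKV node.
    """
    if ratelimit_mib <= 0:
        raise ValueError("ratelimit_mib must be a positive integer")
    return [
        "br", "backup", "full",
        "--pd", pd_addr,
        "--storage", storage,
        "--ratelimit", str(ratelimit_mib),
    ]


if __name__ == "__main__":
    # Placeholder address and path, for illustration only.
    cmd = build_backup_command("127.0.0.1:2379", "local:///tmp/backup", 120)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if the list is later passed to subprocess.run.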

Does BR back up system tables? During data restoration, do they cause conflicts?

The system schemas (information_schema, performance_schema, mysql) are filtered out during full backup. For more details, refer to the Backup Principle.

Because these system schemas do not exist in the backup files, no conflict occurs among system tables during data restoration.

What should I do if the Permission denied error persists even when I run BR as root?

You need to confirm whether TiKV has access to the backup directory. To back up data, confirm whether TiKV has the write permission. To restore data, confirm whether it has the read permission.

During the backup operation, if the storage medium is the local disk or a network file system (NFS), make sure that the user that starts BR and the user that starts TiKV are the same (if BR and TiKV are on different machines, the users' UIDs must match). Otherwise, the Permission denied issue might occur.

Running BR as root might still fail because of disk permissions, because the backup files (SST files) are written by TiKV, not by BR.

Note:

You might encounter the same problem during data restoration. The read permission is verified only when the SST files are read for the first time. Because DDL execution takes time, there might be a long interval between starting BR and this permission check, so you might receive the Permission denied error message only after waiting for a long time.

Therefore, it is recommended to check the permission before data restoration.
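One way to check the permission in advance is a small script run as the same operating system user that starts TiKV. This is a minimal sketch and not a BR feature: the function name and the "backup"/"restore" mode labels are made up for illustration, and os.access only reports what the current user is allowed to do.

```python
import os
import tempfile


def check_backup_dir(path: str, mode: str) -> bool:
    """Return True if the current user can use `path` for the given mode.

    mode "backup" needs write access, mode "restore" needs read access.
    Directory traversal (execute) permission is needed in both cases.
    """
    if mode == "backup":
        needed = os.W_OK | os.X_OK
    elif mode == "restore":
        needed = os.R_OK | os.X_OK
    else:
        raise ValueError("mode must be 'backup' or 'restore'")
    return os.path.isdir(path) and os.access(path, needed)


if __name__ == "__main__":
    # A temporary directory stands in for the real backup directory.
    with tempfile.TemporaryDirectory() as backup_dir:
        if hasattr(os, "getuid"):
            # Compare this UID with the UID of the user that starts TiKV.
            print("current UID:", os.getuid())
        print("can back up:", check_backup_dir(backup_dir, "backup"))
        print("can restore:", check_backup_dir(backup_dir, "restore"))
```

Running the script as root would hide the problem, because root bypasses most permission checks; it has to run as the TiKV user to be meaningful.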

What should I do to handle the Io(Os...) error?

Almost all of these problems are system call errors that occur when TiKV writes data to the disk. Check the mounting method and the file system of the backup directory, and try to back up data to another folder or another hard disk.

For example, you might encounter the Code: 22(invalid argument) error when backing up data to a network disk shared through Samba.
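To narrow down whether the directory itself is the problem, a write probe can reproduce the system call failure outside of TiKV. The sketch below is an assumption-laden illustration, not something BR provides: it creates, writes, and syncs a small file in the target directory and returns the OSError it hits, if any. TiKV may use different system calls, so a successful probe does not guarantee a successful backup.

```python
import errno
import os
import tempfile
from typing import Optional


def probe_write(directory: str) -> Optional[OSError]:
    """Create, write, and fsync a small file in `directory`.

    Returns None on success, or the OSError raised by the failing call.
    The probe file is removed afterwards when possible.
    """
    path = None
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix="br-probe-")
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the data to the underlying file system
        finally:
            os.close(fd)
    except OSError as exc:
        return exc
    finally:
        if path is not None:
            try:
                os.unlink(path)
            except OSError:
                pass
    return None


if __name__ == "__main__":
    # A temporary directory stands in for the real backup directory.
    with tempfile.TemporaryDirectory() as backup_dir:
        error = probe_write(backup_dir)
        if error is None:
            print("write probe succeeded")
        else:
            name = errno.errorcode.get(error.errno, "unknown")
            print(f"write probe failed: errno {error.errno} ({name})")
```

Run the probe as the TiKV user against both the failing directory and a candidate replacement to compare the results.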

What should I do to handle the rpc error: code = Unavailable desc =... error that occurs in BR?

This error might occur when the cluster that BR restores data to has insufficient capacity. You can further confirm the cause by checking the monitoring metrics of this cluster or the TiKV log.

To handle this issue, you can try to scale out the cluster resources, reduce the concurrency during restore, and enable the RATE_LIMIT option.

Where are the backed up files stored when I use local storage?

When you use local storage, backupmeta is generated on the node where BR is running, and backup files are generated on the Leader node of each Region.

How large is the backup data? Are there replicas of the backup?

During data backup, backup files are generated on the Leader node of each Region. The size of the backup is equal to the data size, with no redundant replicas. Therefore, the total backup size is approximately the total size of the TiKV data divided by the number of replicas.

However, if you want to restore data from local storage, the number of copies of the backup is equal to the number of TiKV nodes, because each TiKV node must have access to all backup files.
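The arithmetic above can be sketched as follows. The figures in the example are made up for illustration, and the result is only an approximation, as stated above.

```python
def estimate_backup_size(total_tikv_data_gib: float, replicas: int) -> float:
    """Approximate backup size: total TiKV data divided by the replica count."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    return total_tikv_data_gib / replicas


def local_restore_footprint(backup_size_gib: float, tikv_nodes: int) -> float:
    """Total disk space used when every TiKV node keeps a full copy of the backup."""
    if tikv_nodes <= 0:
        raise ValueError("tikv_nodes must be positive")
    return backup_size_gib * tikv_nodes


if __name__ == "__main__":
    # Illustrative numbers: 300 GiB across all TiKV nodes, 3 replicas, 5 nodes.
    backup = estimate_backup_size(300, 3)
    print(f"approximate backup size: {backup} GiB")
    print(f"local restore footprint: {local_restore_footprint(backup, 5)} GiB")
```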

What should I do when BR restores data to the upstream cluster of TiCDC/Drainer?

  • The data restored using BR cannot be replicated to the downstream. This is because BR directly imports SST files but the downstream cluster currently cannot obtain these files from the upstream.

  • Before v4.0.3, DDL jobs generated during the BR restore might cause unexpected DDL executions in TiCDC/Drainer. Therefore, if you need to restore data to the upstream cluster of TiCDC/Drainer, add all tables restored using BR to the TiCDC/Drainer block list.

You can use filter.rules to configure the block list for TiCDC and use syncer.ignore-table to configure the block list for Drainer.
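For a long list of restored tables, the block list entries can be generated rather than typed by hand. The sketch below is illustrative only: it assumes that TiCDC filter.rules accepts table-filter patterns in which a leading ! excludes a table, and that Drainer reads syncer.ignore-table entries with db-name and tbl-name keys. Verify both against the configuration reference for your version. The schema and table names are hypothetical, and names that need quoting are not handled.

```python
from typing import List, Tuple

Table = Tuple[str, str]  # (schema name, table name)


def ticdc_filter_rules(tables: List[Table]) -> List[str]:
    """Build filter.rules values: replicate everything except the given tables.

    Assumption: a leading "!" marks an exclusion rule.
    """
    return ["*.*"] + [f"!{db}.{tbl}" for db, tbl in tables]


def drainer_ignore_tables(tables: List[Table]) -> str:
    """Build TOML text for syncer.ignore-table entries.

    Assumption: each entry uses the db-name and tbl-name keys.
    """
    blocks = [
        f'[[syncer.ignore-table]]\ndb-name = "{db}"\ntbl-name = "{tbl}"'
        for db, tbl in tables
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical tables restored by BR.
    restored = [("shop", "orders"), ("shop", "items")]
    print(ticdc_filter_rules(restored))
    print(drainer_ignore_tables(restored))
```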