TiDB Backup & Restore Overview

TiDB achieves high availability through the Raft protocol and a reasonable deployment topology: when a few nodes in the cluster fail, the cluster remains available. To further ensure data safety, TiDB provides the Backup & Restore (BR) feature as the last resort for recovering data after natural disasters and misoperations.

BR satisfies the following requirements:

  • Back up cluster data to a disaster recovery (DR) system with an RPO as short as 5 minutes, reducing data loss in disaster scenarios.
  • Handle the cases of misoperations from applications by rolling back data to a time point before the error event.
  • Perform history data auditing to meet the requirements of judicial supervision.
  • Clone the production environment, which is convenient for troubleshooting, performance tuning, and simulation testing.

Before you use

This section describes the prerequisites for using TiDB backup and restore, including restrictions, usage tips and compatibility issues.

Restrictions

  • PITR only supports restoring data to an empty cluster.
  • PITR only supports cluster-level restore and does not support database-level or table-level restore.
  • PITR does not support restoring the data of user tables or privilege tables from system tables.
  • BR does not support running multiple backup tasks on a cluster at the same time.
  • When a PITR is running, you cannot run a log backup task or use TiCDC to replicate data to a downstream cluster.

Some tips

Snapshot backup:

  • It is recommended that you perform the backup operation during off-peak hours to minimize the impact on applications.
  • It is recommended that you run multiple backup or restore tasks one at a time. Running multiple backup tasks in parallel lowers performance. Worse still, because the tasks do not coordinate with each other, they might fail and affect cluster performance.

Snapshot restore:

  • BR uses as many resources of the target cluster as possible. Therefore, it is recommended that you restore data to a new cluster or an offline cluster. Avoid restoring data to a production cluster, because the restore inevitably affects your application.

Backup storage and network configuration:

  • It is recommended that you store backup data to a storage system that is compatible with Amazon S3, GCS, or Azure Blob Storage.
  • You need to ensure that BR, TiKV, and the backup storage system have enough network bandwidth, and that the backup storage system can provide sufficient read and write performance (IOPS). Otherwise, they might become a performance bottleneck during backup and restore.

Use backup and restore

The way to use BR varies with the deployment method of TiDB. This document describes how to use the br command-line tool to back up and restore TiDB cluster data in an on-premises deployment.

For information about how to use this feature in other deployment scenarios, see the following documents:

BR features

TiDB BR provides the following features:

  • Back up cluster data: You can back up full data (full backup) of the cluster at a certain time point, or back up the data changes in TiDB (log backup, in which log means KV changes in TiKV).

  • Restore backup data:

    • You can restore a full backup or specific databases or tables in a full backup.
    • Based on backup data (full backup and log backup), you can restore the target cluster to any time point of the backup cluster. This type of restore is called point-in-time recovery, or PITR for short.

Back up cluster data

A full backup contains all data of a cluster at a specific time point. TiDB supports the following full backup method:

  • Back up cluster snapshots: A snapshot of a TiDB cluster contains transactionally consistent data at a specific time. For details, see Snapshot backup.

A full backup occupies a large amount of storage space and contains cluster data only as of a specific time point. To choose the restore point freely, that is, to perform point-in-time recovery (PITR), use the following two backup methods together:

  • Start log backup. After log backup is started, the task keeps running on all TiKV nodes and backs up TiDB incremental data in small batches to the specified storage periodically.
  • Perform snapshot backup regularly. Back up the full cluster data to the backup storage, for example, perform cluster snapshot backup at 0:00 AM every day.
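The two backup methods above can be sketched with the br command-line tool as follows. This is a minimal sketch, not a definitive procedure: the PD address, the bucket names, the task name pitr, and the dated storage prefix are all placeholders assumed for illustration. The functions only print the commands so that you can review them before running anything.

```shell
#!/bin/sh
# Placeholders (assumptions, not values from this document).
PD_ADDR="${PD_ADDR:-10.0.1.10:2379}"
LOG_STORAGE="${LOG_STORAGE:-s3://backup-bucket/log-backup}"
SNAPSHOT_PREFIX="${SNAPSHOT_PREFIX:-s3://backup-bucket/snapshot}"

# Print the command that starts a continuous log backup task.
log_backup_cmd() {
  printf "tiup br log start --task-name=pitr --pd=%s --storage='%s'\n" \
    "$PD_ADDR" "$LOG_STORAGE"
}

# Print the command for one snapshot backup, stored under a dated prefix.
# $1: date label such as 20240101 (defaults to today's date).
snapshot_backup_cmd() {
  day="${1:-$(date +%Y%m%d)}"
  printf "tiup br backup full --pd=%s --storage='%s-%s'\n" \
    "$PD_ADDR" "$SNAPSHOT_PREFIX" "$day"
}
```

To execute a printed command, pipe it to a shell, for example log_backup_cmd | sh. You would typically start the log backup task once and run the snapshot backup from a scheduler such as cron at 0:00 AM every day.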

Backup performance and impact on TiDB clusters

  • When CPU and I/O resources are sufficient in the cluster, the snapshot backup has a limited impact on the TiDB cluster, generally staying below 20%. With appropriate configuration of the TiDB cluster, this impact can be further minimized to 10% or even less. When CPU and I/O resources are insufficient, you can adjust the TiKV configuration item backup.num-threads to change the number of worker threads used by the backup task to reduce the impact of the backup task on the TiDB cluster. The backup speed of a TiKV node is scalable and ranges from 50 MB/s to 100 MB/s. For more information, see Backup performance and impact.
  • When only log backup tasks are running, the impact on the cluster is about 5%. Every 3-5 minutes, log backup flushes all the changes generated since the last flush to the backup storage, which can achieve a Recovery Point Objective (RPO) as short as five minutes.
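The per-node backup speed above allows a rough estimate of how long a snapshot backup takes. The following sketch assumes that throughput scales linearly with the number of TiKV nodes, that data is evenly distributed, and that the backup storage is not the bottleneck. The function name is hypothetical, and the result is only an order-of-magnitude estimate.

```shell
# Estimate snapshot backup duration in whole minutes (rounded down).
# $1: data size in GB (1 GB = 1000 MB)
# $2: number of TiKV nodes
# $3: assumed per-node backup speed in MB/s (50 to 100 per the text above)
estimate_backup_minutes() {
  echo $(( $1 * 1000 / ($2 * $3) / 60 ))
}

# 10000 GB on 10 TiKV nodes:
#   estimate_backup_minutes 10000 10 50    prints 333
#   estimate_backup_minutes 10000 10 100   prints 166
```

With these assumptions, a 10 TB backup on 10 nodes takes roughly between 2.8 and 5.5 hours, depending on where the per-node speed falls in the 50 MB/s to 100 MB/s range.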

Restore backup data

Corresponding to the backup features, you can perform two types of restore: full restore and PITR.

  • Restore a full backup

    • Restore cluster snapshot backup: You can restore snapshot backup data to an empty cluster or a cluster that does not have data conflicts (with the same schema or tables). For details, see Restore snapshot backup. In addition, you can restore specific databases or tables from the backup data and filter out unwanted data. For details, see Restore specific databases or tables from backup data.
  • Restore data to any point in time (PITR)

    • By running the br restore point command, you can restore the cluster to a specified time point by using the latest snapshot backup taken before that time point together with the log backup data. BR automatically determines the restore scope, accesses the backup data, and restores the data to the target cluster in turn.
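The restore types above map to br subcommands roughly as follows. This is a sketch under stated assumptions: the PD address, storage URIs, table filter, and timestamp are placeholders, and the PITR example passes the snapshot location explicitly through --full-backup-storage. The functions only print the commands so that you can review them first.

```shell
#!/bin/sh
# Placeholder PD address (an assumption, not a value from this document).
PD_ADDR="${PD_ADDR:-10.0.1.10:2379}"

# Print the command that restores a whole snapshot backup.
# $1: snapshot backup storage URI
snapshot_restore_cmd() {
  printf "tiup br restore full --pd=%s --storage='%s'\n" "$PD_ADDR" "$1"
}

# Print the command that restores only the tables matching a table filter.
# $1: snapshot backup storage URI, $2: table filter such as 'db1.*'
filtered_restore_cmd() {
  printf "tiup br restore full --pd=%s --storage='%s' --filter='%s'\n" \
    "$PD_ADDR" "$1" "$2"
}

# Print the PITR command.
# $1: log backup storage URI, $2: snapshot backup storage URI,
# $3: target time such as '2024-01-15 18:00:00+0800'
pitr_restore_cmd() {
  printf "tiup br restore point --pd=%s --storage='%s' --full-backup-storage='%s' --restored-ts='%s'\n" \
    "$PD_ADDR" "$1" "$2" "$3"
}
```

For example, pitr_restore_cmd 's3://backup-bucket/log-backup' 's3://backup-bucket/snapshot-20240115' '2024-01-15 18:00:00+0800' prints a single br restore point command that you can inspect and then pipe to sh.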

Restore performance and impact on TiDB clusters

  • Data restore is performed at a scalable speed. Generally, the speed is 100 MiB/s per TiKV node. br only supports restoring data to a new cluster and uses the resources of the target cluster as much as possible. For more details, see Restore performance and impact.
  • On each TiKV node, PITR can restore log data at 30 GiB/h. For more details, see PITR performance and impact.
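The two speeds above can be combined into a rough PITR duration estimate. The sketch below assumes that the snapshot part and the log part are restored one after the other, that throughput scales linearly with the number of TiKV nodes, and that the backup storage is not the bottleneck. The function name is hypothetical.

```shell
# Estimate PITR duration in whole minutes (each part rounded down).
# $1: snapshot backup size in GiB
# $2: log backup size in GiB
# $3: number of TiKV nodes in the target cluster
estimate_pitr_minutes() {
  # Snapshot part: 100 MiB/s per node, so seconds = MiB / (nodes * 100).
  snapshot_min=$(( $1 * 1024 / ($3 * 100) / 60 ))
  # Log part: 30 GiB/h per node, so minutes = GiB * 60 / (nodes * 30).
  log_min=$(( $2 * 60 / ($3 * 30) ))
  echo $(( snapshot_min + log_min ))
}

# 1024 GiB of snapshot data plus 60 GiB of log data on 4 TiKV nodes:
#   estimate_pitr_minutes 1024 60 4   prints 73 (43 + 30)
```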

Backup storage

TiDB supports backing up data to Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, NFS, and other S3-compatible file storage services. For details, see the following documents:

Compatibility

Compatibility with other features

Backup and restore might go wrong when some TiDB features are enabled or disabled. If these features are not consistently enabled or disabled during backup and restore, compatibility issues might occur.

| Feature | Issue | Solution |
|---|---|---|
| GBK charset | | BR of versions earlier than v5.4.0 does not support restoring charset=GBK tables. No version of BR supports recovering charset=GBK tables to TiDB clusters earlier than v5.4.0. |
| Clustered index | #565 | Make sure that the value of the tidb_enable_clustered_index global variable during restore is consistent with that during backup. Otherwise, data inconsistency might occur, such as default not found error and inconsistent data index. |
| New collation | #352 | Make sure that the value of the new_collations_enabled_on_first_bootstrap variable during restore is consistent with that during backup. Otherwise, inconsistent data index might occur and checksum might fail to pass. For more information, see FAQ - Why does BR report new_collations_enabled_on_first_bootstrap mismatch?. |
| Global temporary tables | | Make sure that you are using v5.3.0 or a later version of BR to back up and restore data. Otherwise, an error occurs in the definition of the backed global temporary tables. |
| TiDB Lightning Physical Import | | If the upstream database uses the physical import mode of TiDB Lightning, data cannot be backed up in log backup. It is recommended to perform a full backup after the data import. For more information, see When the upstream database imports data using TiDB Lightning in the physical import mode, the log backup feature becomes unavailable. Why?. |

Version compatibility

Before performing backup and restore, BR compares the TiDB cluster version with its own and checks their compatibility. If the versions are incompatible, BR reports an error and exits. To forcibly skip the version check, you can set --check-requirements=false. Note that skipping the version check might introduce incompatibility in data.
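Because skipping the version check can lead to incompatible data, it helps to make the override an explicit opt-in rather than a default. The following sketch prints a snapshot backup command and appends --check-requirements=false only when a hypothetical SKIP_VERSION_CHECK environment variable is set to 1. The function name, the variable name, and the example addresses are assumptions made for illustration.

```shell
# Print a snapshot backup command.
# $1: PD address, $2: backup storage URI
# The version check is skipped only when SKIP_VERSION_CHECK=1 is set explicitly.
backup_cmd_with_version_check() {
  cmd="tiup br backup full --pd=$1 --storage='$2'"
  if [ "${SKIP_VERSION_CHECK:-0}" = "1" ]; then
    cmd="$cmd --check-requirements=false"
  fi
  printf '%s\n' "$cmd"
}
```

By default, the printed command keeps the version check. Running SKIP_VERSION_CHECK=1 backup_cmd_with_version_check 10.0.1.10:2379 s3://backup-bucket/snapshot prints the same command with the check disabled.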

Starting from v7.0.0, TiDB gradually supports performing backup and restore operations through SQL statements. Therefore, it is strongly recommended to use the BR tool of the same major version as the TiDB cluster when backing up and restoring cluster data, and avoid performing data backup and restore operations across major versions. This helps ensure smooth execution of restore operations and data consistency.

The compatibility information for BR before TiDB v6.6.0 is as follows:

| Backup version (vertical) \ Restore version (horizontal) | Restore to TiDB v6.0 | Restore to TiDB v6.1 | Restore to TiDB v6.2 | Restore to TiDB v6.3, v6.4, or v6.5 | Restore to TiDB v6.6 |
|---|---|---|---|---|---|
| TiDB v6.0, v6.1, v6.2, v6.3, v6.4, or v6.5 snapshot backup | Compatible (known issue #36379: if backup data contains an empty schema, BR might report an error.) | Compatible | Compatible | Compatible | Compatible (BR must be v6.6) |
| TiDB v6.3, v6.4, v6.5, or v6.6 log backup | Incompatible | Incompatible | Incompatible | Compatible | Compatible |
