Backup and Restore
This document describes how to back up and restore the data of a TiDB cluster in Kubernetes.
TiDB in Kubernetes supports two kinds of backup strategies:
- Full backup (scheduled or ad-hoc): use `mydumper` to take a logical backup of the TiDB cluster.
- Incremental backup: use TiDB Binlog to replicate data in the TiDB cluster to another database or to take a real-time backup of the data.
Currently, TiDB in Kubernetes only supports automatic restoration for full backups taken by `mydumper`. Restoring incremental backup data produced by TiDB Binlog requires manual operations.
Full backup
Full backup uses `mydumper` to take a logical backup of a TiDB cluster. The backup task creates a PVC (PersistentVolumeClaim) to store the data.
By default, the backup uses a PV (Persistent Volume) to store the backup data. You can also store the data in Google Cloud Storage buckets, Ceph Object Storage, or Amazon S3 by changing the configuration. In that case, the backup data is temporarily stored in the PV before it is uploaded to object storage. Refer to TiDB cluster backup configuration for all available configuration options.
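As an illustration, uploading backups to Amazon S3 is configured through an `s3` section in the chart's `values.yaml`. The following is only a sketch; the key names (`region`, `bucket`, `secretName`) and all values are assumptions to verify against the chart's bundled `values.yaml`:

```yaml
# Sketch only: verify the key names against the chart's own values.yaml.
s3:
  region: us-west-2          # AWS region of the destination bucket (illustrative)
  bucket: my-tidb-backups    # S3 bucket that receives the backup data (illustrative)
  secretName: s3-secret      # Secret holding the AWS credentials (illustrative)
```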
You can either set up a scheduled full backup job or take a full backup in an ad-hoc manner.
Scheduled full backup
Scheduled full backup is a task created alongside the TiDB cluster, and it runs periodically like a crontab job.
To configure a scheduled full backup, modify the `scheduledBackup` section in the `values.yaml` file of the TiDB cluster:
1. Set `scheduledBackup.create` to `true`.

2. Set `scheduledBackup.storageClassName` to the `storageClass` of the PV that stores the backup data.

3. Configure `scheduledBackup.schedule` in the Cron format to define the schedule, for example, `0 0 * * *` for a daily backup at midnight.

4. Create a Kubernetes Secret containing the username and password (the user must have the privileges to back up the data). Meanwhile, set `scheduledBackup.secretName` to the name of the created `Secret` (defaults to `backup-secret`):

    ```shell
    kubectl create secret generic backup-secret -n <namespace> --from-literal=user=<user> --from-literal=password=<password>
    ```

5. Create a new TiDB cluster with the scheduled full backup task by running `helm install`, or enable the scheduled full backup for an existing cluster by running `helm upgrade`:

    ```shell
    helm upgrade <release_name> pingcap/tidb-cluster -f values.yaml --version=<tidb-operator-version>
    ```
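For reference, the resulting `scheduledBackup` section of `values.yaml` might look like the sketch below. The keys are exactly those named in the steps above; the values shown are illustrative:

```yaml
scheduledBackup:
  create: true
  storageClassName: local-storage   # storageClass of the PV that stores the backup data (illustrative)
  schedule: "0 0 * * *"             # Cron format: a daily backup at midnight (illustrative)
  secretName: backup-secret         # Secret created in the step above
```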
Ad-hoc full backup
Ad-hoc full backup is encapsulated in the Helm chart `pingcap/tidb-backup`. Depending on the `mode` configuration in the `values.yaml` file, this chart performs either a full backup or a data restoration. The Restore section covers how to restore the backup data.
Follow the steps below to perform an ad-hoc full backup task:
1. Modify the `values.yaml` file:

    - Set `clusterName` to the target TiDB cluster name.
    - Set `mode` to `backup`.
    - Set `storage.className` to the `storageClass` of the PV that stores the backup data.
    - Adjust `storage.size` according to your database size.

2. Create a Kubernetes Secret containing the username and password (the user must have the privileges to back up the data). Meanwhile, set `secretName` in the `values.yaml` file to the name of the created `Secret` (defaults to `backup-secret`):

    ```shell
    kubectl create secret generic backup-secret -n <namespace> --from-literal=user=<user> --from-literal=password=<password>
    ```

3. Run the following command to perform the ad-hoc backup task:

    ```shell
    helm install pingcap/tidb-backup --name=<backup-name> --namespace=<namespace> -f values.yaml --version=<tidb-operator-version>
    ```
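Putting step 1 together, a minimal `values.yaml` sketch for backup mode might look like the following. The fields are the ones listed above; the values are illustrative:

```yaml
clusterName: demo            # name of the TiDB cluster to back up (illustrative)
mode: backup
storage:
  className: local-storage   # storageClass of the PV that stores the backup data (illustrative)
  size: 100Gi                # adjust according to your database size (illustrative)
secretName: backup-secret    # Secret created in step 2
```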
View backups
For backups stored in PV, you can view them by using the following command:
```shell
kubectl get pvc -n <namespace> -l app.kubernetes.io/component=backup,pingcap.com/backup-cluster-name=<cluster-name>
```
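For example, for a cluster named `demo` in the `tidb` namespace (both names illustrative):

```shell
kubectl get pvc -n tidb -l app.kubernetes.io/component=backup,pingcap.com/backup-cluster-name=demo
```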
If you store your backup data in Google Cloud Storage, Ceph Object Storage, or Amazon S3, you can view the backups by using the GUI or CLI tools provided by these storage providers.
Restore
The `pingcap/tidb-backup` Helm chart also restores a TiDB cluster from backup data. Follow the steps below to restore:
1. Modify the `values.yaml` file:

    - Set `clusterName` to the target TiDB cluster name.
    - Set `mode` to `restore`.
    - Set `name` to the name of the backup you want to restore (refer to View backups for the available backups). If the backup is stored in Google Cloud Storage, Ceph Object Storage, or Amazon S3, you must configure the corresponding sections with the same settings that you used when performing the full backup.

2. Create a Kubernetes Secret containing the username and password (the user must have the privileges to restore the data). Meanwhile, set `secretName` in the `values.yaml` file to the name of the created `Secret` (defaults to `backup-secret`; skip this step if you have already created one when performing the full backup):

    ```shell
    kubectl create secret generic backup-secret -n <namespace> --from-literal=user=<user> --from-literal=password=<password>
    ```

3. Restore the backup:

    ```shell
    helm install pingcap/tidb-backup --namespace=<namespace> --name=<restore-name> -f values.yaml --version=<tidb-operator-version>
    ```
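Correspondingly, a minimal restore `values.yaml` sketch might look like the following. The fields are those in step 1; the backup name is illustrative and must match one listed under View backups:

```yaml
clusterName: demo                  # name of the target TiDB cluster (illustrative)
mode: restore
name: fullbackup-20200101-000000   # name of the backup to restore (illustrative)
secretName: backup-secret          # Secret from the full backup steps
```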
Incremental backup
Incremental backup uses TiDB Binlog to collect binlog data from TiDB and provide near real-time backup and replication to downstream platforms.
For a detailed guide on maintaining TiDB Binlog in Kubernetes, refer to TiDB Binlog.
Scale in Pump
To scale in Pump, take each Pump node offline and then run the `helm upgrade` command to delete the corresponding Pump Pod.
1. Take a Pump node offline from the TiDB cluster.

    Suppose there are 3 Pump nodes and you want to take the third node offline. Replace `<ordinal-id>` with `2` and run the following command (`<version>` is the current TiDB version):

    ```shell
    kubectl run offline-pump-<ordinal-id> --image=pingcap/tidb-binlog:<version> --namespace=<namespace> --restart=OnFailure -- /binlogctl -pd-urls=http://<release-name>-pd:2379 -cmd offline-pump -node-id <release-name>-pump-<ordinal-id>:8250
    ```

    Then, check the log output of Pump. If Pump outputs `pump offline, please delete my pod`, the state of the Pump node has been successfully switched to `offline`:

    ```shell
    kubectl logs -f -n <namespace> <release-name>-pump-<ordinal-id>
    ```

2. Delete the corresponding Pump Pod.

    Modify `binlog.pump.replicas` in the `values.yaml` file to `2`, and then run the following command to delete the Pump Pod:

    ```shell
    helm upgrade <release-name> pingcap/tidb-cluster -f values.yaml --version=<chart-version>
    ```
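For clarity, the replica change is the only edit to `values.yaml` needed before the `helm upgrade`. A minimal sketch, using the `binlog.pump.replicas` path named above:

```yaml
binlog:
  pump:
    replicas: 2   # reduced from 3 after taking the third Pump node offline
```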