Troubleshoot Log Backup

This document summarizes common problems that you might encounter during log backup and their solutions.

After restoring a downstream cluster using the br restore point command, data cannot be accessed from TiFlash. What should I do?

In v6.2.0, PITR does not support restoring the TiFlash replicas of a cluster. After restoring data, you need to execute the following statement to set the TiFlash replicas of the schema or table. Replace table_name with the target table and @count with the number of replicas you need.

ALTER TABLE table_name SET TIFLASH REPLICA @count;
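When many tables need their replicas set again after a v6.2.0 restore, the statements can be generated rather than typed by hand. The following is a minimal sketch that assumes you already know the desired replica count for each table. The function name `tiflash_replica_statements` and the table names are hypothetical; only the `ALTER TABLE ... SET TIFLASH REPLICA` syntax comes from the statement above.

```python
def tiflash_replica_statements(replicas):
    """Build one ALTER TABLE ... SET TIFLASH REPLICA statement per table.

    `replicas` maps (schema, table) tuples to the desired replica count.
    """
    statements = []
    for (schema, table), count in sorted(replicas.items()):
        if count < 0:
            raise ValueError(f"replica count must be non-negative: {count}")
        # Double any backtick inside a name so the identifier stays quoted.
        qualified = ".".join(
            "`" + name.replace("`", "``") + "`" for name in (schema, table)
        )
        statements.append(f"ALTER TABLE {qualified} SET TIFLASH REPLICA {count};")
    return statements


if __name__ == "__main__":
    # Hypothetical tables and replica counts, for illustration only.
    wanted = {("test", "orders"): 2, ("test", "customers"): 1}
    for stmt in tiflash_replica_statements(wanted):
        print(stmt)
```

The generated statements can then be run through any SQL client connected to the restored cluster.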

In v6.3.0 and later versions, after PITR completes data restore, BR automatically executes the ALTER TABLE SET TIFLASH REPLICA DDL statement according to the number of TiFlash replicas in the upstream cluster at the corresponding time. You can check the TiFlash replica setting using the following SQL statement:

SELECT * FROM INFORMATION_SCHEMA.tiflash_replica;

What should I do if the status of a log backup task becomes ERROR?

While a log backup task is running, its status becomes ERROR if the task fails and cannot recover after retrying. The following is an example:

br log status --pd x.x.x.x:2379

● Total 1 Tasks.
> #1 <
                    name: task1
                  status: ○ ERROR
                   start: 2022-07-25 13:49:02.868 +0000
                     end: 2090-11-18 14:07:45.624 +0000
                 storage: s3://tmp/br-log-backup0ef49055-5198-4be3-beab-d382a2189efb/Log
             speed(est.): 0.00 ops/s
      checkpoint[global]: 2022-07-25 14:46:50.118 +0000; gap=11h31m29s
          error[store=1]: KV:LogBackup:RaftReq
error-happen-at[store=1]: 2022-07-25 14:54:44.467 +0000; gap=11h23m35s
  error-message[store=1]: retry time exceeds: and error failed to get initial snapshot: failed to get the snapshot (region_id = 94812): Error during requesting raftstore: message: "read index not ready, reason can not read index due to merge, region 94812" read_index_not_ready { reason: "can not read index due to merge" region_id: 94812 }: failed to get initial snapshot: failed to get the snapshot (region_id = 94812): Error during requesting raftstore: message: "read index not ready, reason can not read index due to merge, region 94812" read_index_not_ready { reason: "can not read index due to merge" region_id: 94812 }: failed to get initial snapshot: failed to get the snapshot (region_id = 94812): Error during requesting raftstore: message: "read index not ready, reason can not read index due to merge, region 94812" read_index_not_ready { reason: "can not read index due to merge" region_id: 94812 }
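If you want to detect the ERROR status from a script instead of reading the output by eye, the status text can be parsed. The following is a rough sketch, not an official interface. It assumes the command prints one `key: value` pair per line and a `> #N <` line before each task. The helper names `parse_log_status` and `is_error` are hypothetical, and the sample string is a shortened copy of the output above.

```python
def parse_log_status(output):
    """Parse `br log status` text into a list of per-task dicts.

    Assumes one `key: value` pair per line and a `> #N <` line before each task.
    """
    tasks = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("> #"):
            # A new task section begins.
            current = {}
            tasks.append(current)
            continue
        if current is None or ": " not in line:
            continue
        # Split on the first ": " only, because values may contain colons.
        key, value = line.split(": ", 1)
        current[key.strip()] = value.strip()
    return tasks


def is_error(task):
    """Return True if the task's status field reports ERROR."""
    return task.get("status", "").endswith("ERROR")


SAMPLE = """\
● Total 1 Tasks.
> #1 <
                    name: task1
                  status: ○ ERROR
      checkpoint[global]: 2022-07-25 14:46:50.118 +0000; gap=11h31m29s
          error[store=1]: KV:LogBackup:RaftReq
"""

if __name__ == "__main__":
    for task in parse_log_status(SAMPLE):
        if is_error(task):
            errors = {k: v for k, v in task.items() if k.startswith("error")}
            print(task["name"], "is in ERROR:", errors)
```

Because the text layout is not a documented contract, treat a parser like this as a convenience for alerting and recheck it after upgrading BR.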

To address this problem, check the error message to identify the cause and fix it accordingly. After the problem is addressed, run the following command to resume the task:

br log resume --task-name=task1 --pd x.x.x.x:2379

After the backup task is resumed, you can check its status using br log status. When the task status becomes NORMAL, the backup task is running again.

● Total 1 Tasks.
> #1 <
              name: task1
            status: ● NORMAL
             start: 2022-07-25 13:49:02.868 +0000
               end: 2090-11-18 14:07:45.624 +0000
           storage: s3://tmp/br-log-backup0ef49055-5198-4be3-beab-d382a2189efb/Log
       speed(est.): 15509.75 ops/s
checkpoint[global]: 2022-07-25 14:46:50.118 +0000; gap=6m28s
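The `gap` value in the checkpoint field drops from hours in the ERROR example to minutes here, so it appears to be a convenient number to watch after resuming a task. The following sketch converts it to seconds so that it can be compared against a threshold. It assumes the gap is written with `h`, `m`, and `s` units only, as in the examples in this document. The function name `gap_seconds` is hypothetical.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def gap_seconds(checkpoint_field):
    """Extract the `gap=...` value from a checkpoint field, in seconds."""
    match = re.search(r"gap=(\S+)", checkpoint_field)
    if match is None:
        raise ValueError("no gap= value found")
    text = match.group(1)
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    # Reject anything that is not fully covered by h/m/s components.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


if __name__ == "__main__":
    field = "2022-07-25 14:46:50.118 +0000; gap=6m28s"
    print(gap_seconds(field), "seconds behind")
```

Durations that use other units raise `ValueError` instead of being silently misread.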

What should I do if the error message ErrBackupGCSafepointExceeded is returned when I use the br log resume command to resume a suspended task?

Error: failed to check gc safePoint, checkpoint ts 433177834291200000: GC safepoint 433193092308795392 exceed TS 433177834291200000: [BR:Backup:ErrBackupGCSafepointExceeded]backup GC safepoint exceeded

After you pause a log backup task, the pause operation automatically sets the current backup checkpoint as the service safepoint to prevent the MVCC data from being garbage collected. This keeps the MVCC data generated within the last 24 hours. If the MVCC data at the backup checkpoint is more than 24 hours old, it is garbage collected, and the backup task cannot be resumed.
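To see how far apart the two timestamps in the error message are, you can decode them. The following sketch assumes the usual TiDB TSO layout, in which the low 18 bits hold a logical counter and the remaining high bits hold a physical time in milliseconds since the Unix epoch. The function names `tso_to_datetime` and `safepoint_lead` are hypothetical, and the two values are the ones from the error message above.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # assumed width of the logical counter in a TSO


def tso_to_datetime(tso):
    """Convert a TSO to the UTC wall-clock time of its physical part."""
    physical_ms = tso >> LOGICAL_BITS
    epoch = datetime.fromtimestamp(0, tz=timezone.utc)
    return epoch + timedelta(milliseconds=physical_ms)


def safepoint_lead(checkpoint_ts, gc_safepoint):
    """Return how far the GC safepoint is ahead of the backup checkpoint."""
    return tso_to_datetime(gc_safepoint) - tso_to_datetime(checkpoint_ts)


if __name__ == "__main__":
    checkpoint_ts = 433177834291200000  # from the error message
    gc_safepoint = 433193092308795392   # from the error message
    print("checkpoint  :", tso_to_datetime(checkpoint_ts).isoformat())
    print("GC safepoint:", tso_to_datetime(gc_safepoint).isoformat())
    print("safepoint is ahead by:", safepoint_lead(checkpoint_ts, gc_safepoint))
```

A positive lead means the GC safepoint has already moved past the checkpoint, which is the condition that the error reports.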

To address this problem, delete the current task using br log stop, and then create a new log backup task using br log start. You can also perform a full backup at the same time for use in subsequent PITR.