Synchronous Replication for Dual Data Centers
This document introduces how to configure synchronous replication for dual data centers.
In the dual data center scenario, one data center is the primary center and the other is the disaster recovery (DR) center. When a Region has an odd number of replicas, more replicas are placed in the primary center. By default, when the DR center is down for longer than a specified period of time, replication between the two centers switches to asynchronous mode.
To use synchronous mode, you can either configure it in the PD configuration file or switch the replication mode manually using pd-ctl.
Enable synchronous replication in the PD configuration file
The replication mode is controlled by PD. You can configure it in the PD configuration file when deploying a cluster. See the following example:
[replication-mode]
replication-mode = "dr-auto-sync"
[replication-mode.dr-auto-sync]
label-key = "zone"
primary = "z1"
dr = "z2"
primary-replicas = 2
dr-replicas = 1
wait-store-timeout = "1m"
wait-sync-timeout = "1m"
In the configuration above:
- dr-auto-sync is the mode that enables synchronous replication.
- The label key zone is used to distinguish different data centers.
- TiKV instances with the "z1" value are considered to be in the primary data center, and TiKV instances with the "z2" value are in the DR data center.
- primary-replicas is the number of replicas that should be placed in the primary data center.
- dr-replicas is the number of replicas that should be placed in the DR data center.
- wait-store-timeout is how long PD waits, when TiKV instances are down, before falling back to asynchronous replication.
To check the current replication state of the cluster, use the following URL:
% curl http://pd_ip:pd_port/pd/api/v1/replication_mode/status
{
  "mode": "dr-auto-sync",
  "dr-auto-sync": {
    "label-key": "zone",
    "state": "sync"
  }
}
After the cluster state becomes sync, it does not become async unless the number of down instances is larger than the specified number of replicas in either data center. Once the cluster state becomes async, PD requests TiKV to change the replication mode to asynchronous and periodically checks whether the TiKV instances have recovered. When the number of down instances is smaller than the number of replicas in both data centers, the cluster enters the sync-recover state, and PD then requests TiKV to change the replication mode to synchronous. After all Regions become synchronous, the cluster becomes sync again.
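The state transitions described above can be modeled as a small state machine. This is a simplified sketch that encodes only the rules stated in the paragraph, literally; it is not PD's implementation. It ignores wait-store-timeout and other timing, and the Observation type with its all_regions_synced flag is a hypothetical stand-in for what PD actually tracks.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SYNC = "sync"
    ASYNC = "async"
    SYNC_RECOVER = "sync-recover"


@dataclass(frozen=True)
class Observation:
    down_primary: int         # down TiKV instances in the primary data center
    down_dr: int              # down TiKV instances in the DR data center
    all_regions_synced: bool  # hypothetical flag: every Region reports synchronous


def next_state(state: State, obs: Observation,
               primary_replicas: int, dr_replicas: int) -> State:
    """Literal encoding of the transitions described in the text."""
    if state is State.SYNC:
        # Leave sync only when down instances exceed the replica count in either center.
        if obs.down_primary > primary_replicas or obs.down_dr > dr_replicas:
            return State.ASYNC
        return State.SYNC
    if state is State.ASYNC:
        # Start recovering once down instances are below the replica count in both centers.
        if obs.down_primary < primary_replicas and obs.down_dr < dr_replicas:
            return State.SYNC_RECOVER
        return State.ASYNC
    # SYNC_RECOVER: return to sync once every Region is synchronous again.
    return State.SYNC if obs.all_regions_synced else State.SYNC_RECOVER


if __name__ == "__main__":
    state = State.SYNC
    timeline = [
        Observation(0, 2, False),  # two DR instances down
        Observation(0, 0, False),  # instances back, Regions still catching up
        Observation(0, 0, True),   # all Regions synchronous
    ]
    for obs in timeline:
        state = next_state(state, obs, primary_replicas=2, dr_replicas=1)
        print(state.value)
```

With the example configuration (2 primary replicas, 1 DR replica), the timeline walks through async, sync-recover, and back to sync.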
Change the replication mode manually
You can use pd-ctl to change a cluster from asynchronous to synchronous mode:
>> config set replication-mode dr-auto-sync
Or change it back to asynchronous mode:
>> config set replication-mode majority
You can also update the label key:
>> config set replication-mode dr-auto-sync label-key dc