# Scale the TiDB Cluster Using TiDB Ansible
The capacity of a TiDB cluster can be increased or decreased without affecting the online services.
Assume that the topology is as follows:
Name | Host IP | Services |
---|---|---|
node1 | 172.16.10.1 | PD1 |
node2 | 172.16.10.2 | PD2 |
node3 | 172.16.10.3 | PD3, Monitor |
node4 | 172.16.10.4 | TiDB1 |
node5 | 172.16.10.5 | TiDB2 |
node6 | 172.16.10.6 | TiKV1 |
node7 | 172.16.10.7 | TiKV2 |
node8 | 172.16.10.8 | TiKV3 |
node9 | 172.16.10.9 | TiKV4 |
## Increase the capacity of a TiDB/TiKV node
For example, if you want to add two TiDB nodes (node101, node102) with the IP addresses `172.16.10.101` and `172.16.10.102`, take the following steps:
1. Edit the `inventory.ini` file and the `hosts.ini` file, and append the node information.

    Edit the `inventory.ini` file:

    ```ini
    [tidb_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.101
    172.16.10.102

    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitored_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.4
    172.16.10.5
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    172.16.10.101
    172.16.10.102

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:

    Name | Host IP | Services |
    ---|---|---|
    node1 | 172.16.10.1 | PD1 |
    node2 | 172.16.10.2 | PD2 |
    node3 | 172.16.10.3 | PD3, Monitor |
    node4 | 172.16.10.4 | TiDB1 |
    node5 | 172.16.10.5 | TiDB2 |
    node101 | 172.16.10.101 | TiDB3 |
    node102 | 172.16.10.102 | TiDB4 |
    node6 | 172.16.10.6 | TiKV1 |
    node7 | 172.16.10.7 | TiKV2 |
    node8 | 172.16.10.8 | TiKV3 |
    node9 | 172.16.10.9 | TiKV4 |

    Edit the `hosts.ini` file:

    ```ini
    [servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.4
    172.16.10.5
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    172.16.10.101
    172.16.10.102

    [all:vars]
    username = tidb
    ntp_server = pool.ntp.org
    ```
2. Initialize the newly added nodes.

    1. Configure the SSH mutual trust and sudo rules of the target machines on the control machine:

        ```bash
        ansible-playbook -i hosts.ini create_users.yml -l 172.16.10.101,172.16.10.102 -u root -k
        ```

    2. Install the NTP service on the target machines:

        ```bash
        ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b
        ```

    3. Initialize the nodes on the target machines:

        ```bash
        ansible-playbook bootstrap.yml -l 172.16.10.101,172.16.10.102
        ```
3. Deploy the newly added nodes:

    ```bash
    ansible-playbook deploy.yml -l 172.16.10.101,172.16.10.102
    ```

4. Start the newly added nodes (a quick connectivity check is sketched after these steps):

    ```bash
    ansible-playbook start.yml -l 172.16.10.101,172.16.10.102
    ```

5. Update the Prometheus configuration and restart the cluster:

    ```bash
    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    ```

6. Monitor the status of the entire cluster and the newly added nodes by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
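If you want to confirm directly that the new TiDB instances accept SQL connections, a minimal sketch like the one below can be run from the control machine. It assumes the default TiDB port `4000`, a passwordless `root` user, and a MySQL client installed locally; adjust these to match your deployment.

```bash
# Check that each newly added TiDB instance answers a trivial query.
# Assumes the default TiDB port 4000 and a passwordless root user.
for ip in 172.16.10.101 172.16.10.102; do
    if mysql -h "${ip}" -P 4000 -u root -e "SELECT tidb_version();" > /dev/null; then
        echo "TiDB on ${ip} is serving queries"
    else
        echo "TiDB on ${ip} is NOT reachable"
    fi
done
```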
You can use the same procedure to add a TiKV node. However, to add a PD node, some configuration files need to be updated manually, as described in the next section.
## Increase the capacity of a PD node
For example, if you want to add a PD node (node103) with the IP address `172.16.10.103`, take the following steps:
1. Edit the `inventory.ini` file and append the node information to the end of the `[pd_servers]` group:

    ```ini
    [tidb_servers]
    172.16.10.4
    172.16.10.5

    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.103

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.103
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:

    Name | Host IP | Services |
    ---|---|---|
    node1 | 172.16.10.1 | PD1 |
    node2 | 172.16.10.2 | PD2 |
    node3 | 172.16.10.3 | PD3, Monitor |
    node103 | 172.16.10.103 | PD4 |
    node4 | 172.16.10.4 | TiDB1 |
    node5 | 172.16.10.5 | TiDB2 |
    node6 | 172.16.10.6 | TiKV1 |
    node7 | 172.16.10.7 | TiKV2 |
    node8 | 172.16.10.8 | TiKV3 |
    node9 | 172.16.10.9 | TiKV4 |

2. Initialize the newly added node:
    ```bash
    ansible-playbook bootstrap.yml -l 172.16.10.103
    ```

3. Deploy the newly added node:

    ```bash
    ansible-playbook deploy.yml -l 172.16.10.103
    ```

4. Log in to the newly added PD node and edit the start script (a sketch of the edited script follows these steps):

    ```bash
    {deploy_dir}/scripts/run_pd.sh
    ```

    1. Remove the `--initial-cluster="xxxx" \` configuration.

    2. Add `--join="http://172.16.10.1:2379" \`. The IP address (`172.16.10.1`) can be any of the existing PD IP addresses in the cluster.

5. Start the PD service on the newly added PD node:

    ```bash
    {deploy_dir}/scripts/start_pd.sh
    ```

6. Use `pd-ctl` to check whether the new node is added successfully:

    ```bash
    ./pd-ctl -u "http://172.16.10.1:2379"
    ```
7. Start the monitoring service:

    ```bash
    ansible-playbook start.yml -l 172.16.10.103
    ```

8. Update the cluster configuration:

    ```bash
    ansible-playbook deploy.yml
    ```

9. Restart Prometheus so that it starts monitoring the newly added PD node:

    ```bash
    ansible-playbook stop.yml --tags=prometheus
    ansible-playbook start.yml --tags=prometheus
    ```

10. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
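For reference, after the edit in step 4 the start script might look roughly like the sketch below. This is only an illustration: the script generated by TiDB Ansible contains deployment-specific values (node name, URLs, data and log directories), and the only change you need to make is replacing `--initial-cluster` with `--join`.

```bash
#!/bin/bash
# Sketch of {deploy_dir}/scripts/run_pd.sh after the edit.
# All values other than --join are illustrative; keep whatever your
# generated script already contains.
#
# Removed:  --initial-cluster="pd_node1=http://172.16.10.1:2380,..."
# Added:    --join="http://172.16.10.1:2379"

cd "$(dirname "$0")/.." || exit 1

exec bin/pd-server \
    --name="pd_node103" \
    --client-urls="http://172.16.10.103:2379" \
    --peer-urls="http://172.16.10.103:2380" \
    --data-dir="data.pd" \
    --join="http://172.16.10.1:2379" \
    --config=conf/pd.toml \
    --log-file=log/pd.log
```

Once the new pd-server starts with `--join`, it registers itself with the existing cluster through the member it was pointed at; running `member` inside `pd-ctl` (or `-d member`, as used in the scale-in sections below) should then list four PD members.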
## Decrease the capacity of a TiDB node
For example, if you want to remove a TiDB node (node5) with the IP address `172.16.10.5`, take the following steps:
1. Stop all services on node5:

    ```bash
    ansible-playbook stop.yml -l 172.16.10.5
    ```

2. Edit the `inventory.ini` file and remove the node information:

    ```ini
    [tidb_servers]
    172.16.10.4
    #172.16.10.5  # the removed node

    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitored_servers]
    172.16.10.4
    #172.16.10.5  # the removed node
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:

    Name | Host IP | Services |
    ---|---|---|
    node1 | 172.16.10.1 | PD1 |
    node2 | 172.16.10.2 | PD2 |
    node3 | 172.16.10.3 | PD3, Monitor |
    node4 | 172.16.10.4 | TiDB1 |
    node5 | 172.16.10.5 | TiDB2 (removed) |
    node6 | 172.16.10.6 | TiKV1 |
    node7 | 172.16.10.7 | TiKV2 |
    node8 | 172.16.10.8 | TiKV3 |
    node9 | 172.16.10.9 | TiKV4 |

3. Update the Prometheus configuration and restart the cluster (an optional check is sketched after these steps):

    ```bash
    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    ```

4. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
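To confirm that Prometheus no longer scrapes the removed TiDB node, you can query its HTTP API directly. The sketch below assumes Prometheus listens on its default port `9090` on the monitoring host; `jq` is used only for readability.

```bash
# List the instances Prometheus is actively scraping and check whether
# 172.16.10.5 is still among them (assumes the default Prometheus port 9090).
curl -s http://172.16.10.3:9090/api/v1/targets \
    | jq -r '.data.activeTargets[].labels.instance' \
    | sort -u \
    | grep 172.16.10.5 \
    && echo "172.16.10.5 is still being scraped" \
    || echo "172.16.10.5 is no longer a Prometheus target"
```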
## Decrease the capacity of a TiKV node
For example, if you want to remove a TiKV node (node9) with the IP address `172.16.10.9`, take the following steps:
1. Remove the node from the cluster using `pd-ctl`:

    1. View the store ID of node9:

        ```bash
        ./pd-ctl -u "http://172.16.10.1:2379" -d store
        ```

    2. Remove node9 from the cluster, assuming that its store ID is 10:

        ```bash
        ./pd-ctl -u "http://172.16.10.1:2379" -d store delete 10
        ```

    3. Use `pd-ctl` to check whether the node is successfully removed (a polling sketch follows these steps):

        ```bash
        ./pd-ctl -u "http://172.16.10.1:2379" -d store 10
        ```

2. After the node is successfully removed, stop the services on node9:
    ```bash
    ansible-playbook stop.yml -l 172.16.10.9
    ```

3. Edit the `inventory.ini` file and remove the node information:

    ```ini
    [tidb_servers]
    172.16.10.4
    172.16.10.5

    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    #172.16.10.9  # the removed node

    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    #172.16.10.9  # the removed node

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:

    Name | Host IP | Services |
    ---|---|---|
    node1 | 172.16.10.1 | PD1 |
    node2 | 172.16.10.2 | PD2 |
    node3 | 172.16.10.3 | PD3, Monitor |
    node4 | 172.16.10.4 | TiDB1 |
    node5 | 172.16.10.5 | TiDB2 |
    node6 | 172.16.10.6 | TiKV1 |
    node7 | 172.16.10.7 | TiKV2 |
    node8 | 172.16.10.8 | TiKV3 |
    node9 | 172.16.10.9 | TiKV4 (removed) |

4. Update the Prometheus configuration and restart the cluster:

    ```bash
    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    ```

5. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
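Note that deleting a store only marks it as offline: PD then migrates the Region replicas on node9 to other TiKV nodes, and reports the store as `Tombstone` once the migration has finished. Depending on the data volume this can take a while, so it is worth waiting for that state before stopping the services. A rough polling sketch, assuming store ID 10 as above and that `jq` is available (the exact JSON layout of the `store` output can vary between versions, so adjust the `jq` path if needed):

```bash
# Wait until PD reports the deleted store (ID 10 in this example) as Tombstone,
# which means all of its Region replicas have been migrated elsewhere.
while true; do
    state=$(./pd-ctl -u "http://172.16.10.1:2379" -d store 10 \
        | jq -r '.store.state_name')
    echo "store 10 state: ${state}"
    [ "${state}" = "Tombstone" ] && break
    sleep 60
done
```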
## Decrease the capacity of a PD node
For example, if you want to remove a PD node (node2) with the IP address `172.16.10.2`, take the following steps:
1. Remove the node from the cluster using `pd-ctl`:

    1. View the member name of node2:

        ```bash
        ./pd-ctl -u "http://172.16.10.1:2379" -d member
        ```

    2. Remove node2 from the cluster, assuming that its name is pd2:

        ```bash
        ./pd-ctl -u "http://172.16.10.1:2379" -d member delete name pd2
        ```

    3. Use Grafana or `pd-ctl` to check whether the node is successfully removed (a scripted check follows these steps):

        ```bash
        ./pd-ctl -u "http://172.16.10.1:2379" -d member
        ```

2. After the node is successfully removed, stop the services on node2:
    ```bash
    ansible-playbook stop.yml -l 172.16.10.2
    ```

3. Edit the `inventory.ini` file and remove the node information:

    ```ini
    [tidb_servers]
    172.16.10.4
    172.16.10.5

    [pd_servers]
    172.16.10.1
    #172.16.10.2  # the removed node
    172.16.10.3

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    #172.16.10.2  # the removed node
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:

    Name | Host IP | Services |
    ---|---|---|
    node1 | 172.16.10.1 | PD1 |
    node2 | 172.16.10.2 | PD2 (removed) |
    node3 | 172.16.10.3 | PD3, Monitor |
    node4 | 172.16.10.4 | TiDB1 |
    node5 | 172.16.10.5 | TiDB2 |
    node6 | 172.16.10.6 | TiKV1 |
    node7 | 172.16.10.7 | TiKV2 |
    node8 | 172.16.10.8 | TiKV3 |
    node9 | 172.16.10.9 | TiKV4 |

4. Update the cluster configuration:

    ```bash
    ansible-playbook deploy.yml
    ```

5. Restart Prometheus so that it stops monitoring the removed PD node:

    ```bash
    ansible-playbook stop.yml --tags=prometheus
    ansible-playbook start.yml --tags=prometheus
    ```

6. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
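As a final check, you can verify that the removed member no longer appears in the PD member list. The sketch below reuses the `member` command from step 1 and only does a rough string match on the member name, so adjust the pattern if your member names differ.

```bash
# Confirm that pd2 is gone from the PD member list (rough string match).
./pd-ctl -u "http://172.16.10.1:2379" -d member \
    | grep -q '"pd2"' \
    && echo "pd2 is still listed as a PD member" \
    || echo "pd2 has been removed from the PD cluster"
```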