# Scale the TiDB Cluster Using TiDB Ansible
The capacity of a TiDB cluster can be increased or decreased without affecting the online services.
Assume that the topology is as follows:
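
| Name  | Host IP     | Services     |
| ----- | ----------- | ------------ |
| node1 | 172.16.10.1 | PD1          |
| node2 | 172.16.10.2 | PD2          |
| node3 | 172.16.10.3 | PD3, Monitor |
| node4 | 172.16.10.4 | TiDB1        |
| node5 | 172.16.10.5 | TiDB2        |
| node6 | 172.16.10.6 | TiKV1        |
| node7 | 172.16.10.7 | TiKV2        |
| node8 | 172.16.10.8 | TiKV3        |
| node9 | 172.16.10.9 | TiKV4        |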
## Increase the capacity of a TiDB/TiKV node
For example, if you want to add two TiDB nodes (node101, node102) with the IP addresses 172.16.10.101 and 172.16.10.102, take the following steps:
1. Edit the `inventory.ini` file and the `hosts.ini` file, and append the node information.

    - Edit the `inventory.ini` file:

        ```ini
        [tidb_servers]
        172.16.10.4
        172.16.10.5
        172.16.10.101
        172.16.10.102

        [pd_servers]
        172.16.10.1
        172.16.10.2
        172.16.10.3

        [tikv_servers]
        172.16.10.6
        172.16.10.7
        172.16.10.8
        172.16.10.9

        [monitored_servers]
        172.16.10.1
        172.16.10.2
        172.16.10.3
        172.16.10.4
        172.16.10.5
        172.16.10.6
        172.16.10.7
        172.16.10.8
        172.16.10.9
        172.16.10.101
        172.16.10.102

        [monitoring_servers]
        172.16.10.3

        [grafana_servers]
        172.16.10.3
        ```

        Now the topology is as follows:
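
        | Name    | Host IP       | Services     |
        | ------- | ------------- | ------------ |
        | node1   | 172.16.10.1   | PD1          |
        | node2   | 172.16.10.2   | PD2          |
        | node3   | 172.16.10.3   | PD3, Monitor |
        | node4   | 172.16.10.4   | TiDB1        |
        | node5   | 172.16.10.5   | TiDB2        |
        | node101 | 172.16.10.101 | TiDB3        |
        | node102 | 172.16.10.102 | TiDB4        |
        | node6   | 172.16.10.6   | TiKV1        |
        | node7   | 172.16.10.7   | TiKV2        |
        | node8   | 172.16.10.8   | TiKV3        |
        | node9   | 172.16.10.9   | TiKV4        |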
    - Edit the `hosts.ini` file:

        ```ini
        [servers]
        172.16.10.1
        172.16.10.2
        172.16.10.3
        172.16.10.4
        172.16.10.5
        172.16.10.6
        172.16.10.7
        172.16.10.8
        172.16.10.9
        172.16.10.101
        172.16.10.102

        [all:vars]
        username = tidb
        ntp_server = pool.ntp.org
        ```
2. Initialize the newly added nodes.

    1. Configure the SSH mutual trust and sudo rules for the new nodes on the central control machine:

        ```
        ansible-playbook -i hosts.ini create_users.yml -l 172.16.10.101,172.16.10.102 -u root -k
        ```

    2. Install the NTP service on the deployment target machines:

        ```
        ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b
        ```

    3. Initialize the nodes on the deployment target machines:

        ```
        ansible-playbook bootstrap.yml -l 172.16.10.101,172.16.10.102
        ```
3. Deploy the newly added nodes:

    ```
    ansible-playbook deploy.yml -l 172.16.10.101,172.16.10.102
    ```

4. Start the newly added nodes:

    ```
    ansible-playbook start.yml -l 172.16.10.101,172.16.10.102
    ```

5. Update the Prometheus configuration and restart the monitoring services:

    ```
    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    ```

6. Open a browser and access the monitoring platform at `http://172.16.10.3:3000` to check the status of the entire cluster and the newly added nodes.
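To confirm that a newly added TiDB node is actually serving queries, you can also connect to it with any MySQL-compatible client. A minimal sketch, assuming the default TiDB port (4000) and a `root` user without a password; adjust for your own settings:

```bash
mysql -h 172.16.10.101 -P 4000 -u root -e "SELECT tidb_version();"
```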
You can use the same procedure to add a TiKV node. However, to add a PD node, some configuration files need to be updated manually.
## Increase the capacity of a PD node
For example, if you want to add a PD node (node103) with the IP address 172.16.10.103, take the following steps:
1. Edit the `inventory.ini` file and append the node information to the end of the `[pd_servers]` group:

    ```ini
    [tidb_servers]
    172.16.10.4
    172.16.10.5

    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.103

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.103
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:
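
    | Name    | Host IP       | Services     |
    | ------- | ------------- | ------------ |
    | node1   | 172.16.10.1   | PD1          |
    | node2   | 172.16.10.2   | PD2          |
    | node3   | 172.16.10.3   | PD3, Monitor |
    | node103 | 172.16.10.103 | PD4          |
    | node4   | 172.16.10.4   | TiDB1        |
    | node5   | 172.16.10.5   | TiDB2        |
    | node6   | 172.16.10.6   | TiKV1        |
    | node7   | 172.16.10.7   | TiKV2        |
    | node8   | 172.16.10.8   | TiKV3        |
    | node9   | 172.16.10.9   | TiKV4        |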
2. Initialize the newly added node:

    ```
    ansible-playbook bootstrap.yml -l 172.16.10.103
    ```

3. Deploy the newly added node:

    ```
    ansible-playbook deploy.yml -l 172.16.10.103
    ```

4. Log in to the newly added PD node and edit the start script `{deploy_dir}/scripts/run_pd.sh`:

    1. Remove the `--initial-cluster="xxxx" \` configuration.

    2. Add `--join="http://172.16.10.1:2379" \`. The IP address (172.16.10.1) can be that of any existing PD node in the cluster. A sketch of the edited script is shown at the end of this section.
5. Start the PD service on the newly added PD node:

    ```
    {deploy_dir}/scripts/start_pd.sh
    ```

6. Use `pd-ctl` to check whether the new node is added successfully; run `member` at the `pd-ctl` prompt to list the PD members:

    ```
    ./pd-ctl -u "http://172.16.10.1:2379"
    ```
7. Start the monitoring service on the new node:

    ```
    ansible-playbook start.yml -l 172.16.10.103
    ```

8. Update the cluster configuration:

    ```
    ansible-playbook deploy.yml
    ```

9. Restart Prometheus to enable monitoring of the newly added PD node:

    ```
    ansible-playbook stop.yml --tags=prometheus
    ansible-playbook start.yml --tags=prometheus
    ```

10. Open a browser and access the monitoring platform at `http://172.16.10.3:3000` to check the status of the entire cluster and the newly added node.
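For reference, after step 4 the start script might look like the following. This is a minimal sketch: the flag values are illustrative, and the other flags that TiDB Ansible generated in your own `run_pd.sh` should stay unchanged; the essential points are that `--initial-cluster` is gone and `--join` points to an existing PD node.

```bash
#!/bin/bash
# Sketch of {deploy_dir}/scripts/run_pd.sh after editing.
# Keep whatever flags your deployment already has; only --join is new,
# and --initial-cluster has been removed.
exec bin/pd-server \
    --name="pd_node103" \
    --client-urls="http://172.16.10.103:2379" \
    --peer-urls="http://172.16.10.103:2380" \
    --data-dir="data.pd" \
    --join="http://172.16.10.1:2379" \
    --log-file="log/pd.log"
```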
## Decrease the capacity of a TiDB node
For example, if you want to remove a TiDB node (node5) with the IP address 172.16.10.5, take the following steps:
1. Stop all services on node5:

    ```
    ansible-playbook stop.yml -l 172.16.10.5
    ```

2. Edit the `inventory.ini` file and remove the node information:

    ```ini
    [tidb_servers]
    172.16.10.4
    #172.16.10.5  # the removed node

    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitored_servers]
    172.16.10.4
    #172.16.10.5  # the removed node
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:
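
    | Name  | Host IP     | Services     |
    | ----- | ----------- | ------------ |
    | node1 | 172.16.10.1 | PD1          |
    | node2 | 172.16.10.2 | PD2          |
    | node3 | 172.16.10.3 | PD3, Monitor |
    | node4 | 172.16.10.4 | TiDB1        |
    | node6 | 172.16.10.6 | TiKV1        |
    | node7 | 172.16.10.7 | TiKV2        |
    | node8 | 172.16.10.8 | TiKV3        |
    | node9 | 172.16.10.9 | TiKV4        |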
3. Update the Prometheus configuration and restart the monitoring services:

    ```
    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    ```

4. Open a browser and access the monitoring platform at `http://172.16.10.3:3000` to check the status of the entire cluster.
## Decrease the capacity of a TiKV node
For example, if you want to remove a TiKV node (node9) with the IP address 172.16.10.9, take the following steps:
1. Remove the node from the cluster using `pd-ctl`:

    1. View the store ID of node9:

        ```
        ./pd-ctl -u "http://172.16.10.1:2379" -d store
        ```

    2. Remove node9 from the cluster, assuming that the store ID is 10:

        ```
        ./pd-ctl -u "http://172.16.10.1:2379" -d store delete 10
        ```
2. Use `pd-ctl` to check whether the node is successfully removed. `store delete` only marks the store as offline; PD then migrates its Regions to other nodes, and the store state changes from `Offline` to `Tombstone` once the migration completes (see the sketch at the end of this section):

    ```
    ./pd-ctl -u "http://172.16.10.1:2379" -d store 10
    ```

3. After the node is successfully removed, stop the services on node9:

    ```
    ansible-playbook stop.yml -l 172.16.10.9
    ```

4. Edit the `inventory.ini` file and remove the node information:

    ```ini
    [tidb_servers]
    172.16.10.4
    172.16.10.5

    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    #172.16.10.9  # the removed node

    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    #172.16.10.9  # the removed node

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:
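
    | Name  | Host IP     | Services     |
    | ----- | ----------- | ------------ |
    | node1 | 172.16.10.1 | PD1          |
    | node2 | 172.16.10.2 | PD2          |
    | node3 | 172.16.10.3 | PD3, Monitor |
    | node4 | 172.16.10.4 | TiDB1        |
    | node5 | 172.16.10.5 | TiDB2        |
    | node6 | 172.16.10.6 | TiKV1        |
    | node7 | 172.16.10.7 | TiKV2        |
    | node8 | 172.16.10.8 | TiKV3        |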
5. Update the Prometheus configuration and restart the monitoring services:

    ```
    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    ```

6. Open a browser and access the monitoring platform at `http://172.16.10.3:3000` to check the status of the entire cluster.
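Region migration can take a long time on a store that holds a lot of data. The following is a minimal sketch that polls PD until the decommissioned store (ID 10, from the example above) reports the `Tombstone` state; depending on the PD version, a fully removed store may instead disappear from the `store` output, so adjust the check accordingly:

```bash
# Wait until store 10 has finished migrating its Regions away.
# A sketch: the 60-second polling interval is arbitrary.
while ! ./pd-ctl -u "http://172.16.10.1:2379" -d store 10 | grep -q Tombstone; do
    echo "store 10 is still migrating data, waiting..."
    sleep 60
done
echo "store 10 is Tombstone; it is now safe to stop TiKV on node9"
```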
## Decrease the capacity of a PD node
For example, if you want to remove a PD node (node2) with the IP address 172.16.10.2, take the following steps:
1. Remove the node from the cluster using `pd-ctl`:

    1. View the name of node2:

        ```
        ./pd-ctl -u "http://172.16.10.1:2379" -d member
        ```

    2. Remove node2 from the cluster, assuming that the name is pd2:

        ```
        ./pd-ctl -u "http://172.16.10.1:2379" -d member delete name pd2
        ```
2. Use Grafana or `pd-ctl` to check whether the node is successfully removed:

    ```
    ./pd-ctl -u "http://172.16.10.1:2379" -d member
    ```

3. After the node is successfully removed, stop the services on node2:

    ```
    ansible-playbook stop.yml -l 172.16.10.2
    ```

4. Edit the `inventory.ini` file and remove the node information:

    ```ini
    [tidb_servers]
    172.16.10.4
    172.16.10.5

    [pd_servers]
    172.16.10.1
    #172.16.10.2  # the removed node
    172.16.10.3

    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    #172.16.10.2  # the removed node
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9

    [monitoring_servers]
    172.16.10.3

    [grafana_servers]
    172.16.10.3
    ```

    Now the topology is as follows:
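
    | Name  | Host IP     | Services     |
    | ----- | ----------- | ------------ |
    | node1 | 172.16.10.1 | PD1          |
    | node3 | 172.16.10.3 | PD3, Monitor |
    | node4 | 172.16.10.4 | TiDB1        |
    | node5 | 172.16.10.5 | TiDB2        |
    | node6 | 172.16.10.6 | TiKV1        |
    | node7 | 172.16.10.7 | TiKV2        |
    | node8 | 172.16.10.8 | TiKV3        |
    | node9 | 172.16.10.9 | TiKV4        |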
5. Update the cluster configuration:

    ```
    ansible-playbook deploy.yml
    ```

6. Restart Prometheus to remove the deleted PD node from monitoring:

    ```
    ansible-playbook stop.yml --tags=prometheus
    ansible-playbook start.yml --tags=prometheus
    ```

7. Open a browser and access the monitoring platform at `http://172.16.10.3:3000` to check the status of the entire cluster.