# Prerequisites for TiDB in Kubernetes
This document introduces the hardware and software prerequisites for deploying a TiDB cluster in Kubernetes.
## Software version
| Software Name | Version |
| --- | --- |
| Docker | Docker CE 18.09.6 |
| Kubernetes | v1.12.5+ |
| CentOS | 7.6 and kernel 3.10.0-957 or later |
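As a reference only, the following commands can help verify that an existing machine meets these version requirements; the exact flags and output format depend on your installation:

```shell
# Example only: check the Docker, Kubernetes, OS, and kernel versions.
docker version --format '{{.Server.Version}}'
kubectl version --short
cat /etc/centos-release
uname -r
```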
## The configuration of kernel parameters
| Configuration Item | Value |
| --- | --- |
| net.core.somaxconn | 32768 |
| vm.swappiness | 0 |
| net.ipv4.tcp_syncookies | 0 |
| net.ipv4.ip_forward | 1 |
| fs.file-max | 1000000 |
| fs.inotify.max_user_watches | 1048576 |
| fs.inotify.max_user_instances | 1024 |
| net.ipv4.conf.all.rp_filter | 1 |
| net.ipv4.neigh.default.gc_thresh1 | 80000 |
| net.ipv4.neigh.default.gc_thresh2 | 90000 |
| net.ipv4.neigh.default.gc_thresh3 | 100000 |
| net.bridge.bridge-nf-call-iptables | 1 |
| net.bridge.bridge-nf-call-arptables | 1 |
| net.bridge.bridge-nf-call-ip6tables | 1 |
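As a reference only, the following sketch persists the values above in a drop-in file under `/etc/sysctl.d/` (the file name is arbitrary) and then applies them; make sure the `br_netfilter` module is loaded first so that the `net.bridge.*` keys exist:

```shell
# Example only: write the recommended kernel parameters to a drop-in file and apply them.
cat <<'EOF' > /etc/sysctl.d/99-tidb-k8s.conf
net.core.somaxconn = 32768
vm.swappiness = 0
net.ipv4.tcp_syncookies = 0
net.ipv4.ip_forward = 1
fs.file-max = 1000000
fs.inotify.max_user_watches = 1048576
fs.inotify.max_user_instances = 1024
net.ipv4.conf.all.rp_filter = 1
net.ipv4.neigh.default.gc_thresh1 = 80000
net.ipv4.neigh.default.gc_thresh2 = 90000
net.ipv4.neigh.default.gc_thresh3 = 100000
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system
```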
When you set the `net.bridge.bridge-nf-call-*` parameters, if your operation reports an error, check whether the `br_netfilter` module is loaded by running the following command:

```shell
lsmod | grep br_netfilter
```
If this module is not loaded, run the following command to load it:
```shell
modprobe br_netfilter
```
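To also load the module automatically at boot, one common approach on CentOS 7 with systemd is a drop-in file under `/etc/modules-load.d/` (shown here as an example only; the file name is arbitrary):

```shell
# Example only: load br_netfilter on every boot via systemd-modules-load.
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
```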
You also need to disable swap on each deployed Kubernetes node by running:
```shell
swapoff -a
```
To check whether swap is disabled:
```shell
free -m
```
If the `Swap` row in the output of the above command shows all zeros, swap is disabled.

In addition, to disable swap permanently, remove all swap-related entries from `/etc/fstab`.
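As a reference only, the following sketch comments out (rather than deletes) the swap entries so that they can be restored later if needed:

```shell
# Example only: comment out all swap entries in /etc/fstab, then turn swap off now.
sed -ri '/\sswap\s/ s/^([^#])/#\1/' /etc/fstab
swapoff -a
```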
After all the above configurations are made, check whether SMP IRQ Affinity is configured on the machine. This configuration assigns the interrupts of each device to different CPUs so that all interrupts are not sent to the same CPU, which avoids a potential performance bottleneck and takes advantage of multiple cores to increase cluster throughput. For the TiDB cluster, the rate at which the network card processes packets has a great impact on the throughput of the cluster.
Follow these steps to check whether you have configured SMP IRQ Affinity on the machine:
Execute the following command to check the interrupt of a network card:
```shell
cat /proc/interrupts | grep <iface-name> | awk '{print $1,$NF}'
```

In the output of the above command, the first column is the interrupt number and the second column is the device name. If it is a multi-queue network card, the command outputs multiple rows, and each queue corresponds to one interrupt.
Execute either of the following commands to check which CPU an interrupt is assigned to:
```shell
cat /proc/irq/<irq_num>/smp_affinity
```

The above command outputs a hexadecimal bitmask of CPU serial numbers, which is not very intuitive to read. For the detailed calculation method, refer to SMP IRQ Affinity.
```shell
cat /proc/irq/<irq_num>/smp_affinity_list
```

The above command outputs CPU serial numbers in decimal, which is easier to read.
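For convenience, the following sketch combines the two checks and prints each interrupt of the network card together with the CPUs it is bound to (replace `<iface-name>` with your interface name):

```shell
# Example only: show "<irq> -> CPU(s)" for every interrupt of the given network card.
for irq in $(grep '<iface-name>' /proc/interrupts | awk '{print $1}' | tr -d ':'); do
  printf '%s -> CPU(s) %s\n' "$irq" "$(cat /proc/irq/$irq/smp_affinity_list)"
done
```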
If all interrupts of a network card are assigned to different CPUs, SMP IRQ Affinity is correctly configured on the machine and no further operation is needed.
If all interrupts are sent to the same CPU, configure SMP IRQ Affinity by the following steps:
For the scenario of a multi-queue network card and multiple cores:

- Method 1: Enable the `irqbalance` service. Use the following command to enable the service on CentOS 7:

    ```shell
    systemctl start irqbalance
    ```

- Method 2: Disable `irqbalance` and customize the binding relationship between interrupts and CPUs; a minimal manual-binding sketch follows this list. Refer to the set_irq_affinity.sh script for more details.
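The following is a minimal sketch of Method 2, assuming a hypothetical IRQ number `45` taken from the `/proc/interrupts` output above and a target CPU `1`; adjust both to your own environment:

```shell
# Example only: stop irqbalance so that it does not overwrite the manual binding,
# then bind the hypothetical IRQ 45 to CPU 1.
systemctl stop irqbalance
echo 1 > /proc/irq/45/smp_affinity_list
```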
For the scenario of single-queue network card and multiple cores:
To configure SMP IRQ Affinity in this scenario, you can use RPS/RFS to simulate the Receive Side Scaling (RSS) feature of the network card at the software level.
Do not use the irqbalance service as described in Method 1. Instead, use the script provided in Method 2 to configure RPS. For the configuration of RFS, refer to RFS configuration.
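As a reference only, the following sketch enables RPS and RFS for a hypothetical single-queue interface `eth0`; the CPU mask and flow counts are placeholder values, not recommendations:

```shell
# Example only: let CPUs 0-7 (mask ff) process received packets of eth0 in software.
echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus

# Example only: RFS settings; the global flow table size and the per-queue share.
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 32768 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
```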
## Hardware and deployment requirements
A 64-bit generic hardware server platform with the Intel x86-64 architecture and a 10 Gigabit NIC (network interface card), which is the same as the server requirements for deploying a TiDB cluster using binaries. For details, refer to Hardware recommendations.
The server's disk, memory and CPU choices depend on the capacity planning of the cluster and the deployment topology. It is recommended to deploy three master nodes, three etcd nodes, and several worker nodes to ensure high availability of the online Kubernetes cluster.
Meanwhile, a master node often also acts as a worker node (that is, load can also be scheduled to the master node) to make full use of resources. You can set reserved resources through kubelet to ensure that the system processes on the machine and the core processes of Kubernetes have sufficient resources to run under high workloads, which ensures the stability of the entire system.
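The following is a minimal sketch of how such reservations might be expressed as kubelet startup flags (for example, added to the kubelet systemd unit); the amounts are placeholders, not recommendations, and should follow your own capacity planning:

```shell
# Example only: flags to add to the kubelet startup configuration, reserving resources
# for OS daemons and Kubernetes core components (the values are placeholders).
kubelet --system-reserved=cpu=500m,memory=1Gi \
        --kube-reserved=cpu=500m,memory=1Gi \
        --eviction-hard='memory.available<500Mi'
```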
The following sections analyze a deployment plan with three Kubernetes master nodes, three etcd nodes, and several worker nodes. To achieve a highly available deployment of multiple master nodes in Kubernetes, see the Kubernetes official documentation.
### Kubernetes requirements for system resources
Each machine requires a relatively large SAS disk (at least 1 TB) to store the data directories of Docker and kubelet.
If you need to deploy a monitoring system for the Kubernetes cluster and store the monitoring data on disk, consider preparing a large SAS disk for Prometheus as well as for the log monitoring system. This also guarantees that the purchased machines are homogeneous. For this reason, it is recommended to prepare two large SAS disks for each machine.
It is recommended that the number of etcd nodes be consistent with that of the Kubernetes master nodes, and that the etcd data be stored on SSD disks.
### TiDB cluster's requirements for resources
The TiDB cluster consists of three components: PD, TiKV, and TiDB. The following recommendations on capacity planning are based on a standard TiDB cluster, namely three PD, three TiKV, and two TiDB instances:
- PD component: 2C 4GB per PD instance. PD occupies relatively few resources and only a small amount of local disk space.
- TiKV component: 8C 32GB and an NVMe disk for each TiKV instance. To deploy multiple TiKV instances on one machine, reserve enough buffer when planning capacity.
- TiDB component: 8C 32GB for each TiDB instance. Because the TiDB component does not occupy disk space, you only need to consider CPU and memory when planning. The following example assumes a capacity of 8C 32GB.
### A case of planning TiDB clusters
This is an example of deploying five clusters (each cluster has 3 PDs, 3 TiKVs, and 2 TiDBs), where PD is configured as 2C 4GB, TiDB as 8C 32GB, and TiKV as 8C 32GB. There are seven Kubernetes nodes, three of which are both master and worker nodes, and the other four are purely worker nodes. The distribution of components on each node is as follows:
Each master node:
- 1 etcd (2C 4GB) + 2 PDs (2 * 2C 2 * 4GB) + 3 TiKVs (3 * 8C 3 * 32GB) + 1 TiDB (8C 32GB), totalling 38C 140GB
- Two SSD disks, one for etcd and one for two PD instances
- The RAID5-applied SAS disk used for Docker and kubelet
- Three NVMe disks for TiKV instances
Each worker node:
- 3 PDs (3 * 2C 3 * 4GB) + 2 TiKVs (2 * 8C 2 * 32GB) + 2 TiDBs (2 * 8C 2 * 32GB), totalling 38C 140GB
- One SSD disk for three PD instances
- The RAID5-applied SAS disk used for Docker and kubelet
- Two NVMe disks for TiKV instances
From the above analysis, a total of seven physical machines are required to support five sets of TiDB clusters. Three of the machines are master and worker nodes, and the remaining four are worker nodes. The configuration requirements for the machines are as follows:
- master and worker node: 48C 192GB, two SSD disks, one RAID5-applied SAS disk, three NVMe disks
- worker node: 48C 192GB, one SSD disk, one RAID5-applied SAS disk, two NVMe disks
The above recommended configuration leaves plenty of available resources beyond those taken by the components. If you want to add the monitoring and log components, use the same method to plan and purchase machines with appropriate configurations.