tiup cluster check
For a production environment, you need to perform a series of checks before the environment goes live to ensure that the cluster delivers its best performance. To simplify these manual checks, TiUP Cluster provides the check command, which checks whether the hardware and software environments of the target machines of a specified cluster meet the requirements for normal operation.
List of check items
Operating system version
Check the operating system distribution and version of the deployed machines. Currently, only CentOS 7 is supported for deployment. More system versions may be supported in later releases for compatibility improvement.
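The distribution check can be approximated by reading /etc/os-release, which systemd-based distributions (including CentOS 7) provide. The sketch below is an assumption about how such a check could work and is not TiUP's implementation. It parses the KEY=value format and accepts only an ID of centos with a major VERSION_ID of 7.

```python
def parse_os_release(text):
    """Parse KEY=value lines in the /etc/os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def is_centos7(text):
    """Return True if the os-release content describes CentOS 7 (any minor release)."""
    fields = parse_os_release(text)
    major = fields.get("VERSION_ID", "").split(".")[0]
    return fields.get("ID") == "centos" and major == "7"


sample = 'NAME="CentOS Linux"\nVERSION="7 (Core)"\nID="centos"\nVERSION_ID="7"\n'
print(is_centos7(sample))  # True
```

On a real machine, pass the contents of /etc/os-release instead of the inline sample.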
CPU EPOLLEXCLUSIVE
Check whether the target machine supports EPOLLEXCLUSIVE, an epoll flag provided by the Linux kernel since version 4.5.
numactl
Check whether numactl is installed on the target machine. If CPU core binding is configured on the target machine, numactl must be installed.
System time
Check whether the system time of the target machine is synchronized. The check compares the system time of the target machine with that of the central control machine and reports an error if the deviation exceeds 500 ms.
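The comparison itself is a simple threshold test. The sketch below assumes both clock readings have already been collected as timezone-aware datetime values. How TiUP gathers them over SSH is not described here, and network latency would add to any measured deviation.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the check description: deviations above 500 ms are errors.
MAX_DEVIATION = timedelta(milliseconds=500)


def clock_in_sync(control, target, limit=MAX_DEVIATION):
    """Return True if the target clock is within `limit` of the control machine's clock."""
    return abs(target - control) <= limit


control = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(clock_in_sync(control, control + timedelta(milliseconds=300)))  # True
print(clock_in_sync(control, control - timedelta(milliseconds=800)))  # False
```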
System time zone
Check whether the system time zones of the target machines are consistent. The check compares the time zone configuration of these machines and reports an error if they differ.
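One way to report which machines are inconsistent is to treat the most common time zone as the reference. This sketch assumes the zone names have already been collected per host, and the host addresses are hypothetical. Using the majority zone as the reference is a design choice of this example, not documented behavior of the check. With a tie, Counter.most_common returns the zone that was encountered first.

```python
from collections import Counter


def time_zone_outliers(zones):
    """Map of host -> time zone name; return hosts whose zone differs from the majority."""
    if not zones:
        return []
    majority, _ = Counter(zones.values()).most_common(1)[0]
    return sorted(host for host, tz in zones.items() if tz != majority)


zones = {"10.0.1.1": "Asia/Shanghai", "10.0.1.2": "Asia/Shanghai", "10.0.1.3": "UTC"}
print(time_zone_outliers(zones))  # ['10.0.1.3']
```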
Time synchronization service
Check whether the time synchronization service is configured on the target machine. Specifically, check whether ntpd is running.
Swap partitioning
Check whether swap partitioning is enabled on the target machine. It is recommended to disable swap partitioning.
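On Linux, /proc/swaps lists every active swap area below a header line, so swap is enabled exactly when that file has at least one entry. The following minimal sketch works on the file contents:

```python
def swap_enabled(proc_swaps):
    """Return True if /proc/swaps content lists at least one active swap area."""
    lines = [line for line in proc_swaps.splitlines() if line.strip()]
    # The first line is the column header (Filename, Type, Size, Used, Priority).
    return len(lines) > 1


enabled = "Filename\tType\tSize\tUsed\tPriority\n/dev/dm-1 partition\t8388604\t0\t-2\n"
disabled = "Filename\tType\tSize\tUsed\tPriority\n"
print(swap_enabled(enabled), swap_enabled(disabled))  # True False
```

On a Linux machine, pass the contents of /proc/swaps instead of the inline samples.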
Kernel parameters
Check the values of the following kernel parameters:
- net.ipv4.tcp_tw_recycle: 0
- net.ipv4.tcp_syncookies: 0
- net.core.somaxconn: 32768
- vm.swappiness: 0
- vm.overcommit_memory: 0 or 1
- fs.file-max: 1000000
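The list above can be turned into a table-driven check. This sketch reads each parameter from /proc/sys and compares it with the documented values. It is an assumption of this example that net.core.somaxconn and fs.file-max are minimums (larger values pass) and that the other parameters must match exactly. Parameters missing from /proc/sys are skipped. For instance, net.ipv4.tcp_tw_recycle no longer exists on Linux 4.12 and later.

```python
from pathlib import Path

# Assumed reading of the list above: these two values are minimums...
MINIMUMS = {"net.core.somaxconn": 32768, "fs.file-max": 1000000}
# ...and these must equal one of the listed values.
ALLOWED = {
    "net.ipv4.tcp_tw_recycle": {0},
    "net.ipv4.tcp_syncookies": {0},
    "vm.swappiness": {0},
    "vm.overcommit_memory": {0, 1},
}


def sysctl_path(name):
    """Map a dotted parameter name to its file, e.g. vm.swappiness -> /proc/sys/vm/swappiness."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_params(names):
    """Read integer parameters from /proc/sys, skipping ones this kernel does not expose."""
    values = {}
    for name in names:
        path = sysctl_path(name)
        if path.exists():
            values[name] = int(path.read_text().split()[0])
    return values


def check_params(values):
    """Return (name, value) pairs that violate the expected settings."""
    failures = []
    for name, minimum in MINIMUMS.items():
        if name in values and values[name] < minimum:
            failures.append((name, values[name]))
    for name, allowed in ALLOWED.items():
        if name in values and values[name] not in allowed:
            failures.append((name, values[name]))
    return failures


print(check_params({"net.core.somaxconn": 4096, "vm.overcommit_memory": 2}))
# [('net.core.somaxconn', 4096), ('vm.overcommit_memory', 2)]
```

On a live machine, check_params(read_params(list(MINIMUMS) + list(ALLOWED))) checks the local kernel.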
Transparent Huge Pages (THP)
Check whether THP is enabled on the target machine. It is recommended to disable THP.
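On Linux, the current THP mode is exposed in /sys/kernel/mm/transparent_hugepage/enabled, with the active choice shown in square brackets (for example, "always madvise [never]"). The sketch below extracts the active mode. Treating only "never" as disabled is this example's interpretation of the recommendation.

```python
import re


def thp_mode(content):
    """Return the active THP mode, which the kernel marks with square brackets."""
    match = re.search(r"\[(\w+)\]", content)
    if match is None:
        raise ValueError(f"unexpected THP setting: {content!r}")
    return match.group(1)


def thp_disabled(content):
    """Return True if THP is set to never."""
    return thp_mode(content) == "never"


print(thp_mode("always madvise [never]\n"))  # never
```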
System limits
Check the limit values in the /etc/security/limits.conf file:
<deploy-user> soft nofile 1000000
<deploy-user> hard nofile 1000000
<deploy-user> soft stack 10240
In these lines, <deploy-user> is the user who deploys and runs the TiDB cluster, and the last column is the minimum value required for the system.
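The following sketch of this comparison assumes that the file contents and the deploy user name are already known. The user name tidb is a hypothetical value. The sketch follows the limits.conf line layout of domain, type, item, and value. For brevity, it only considers lines whose domain is exactly the user, so it does not evaluate @group entries, * entries, or files under /etc/security/limits.d.

```python
# Minimum values from the list above, keyed by (type, item).
REQUIRED = {
    ("soft", "nofile"): 1000000,
    ("hard", "nofile"): 1000000,
    ("soft", "stack"): 10240,
}


def limits_failures(conf_text, user):
    """Return (type, item) keys that are missing or below the minimum for `user`."""
    found = {}
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) != 4 or fields[0] != user:
            continue  # this sketch ignores @group and * entries
        _, kind, item, value = fields
        if value in ("unlimited", "infinity"):
            limit = float("inf")
        elif value.isdigit():
            limit = int(value)
        else:
            continue
        # A type of "-" sets both the soft and the hard limit.
        for k in (("soft", "hard") if kind == "-" else (kind,)):
            found[(k, item)] = limit
    return [key for key, minimum in REQUIRED.items() if found.get(key, 0) < minimum]


conf = "tidb soft nofile 1000000\ntidb hard nofile 1000000\ntidb soft stack 8192\n"
print(limits_failures(conf, "tidb"))  # [('soft', 'stack')]
```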
SELinux
Check whether SELinux is enabled. It is recommended to disable SELinux.
Firewall
Check whether the FirewallD service is enabled. It is recommended to either disable the FirewallD service or add permission rules for each service in the TiDB cluster.
irqbalance
Check whether the irqbalance service is enabled. It is recommended to enable the irqbalance service.
Disk mount options
Check the mount options for ext4 partitions. Make sure the mount options include the nodelalloc option and the noatime option.
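The mount options of each filesystem appear in the fourth field of /proc/mounts. The sketch below flags every ext4 mount that lacks either option. It does not restrict the check to the partitions that hold the cluster's data directories, as the real check presumably does. The device names and mount points are hypothetical.

```python
REQUIRED_OPTIONS = {"nodelalloc", "noatime"}


def ext4_mount_problems(proc_mounts):
    """Return {mount point: sorted missing options} for ext4 mounts in /proc/mounts content."""
    problems = {}
    for line in proc_mounts.splitlines():
        # Fields: device, mount point, filesystem type, options, dump, pass.
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        missing = REQUIRED_OPTIONS - set(fields[3].split(","))
        if missing:
            problems[fields[1]] = sorted(missing)
    return problems


mounts = (
    "/dev/nvme0n1p1 /data1 ext4 rw,noatime,nodelalloc 0 0\n"
    "/dev/sdb1 /data2 ext4 rw,relatime 0 0\n"
    "proc /proc proc rw,nosuid 0 0\n"
)
print(ext4_mount_problems(mounts))  # {'/data2': ['noatime', 'nodelalloc']}
```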
Port usage
Check whether the ports defined in the topology (including default ports that are filled in automatically) are already in use by processes on the target machine.
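A common way to tell whether a port is already taken is to try binding it. This sketch does that for TCP on the local machine, which is not necessarily how TiUP inspects remote machines. A bind can also fail for other reasons, such as binding a port below 1024 without root privileges or a port still held by TIME_WAIT connections. A True result therefore means the port cannot be bound, not strictly that a process is listening on it.

```python
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if a TCP socket cannot be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


# Hold an ephemeral port open, then confirm that it is reported as in use.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    busy_port = listener.getsockname()[1]
    print(port_in_use(busy_port, "127.0.0.1"))  # True
```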
CPU core number
Check the CPU information of the target machine. For a production cluster, it is recommended that the number of logical CPU cores be greater than or equal to 16.
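On the local machine, Python's os.cpu_count reports the number of logical CPUs, which matches the unit used in the recommendation. Note that os.cpu_count counts all logical CPUs in the system, not only those available to the current process. A minimal sketch of the comparison:

```python
import os

MIN_LOGICAL_CORES = 16  # recommendation for production clusters


def enough_cpu_cores(count=None):
    """Return True if the logical CPU count meets the recommended minimum."""
    if count is None:
        count = os.cpu_count() or 0  # os.cpu_count() may return None
    return count >= MIN_LOGICAL_CORES


print(enough_cpu_cores(8), enough_cpu_cores(32))  # False True
```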
Memory size
Check the memory size of the target machine. For a production cluster, it is recommended that the total memory capacity be greater than or equal to 32 GB.
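The MemTotal line of /proc/meminfo reports the total memory of a Linux machine, in units the kernel labels kB (1024 bytes). MemTotal is somewhat smaller than the installed RAM because the kernel reserves part of it. A strict comparison against exactly 32 GiB can therefore fail on a machine with 32 GiB installed. For that reason, the default threshold in this sketch is only an assumption and is exposed as a parameter.

```python
def mem_total_bytes(meminfo):
    """Return MemTotal from /proc/meminfo content, converted from kB (1024 bytes) to bytes."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal line not found")


def enough_memory(meminfo, minimum_bytes=32 * 1024**3):
    """Return True if MemTotal is at least `minimum_bytes` (assumed default: 32 GiB)."""
    return mem_total_bytes(meminfo) >= minimum_bytes


sample = "MemTotal:       65536000 kB\nMemFree:        1024000 kB\n"
print(enough_memory(sample))  # True
```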
Fio disk performance test
Use the flexible I/O tester (fio) to test the performance of the disk where data_dir is located. The test includes the following three items:
- fio_randread_write_latency
- fio_randread_write
- fio_randread
Syntax
tiup cluster check <topology.yml | cluster-name> [flags]
- If a cluster is not deployed yet, you need to pass the topology.yml file that is used to deploy the cluster. Based on the content of this file, tiup-cluster connects to the corresponding machines to perform the check.
- If a cluster is already deployed, you can use <cluster-name> as the check object.
- If you want to check the scale-out YAML file for an existing cluster, you can use both <scale-out.yml> and <cluster-name> as the check objects.
Options
--apply
- Attempts to automatically repair the failed check items. Currently, tiup-cluster only attempts to repair the following check items:
  - SELinux
  - Firewall
  - irqbalance
  - Kernel parameters
  - System limits
  - THP (Transparent Huge Pages)
- Data type:
BOOLEAN
- This option is disabled by default with the false value. To enable this option, add this option to the command, and either pass the true value or do not pass any value.
--cluster
- Indicates that the check is for a cluster that has been deployed.
- Data type:
BOOLEAN
- This option is disabled by default with the false value. To enable this option, add this option to the command, and either pass the true value or do not pass any value.
- Command format:
tiup cluster check <topology.yml | cluster-name> --cluster [flags]
-N, --node
- Specifies the nodes to be checked. The value of this option is a comma-separated list of node IDs. You can get the node IDs from the first column of the cluster status table returned by the tiup cluster display command.
- Data type:
STRINGS
- If this option is not specified in the command, all nodes are checked by default.
-R, --role
- Specifies the roles to be checked. The value of this option is a comma-separated list of node roles. You can get the roles of nodes from the second column of the cluster status table returned by the tiup cluster display command.
- Data type:
STRINGS
- If this option is not specified in the command, all roles are checked by default.
--enable-cpu
- Enables the check of CPU core number.
- Data type:
BOOLEAN
- This option is disabled by default with the false value. To enable this option, add this option to the command, and either pass the true value or do not pass any value.
--enable-disk
- Enables the fio disk performance test.
- Data type:
BOOLEAN
- This option is disabled by default with the false value. To enable this option, add this option to the command, and either pass the true value or do not pass any value.
--enable-mem
- Enables the memory size check.
- Data type:
BOOLEAN
- This option is disabled by default with the false value. To enable this option, add this option to the command, and either pass the true value or do not pass any value.
-u, --user
- Specifies the user name used to connect to the target machine. The specified user must have passwordless sudo (root) privileges on the target machine.
- Data type:
STRING
- If this option is not specified in the command, the user who executes the command is used as the default value.
-i, --identity_file
- Specifies the key file to connect to the target machine.
- Data type:
STRING
- If this option is not specified in the command, the default key file ~/.ssh/id_rsa is used.
-p, --password
- Logs in with a password when connecting to the target machine.
  - If the --cluster option is added, the password is the password of the user specified in the topology file when the cluster was deployed.
  - If the --cluster option is not added, the password is the password of the user specified in the -u/--user option.
- Data type:
BOOLEAN
- This option is disabled by default with the false value. To enable this option, add this option to the command, and either pass the true value or do not pass any value.
-h, --help
- Prints the help information of the related commands.
- Data type:
BOOLEAN
- This option is disabled by default with the false value. To enable this option, add this option to the command, and either pass the true value or do not pass any value.
Output
A table containing the following fields:
- Node: the target node
- Check: the check item
- Result: the check result (Pass, Warn, or Fail)
- Message: the result description
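When you post-process the results, for example in a deployment script, it can help to model each row and pull out the failures. The sketch below assumes the rows have already been parsed into CheckRow objects. The class, node addresses, check names, and messages are all hypothetical, and the sketch does not show how to parse TiUP's printed table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CheckRow:
    node: str
    check: str
    result: str  # "Pass", "Warn", or "Fail"
    message: str


def summarize(rows):
    """Count rows per result and list the (node, check) pairs that failed."""
    counts = Counter(row.result for row in rows)
    failed = [(row.node, row.check) for row in rows if row.result == "Fail"]
    return counts, failed


rows = [
    CheckRow("10.0.1.1", "swap", "Fail", "hypothetical message"),
    CheckRow("10.0.1.1", "thp", "Pass", "hypothetical message"),
    CheckRow("10.0.1.2", "swap", "Warn", "hypothetical message"),
]
counts, failed = summarize(rows)
print(counts["Fail"], failed)  # 1 [('10.0.1.1', 'swap')]
```

A non-empty failed list could be used to decide whether to fix the machines, or try the --apply option, before deploying.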