# Enable TidbCluster Auto-scaling
Kubernetes provides the Horizontal Pod Autoscaler, a native API that scales workloads based on CPU utilization. Building on Kubernetes, TiDB 4.0 implements an elastic scheduling mechanism. Correspondingly, TiDB Operator 1.1 and later versions provide an auto-scaling feature that enables elastic scheduling. This document introduces how to enable and use the auto-scaling feature of `TidbCluster`.
## Enable the auto-scaling feature
The auto-scaling feature is disabled by default. To turn it on, you need to enable the related configurations in TiDB Operator. Take the following steps to turn it on manually.
1. Edit the `values.yaml` file in TiDB Operator.

    Enable `AutoScaling` in the `features` option:

    ```yaml
    features:
      - AutoScaling=true
    ```

    Enable the `Operator Webhook` feature:

    ```yaml
    admissionWebhook:
      create: true
      mutation:
        pods: true
    ```

    For more information about `Operator Webhook`, see Enable Admission Controller in TiDB Operator.

2. Install or update TiDB Operator.

    To install or update TiDB Operator, see Deploy TiDB Operator in Kubernetes. (A minimal Helm example is sketched after these steps.)
3. Confirm the resource configuration of the target TiDB cluster.

    Before using the auto-scaling feature on the target TiDB cluster, you first need to configure the CPU setting of the corresponding components. For example, you need to configure `spec.tikv.requests.cpu` in TiKV:

    ```yaml
    spec:
      tikv:
        requests:
          cpu: "1"
      tidb:
        requests:
          cpu: "1"
    ```
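As referenced in step 2 above, if you originally deployed TiDB Operator with Helm, updating it after editing `values.yaml` looks roughly like the following sketch. The release name `tidb-operator`, the namespace `tidb-admin`, the chart version, and the values file path are assumptions that depend on your deployment; follow Deploy TiDB Operator in Kubernetes for the authoritative steps.

```shell
# Assumed release name, namespace, chart version, and values file path:
# adjust them to match your own TiDB Operator deployment.
helm upgrade tidb-operator pingcap/tidb-operator \
  --namespace=tidb-admin \
  --version=v1.1.15 \
  -f ./values.yaml
```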
## TidbClusterAutoScaler
The `TidbClusterAutoScaler` CR object is used to control the auto-scaling behavior of the TiDB cluster. If you have used Horizontal Pod Autoscaler, the notion of `TidbClusterAutoScaler` should be familiar to you. The following is an auto-scaling example for TiKV:
```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbClusterAutoScaler
metadata:
  name: auto-scaling-demo
spec:
  cluster:
    name: auto-scaling-demo
    namespace: default
  monitor:
    name: auto-scaling-demo
    namespace: default
  tikv:
    minReplicas: 3
    maxReplicas: 4
    metrics:
      - type: "Resource"
        resource:
          name: "cpu"
          target:
            type: "Utilization"
            averageUtilization: 80
```
The TiDB component can be configured using `spec.tidb`. Currently, the auto-scaling API of TiDB is the same as that of TiKV.
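For example, an equivalent TiDB section looks roughly like the following sketch. The replica bounds and the utilization threshold are illustrative and mirror the TiKV example above; adjust them to your workload.

```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbClusterAutoScaler
metadata:
  name: auto-scaling-demo
spec:
  cluster:
    name: auto-scaling-demo
    namespace: default
  monitor:
    name: auto-scaling-demo
    namespace: default
  tidb:
    minReplicas: 2
    maxReplicas: 3
    metrics:
      - type: "Resource"
        resource:
          name: "cpu"
          target:
            type: "Utilization"
            averageUtilization: 80
```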
In a `TidbClusterAutoScaler` object, the `cluster` attribute specifies the TiDB cluster to be auto-scaled, identified by its `name` and `namespace`. Because `TidbClusterAutoScaler` captures resource usage through the metrics collection component, you need to provide it with a metrics collection and query service. The `monitor` attribute refers to the `TidbMonitor` object. For more information, see Deploy Monitoring and Alerts for a TiDB Cluster.
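If you do not have a `TidbMonitor` yet, a minimal sketch looks roughly like the following. The object name matches the example in this document, and the image versions are illustrative assumptions; refer to Deploy Monitoring and Alerts for a TiDB Cluster for the recommended values.

```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: auto-scaling-demo
spec:
  clusters:
    - name: auto-scaling-demo
  prometheus:
    baseImage: prom/prometheus
    version: v2.18.1          # illustrative version
  grafana:
    baseImage: grafana/grafana
    version: 6.1.6            # illustrative version
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v4.0.0           # illustrative version
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1           # illustrative version
  imagePullPolicy: IfNotPresent
```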
If you use an external Prometheus rather than `TidbMonitor`, you can configure `spec.metricsUrl` to specify the host of the monitoring metrics collection service for the TiDB cluster. If you deploy the monitoring of the TiDB cluster using Helm, specify `spec.metricsUrl` as follows:
```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbClusterAutoScaler
metadata:
  name: auto-scaling-demo
spec:
  cluster:
    name: auto-scaling-demo
    namespace: default
  metricsUrl: "http://${release_name}-prometheus.${namespace}.svc:9090"
  ......
```
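To sanity-check that the metrics URL is reachable from inside the cluster before relying on it, you can run a throwaway curl Pod. This is a hedged sketch: the URL mirrors the Helm-based example above, and `/-/ready` is the standard Prometheus readiness endpoint.

```shell
# Runs a temporary Pod, queries the Prometheus readiness endpoint, then cleans up.
kubectl run metrics-url-check --rm -it --restart=Never \
  --image=curlimages/curl -n ${namespace} -- \
  curl -s "http://${release_name}-prometheus.${namespace}.svc:9090/-/ready"
```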
## Example
1. Run the following commands to quickly deploy a TiDB cluster with 3 PD instances, 3 TiKV instances, and 2 TiDB instances, with the monitoring and auto-scaling features enabled:

    ```shell
    kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.1.15/examples/auto-scale/tidb-cluster.yaml -n ${namespace}
    ```

    ```shell
    kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.1.15/examples/auto-scale/tidb-monitor.yaml -n ${namespace}
    ```

    ```shell
    kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.1.15/examples/auto-scale/tidb-cluster-auto-scaler.yaml -n ${namespace}
    ```

2. After the TiDB cluster is created, expose the TiDB cluster service to the local machine by running the following command:

    ```shell
    kubectl port-forward svc/auto-scaling-demo-tidb 4000:4000 &
    ```

    Copy the following content and paste it to the local `sysbench.config` file:

    ```
    mysql-host=127.0.0.1
    mysql-port=4000
    mysql-user=root
    mysql-password=
    mysql-db=test
    time=120
    threads=20
    report-interval=5
    db-driver=mysql
    ```

3. Prepare data and perform the stress test against the auto-scaling feature using sysbench.

    Prepare data by running the following command:

    ```shell
    sysbench --config-file=${path-to-file}/sysbench.config oltp_point_select --tables=1 --table-size=20000 prepare
    ```

    Start the stress test:

    ```shell
    sysbench --config-file=${path-to-file}/sysbench.config oltp_point_select --tables=1 --table-size=20000 run
    ```

    The command above returns a result similar to the following:

    ```
    Initializing worker threads...

    Threads started!

    [ 5s ] thds: 20 tps: 37686.35 qps: 37686.35 (r/w/o: 37686.35/0.00/0.00) lat (ms,95%): 0.99 err/s: 0.00 reconn/s: 0.00
    [ 10s ] thds: 20 tps: 38487.20 qps: 38487.20 (r/w/o: 38487.20/0.00/0.00) lat (ms,95%): 0.95 err/s: 0.00 reconn/s: 0.00
    ```

4. Create a new terminal session and view the Pod changing status of the TiDB cluster by running the following command:
    ```shell
    watch -n1 "kubectl -n ${namespace} get pod"
    ```

    The output is as follows:

    ```
    auto-scaling-demo-discovery-fbd95b679-f4cb9   1/1   Running   0   17m
    auto-scaling-demo-monitor-6857c58564-ftkp4    3/3   Running   0   17m
    auto-scaling-demo-pd-0                        1/1   Running   0   17m
    auto-scaling-demo-tidb-0                      2/2   Running   0   15m
    auto-scaling-demo-tidb-1                      2/2   Running   0   15m
    auto-scaling-demo-tikv-0                      1/1   Running   0   15m
    auto-scaling-demo-tikv-1                      1/1   Running   0   15m
    auto-scaling-demo-tikv-2                      1/1   Running   0   15m
    ```

    View the changing status of Pods and the TPS and QPS of sysbench. When new Pods are created in TiKV and TiDB, the TPS and QPS of sysbench increase significantly. (To inspect the autoscaler object itself, see the sketch after these steps.)

5. After sysbench finishes the test, the newly created Pods in TiKV and TiDB disappear automatically.
6. Destroy the environment by running the following commands:

    ```shell
    kubectl delete tidbcluster auto-scaling-demo -n ${namespace}
    kubectl delete tidbmonitor auto-scaling-demo -n ${namespace}
    kubectl delete tidbclusterautoscaler auto-scaling-demo -n ${namespace}
    ```
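While the test is running (before you destroy the environment), you can also inspect the `TidbClusterAutoScaler` object with standard `kubectl` commands to see its current spec and status. A hedged sketch; the object name matches the example above:

```shell
# Show the autoscaler object, including its status, as YAML.
kubectl get tidbclusterautoscaler auto-scaling-demo -n ${namespace} -o yaml

# Show events and a human-readable summary of the same object.
kubectl describe tidbclusterautoscaler auto-scaling-demo -n ${namespace}
```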
## TidbClusterAutoScaler configurations
* Set the auto-scaling interval.

    Compared with stateless web services, distributed database software is often sensitive to instance auto-scaling. You need to make sure that there is a certain interval between auto-scaling operations so that scaling does not happen too frequently.

    You can set the interval (in seconds) between auto-scaling operations by configuring `spec.tikv.scaleInIntervalSeconds` and `spec.tikv.scaleOutIntervalSeconds` for TiKV. The same applies to TiDB.

    ```yaml
    apiVersion: pingcap.com/v1alpha1
    kind: TidbClusterAutoScaler
    metadata:
      name: auto-scaler
    spec:
      tidb:
        scaleInIntervalSeconds: 500
        scaleOutIntervalSeconds: 300
      tikv:
        scaleInIntervalSeconds: 500
        scaleOutIntervalSeconds: 300
    ```

* Set the maximum value and the minimum value.

    You can set the maximum value and the minimum value of each component in `TidbClusterAutoScaler` to control the scaling range of TiDB and TiKV, which is similar to Horizontal Pod Autoscaler.

    ```yaml
    apiVersion: pingcap.com/v1alpha1
    kind: TidbClusterAutoScaler
    metadata:
      name: auto-scaling-demo
    spec:
      tikv:
        minReplicas: 3
        maxReplicas: 4
      tidb:
        minReplicas: 2
        maxReplicas: 3
    ```

* Set the CPU auto-scaling configurations.

    Currently, `TidbClusterAutoScaler` only supports auto-scaling based on CPU utilization. The descriptive API is as follows. `averageUtilization` refers to the threshold of CPU utilization: if the utilization exceeds 80%, auto-scaling is triggered.

    ```yaml
    apiVersion: pingcap.com/v1alpha1
    kind: TidbClusterAutoScaler
    metadata:
      name: auto-scaling-demo
    spec:
      tikv:
        minReplicas: 3
        maxReplicas: 4
        metrics:
          - type: "Resource"
            resource:
              name: "cpu"
              target:
                type: "Utilization"
                averageUtilization: 80
    ```

* Set the time window configurations.

    The CPU utilization based auto-scaling allows `TidbClusterAutoScaler` to get the CPU metrics of TiDB and TiKV from the specified monitoring system. You can specify the time window of metrics collection using `metricsTimeDuration`.

    ```yaml
    apiVersion: pingcap.com/v1alpha1
    kind: TidbClusterAutoScaler
    metadata:
      name: basic
    spec:
      tidb:
        metricsTimeDuration: "1m"
        metrics:
          - type: "Resource"
            resource:
              name: "cpu"
              target:
                type: "Utilization"
                averageUtilization: 60
    ```