TiDB 3.0 GA Release Notes
Release date: June 28, 2019
TiDB version: 3.0.0
TiDB Ansible version: 3.0.0
Overview
On June 28, 2019, TiDB 3.0 GA is released. The corresponding TiDB Ansible version is 3.0.0. Compared with TiDB 2.1, this release has greatly improved in the following aspects:
- Stability. TiDB 3.0 has demonstrated long-term stability for large-scale clusters with up to 150+ nodes and 300+ TB of storage.
- Usability. TiDB 3.0 has multi-facet improvements in usability, including standardized slow query logs, well-developed log file specification, and new features such as EXPLAIN ANALYZE and SQL Trace to save operation costs for users.
- Performance. The performance of TiDB 3.0 is 4.5 times greater than TiDB 2.1 in TPC-C benchmarks, and over 1.5 times in Sysbench benchmarks. Thanks to the support for Views, TPC-H 50G Q15 can now run normally.
- New features including Window Functions, Views (Experimental), partitioned tables, the plugin framework, pessimistic locking (Experimental), and SQL Plan Management.
TiDB
- New Features
- Support Window Functions; compatible with all window functions in MySQL 8.0, including NTILE, LEAD, LAG, PERCENT_RANK, NTH_VALUE, CUME_DIST, FIRST_VALUE, LAST_VALUE, RANK, DENSE_RANK, and ROW_NUMBER
- Support Views (Experimental)
- Improve Table Partition
- Support Range Partition
- Support Hash Partition
- Add the plug-in framework, supporting plugins such as IP Whitelist (Enterprise) and Audit Log (Enterprise).
- Support the SQL Plan Management function to create SQL execution plan binding to ensure query stability (Experimental)
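
To make the window function entry above concrete, here is a minimal sketch of a query that uses ROW_NUMBER, RANK, DENSE_RANK, and LAG. It runs against an in-memory SQLite database only because SQLite ships with Python and implements the same standard functions; the `scores` table and its data are invented for illustration, and running the same statement against TiDB would require a MySQL client instead.

```python
import sqlite3

# In-memory SQLite database, used here purely as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("red", "ann", 30),
        ("red", "bob", 30),
        ("red", "cat", 10),
        ("blue", "dan", 25),
        ("blue", "eve", 5),
    ],
)

# The named window w breaks ties by player so ROW_NUMBER and LAG are deterministic.
# RANK and DENSE_RANK order by points only, so tied players share a rank.
query = """
SELECT team, player, points,
       ROW_NUMBER() OVER w AS row_num,
       RANK()       OVER (PARTITION BY team ORDER BY points DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY team ORDER BY points DESC) AS dense_rnk,
       LAG(points)  OVER w AS prev_points
FROM scores
WINDOW w AS (PARTITION BY team ORDER BY points DESC, player)
ORDER BY team, row_num
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

Within the `red` team, the two players tied at 30 points share rank 1; RANK then skips to 3 for the next player, while DENSE_RANK continues with 2.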
- SQL Optimizer
- Optimize the NOT EXISTS subquery and convert it to Anti Semi Join to improve performance
- Optimize the constant propagation on the Outer Join, and add the optimization rule of Outer Join elimination to reduce non-effective computations and improve performance
- Optimize the IN subquery to execute Inner Join after aggregation to improve performance
- Optimize Index Join to adapt to more scenarios
- Improve the Partition Pruning optimization rule of Range Partition
- Optimize the query logic for _tidb_rowid to avoid full table scan and improve performance
- Match more prefix columns of the indexes when extracting access conditions of composite indexes if there are relevant columns in the filter to improve performance
- Improve the accuracy of cost estimates by using order correlation between columns
- Optimize Join Order based on the greedy strategy and the dynamic programming algorithm to speed up the join operation of multiple tables
- Support Skyline Pruning, with some rules to prevent the execution plan from relying too heavily on statistics to improve query stability
- Improve the accuracy of row count estimation for single-column indexes with NULL values
- Support FAST ANALYZE that randomly samples in each Region to avoid full table scan and improve performance with statistics collection
- Support the incremental Analyze operation on monotonically increasing index columns to improve performance with statistics collection
- Support using subqueries in the DO statement
- Support using Index Join in transactions
- Optimize prepare/execute to support DDL statements with no parameters
- Modify the system behavior to auto load statistics when the stats-lease variable value is 0
- Support exporting historical statistics
- Support the dump/load correlation of histograms
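
The NOT EXISTS entry in this list refers to an anti semi join, which returns each outer row that has no matching inner row. The sketch below shows what that operator computes for a single equality condition, using a hash set built over the inner side. It is an illustrative model rather than TiDB's implementation, and the function name, the `customers` and `orders` data, and the key extractors are all hypothetical.

```python
def anti_semi_join(outer_rows, inner_rows, outer_key, inner_key):
    """Return the outer rows that have no equal-key match among inner_rows.

    Mirrors: SELECT * FROM outer o
             WHERE NOT EXISTS (SELECT 1 FROM inner i WHERE i.key = o.key)
    """
    # Build the lookup set once. NULL (None) keys never compare equal in SQL,
    # so they are left out of the set.
    inner_keys = {inner_key(row) for row in inner_rows if inner_key(row) is not None}
    # An outer row survives when nothing on the inner side matches it.
    # An outer row with a None key can never match, so it is always kept.
    return [row for row in outer_rows if outer_key(row) not in inner_keys]


# Hypothetical data: (customer_id, name) and (order_id, customer_id).
customers = [(1, "ann"), (2, "bob"), (3, "cat"), (None, "ghost")]
orders = [(100, 1), (101, 1), (102, 3), (103, None)]

customers_without_orders = anti_semi_join(
    customers, orders, outer_key=lambda c: c[0], inner_key=lambda o: o[1]
)
print(customers_without_orders)
```

Each outer row is emitted at most once and the inner side is scanned only once, which is why a join of this shape can beat evaluating the subquery row by row.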
- SQL Execution Engine
- Optimize log output: EXECUTE outputs user variables and COMMIT outputs slow query logs to facilitate troubleshooting
- Support the EXPLAIN ANALYZE function to improve SQL tuning usability
- Support the admin show next_row_id command to get the ID of the next row
- Add six built-in functions: JSON_QUOTE, JSON_ARRAY_APPEND, JSON_MERGE_PRESERVE, BENCHMARK, COALESCE, and NAME_CONST
- Optimize the control logic for the chunk size so that it adjusts dynamically based on the query context, to reduce the SQL execution time and resource consumption
- Support tracking and controlling memory usage in three operators: TableReader, IndexReader, and IndexLookupReader
- Optimize the Merge Join operator to support an empty ON condition
- Optimize write performance for single tables that contain too many columns
- Improve the performance of admin show ddl jobs by supporting scanning data in reverse order
- Add the split table region statement to manually split the table Region to alleviate hotspot issues
- Add the split index region statement to manually split the index Region to alleviate hotspot issues
- Add a blocklist to prohibit pushing down expressions to Coprocessor
- Optimize the Expensive Query log to print the SQL query in the log when it exceeds the configured limit of execution time or memory
- DDL
- Support migrating from character set utf8 to utf8mb4
- Change the default character set from utf8 to utf8mb4
- Add the alter schema statement to modify the character set and the collation of the database
- Support ALTER algorithm INPLACE/INSTANT
- Support SHOW CREATE VIEW
- Support SHOW CREATE USER
- Support fast recovery of mistakenly deleted tables
- Support dynamically adjusting the concurrency of ADD INDEX
- Add the pre_split_regions option that pre-allocates Regions when creating the table using the CREATE TABLE statement, to relieve write hot Regions caused by lots of writes after the table creation
- Support splitting Regions by the index and range of the table specified using SQL statements to relieve hotspot issues
- Add the ddl_error_count_limit global variable to limit the number of DDL task retries
- Add a feature to use SHARD_ROW_ID_BITS to scatter row IDs when the column contains an AUTO_INCREMENT attribute to relieve hotspot issues
- Optimize the lifetime of invalid DDL metadata to speed up recovering the normal execution of DDL operations after upgrading the TiDB cluster
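
One way to picture how SHARD_ROW_ID_BITS scatters row IDs is to place a shard number in the high bits of an otherwise sequential ID, so that consecutive inserts land far apart in the key space. The sketch below models that idea only. The 64-bit layout, the `scatter_row_id` helper, and the round-robin shard choice are assumptions made for illustration; these notes do not describe how TiDB lays out the bits or picks the shard.

```python
ROW_ID_BITS = 64  # assumption: row IDs modelled as signed 64-bit integers


def scatter_row_id(sequence_id, shard_bits, shard):
    """Combine a sequential ID with a shard number stored just below the sign bit."""
    if not 0 <= shard < (1 << shard_bits):
        raise ValueError("shard does not fit in shard_bits")
    # Bits left for the sequential part after the sign bit and the shard bits.
    shift = ROW_ID_BITS - 1 - shard_bits
    if not 0 <= sequence_id < (1 << shift):
        raise ValueError("sequence_id overflows into the shard bits")
    return (shard << shift) | sequence_id


# Four consecutive sequence values with 4 shard bits (16 shards).
# The shard is chosen round-robin here purely for the demonstration.
SHARD_BITS = 4
ids = [scatter_row_id(seq, SHARD_BITS, seq % (1 << SHARD_BITS)) for seq in range(1, 5)]

for row_id in ids:
    print(f"{row_id:#018x}")
```

Without the shard prefix the four IDs would be 1, 2, 3, and 4 and would fall in the same key range; with it, neighbouring IDs differ by more than 2^58.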
- Transactions
- Support the pessimistic transaction model (Experimental)
- Optimize transaction processing logics to adapt to more scenarios:
- Change the default value of tidb_disable_txn_auto_retry to on, which means non-auto committed transactions will not be retried
- Add the tidb_batch_commit system variable to split a transaction into multiple ones to be executed concurrently
- Add the tidb_low_resolution_tso system variable to control the number of TSOs to obtain in batches and reduce the number of times that transactions request TSOs, to improve performance in scenarios with relatively low consistency requirements
- Add the tidb_skip_isolation_level_check variable to control whether to report errors when the isolation level is set to SERIALIZABLE
- Modify the tidb_disable_txn_auto_retry system variable to make it work on all retryable errors
- Permission Management
- Perform permission check on the ANALYZE, USE, SET GLOBAL, and SHOW PROCESSLIST statements
- Support Role Based Access Control (RBAC) (Experimental)
- Server
- Optimize slow query logs:
- Restructure the log format
- Optimize the log content
- Optimize the log query method to support using the INFORMATION_SCHEMA.SLOW_QUERY and ADMIN SHOW SLOW statements of the memory table to query slow query logs
- Develop a unified log format specification with restructured log system to facilitate collection and analysis by tools
- Support using SQL statements to manage TiDB Binlog services, including querying status, enabling TiDB Binlog, maintaining and sending TiDB Binlog strategies.
- Support using unix_socket to connect to the database
- Support Trace for SQL statements
- Support getting information for a TiDB instance via the /debug/zip HTTP interface to facilitate troubleshooting.
- Optimize monitoring items to facilitate troubleshooting:
- Add the high_error_rate_feedback_total monitoring item to monitor the difference between the actual data volume and the estimated data volume based on statistics
- Add a QPS monitoring item in the database dimension
- Optimize the system initialization process to only allow the DDL owner to perform the initialization. This reduces the startup time for initialization or upgrading.
- Optimize the execution logic of kill query to improve performance and ensure resources are released properly
- Add a startup option config-check to check the validity of the configuration file
- Add the tidb_back_off_weight system variable to control the backoff time of internal error retries
- Add the wait_timeout and interactive_timeout system variables to control the maximum idle time allowed for connections
- Add the connection pool for TiKV to shorten the connection establishing time
- Compatibility
- Support the ALLOW_INVALID_DATES SQL mode
- Support the MySQL 320 Handshake protocol
- Support manifesting unsigned BIGINT columns as auto-increment columns
- Support the SHOW CREATE DATABASE IF NOT EXISTS syntax
- Optimize the fault tolerance of load data for CSV files
- Abandon the predicate pushdown operation when the filtering condition contains a user variable to improve the compatibility with MySQL’s behavior of using user variables to simulate Window Functions
PD
- Support re-creating a cluster from a single node
- Migrate Region metadata from etcd to the go-leveldb storage engine to solve the storage bottleneck in etcd for large-scale clusters
- API
- Add the remove-tombstone API to clear Tombstone stores
- Add the ScanRegions API to batch query Region information
- Add the GetOperator API to query running operators
- Optimize the performance of the GetStores API
- Configurations
- Optimize configuration check logic to avoid configuration item errors
- Add enable-two-way-merge to control the direction of Region merge
- Add hot-region-schedule-limit to control the scheduling rate for hot Regions
- Add hot-region-cache-hits-threshold to identify hotspot when hitting multiple thresholds consecutively
- Add the store-balance-rate configuration item to control the maximum numbers of balance Region operators allowed per minute
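
The hot-region-cache-hits-threshold entry describes flagging a hotspot only after it crosses a threshold several times in a row. The sketch below models that idea with a per-Region counter that resets whenever a report falls below the flow threshold. The `HotRegionDetector` class, its parameters, and the byte-based flow measure are hypothetical; the notes do not specify how PD measures flow or counts hits.

```python
from collections import defaultdict


class HotRegionDetector:
    """Flag a Region as hot only after enough consecutive over-threshold reports."""

    def __init__(self, flow_threshold, hits_threshold):
        self.flow_threshold = flow_threshold  # flow per report that counts as a hit
        self.hits_threshold = hits_threshold  # consecutive hits needed to be "hot"
        self._hits = defaultdict(int)         # region_id -> current consecutive hits

    def report(self, region_id, flow_bytes):
        """Record one report and return True if the Region is now considered hot."""
        if flow_bytes >= self.flow_threshold:
            self._hits[region_id] += 1
        else:
            # A single quiet report breaks the streak.
            self._hits[region_id] = 0
        return self._hits[region_id] >= self.hits_threshold


detector = HotRegionDetector(flow_threshold=1000, hits_threshold=3)
flows = [2000, 2000, 500, 2000, 2000, 2000]
results = [detector.report(region_id=1, flow_bytes=flow) for flow in flows]
print(results)
```

Requiring a streak rather than a single spike keeps short bursts from triggering scheduling, at the cost of reacting a few reports later.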
- Scheduler Optimizations
- Add the store limit mechanism for separately controlling the speed of operators for each store
- Support the waitingOperator queue to optimize the resource race among different schedulers
- Support scheduling rate limit to actively send scheduling operations to TiKV. This improves the scheduling rate by limiting the number of concurrent scheduling tasks on a single node.
- Optimize the Region Scatter scheduling to be not restrained by the limit mechanism
- Add the shuffle-hot-region scheduler to facilitate TiKV stability test in scenarios of poor hotspot scheduling
- Simulator
- Add simulator for data import scenarios
- Support setting different heartbeat intervals for the Store
- Others
- Upgrade etcd to solve the issues of inconsistent log output formats, Leader selection failure in prevote, and lease deadlocking
- Develop a unified log format specification with restructured log system to facilitate collection and analysis by tools
- Add monitoring metrics including scheduling parameters, cluster label information, time consumed by PD to process TSO requests, Store ID and address information, etc.
TiKV
- Support distributed GC and concurrent lock resolving for improved GC performance
- Support reversed raw_scan and raw_batch_scan
- Support Multi-thread Raftstore and Multi-thread Apply to improve scalability, concurrency capacity, and resource usage within a single node. Performance improves by 70% under the same level of pressure
- Support batch receiving and sending Raft messages, improving TPS by 7% for write intensive scenarios
- Support checking RocksDB Level 0 files before applying snapshots to avoid write stall
- Introduce Titan, a key-value plugin that improves write performance for scenarios with value sizes greater than 1KiB, and relieves write amplification to a certain degree
- Support the pessimistic transaction model (Experimental)
- Support getting monitoring information via HTTP
- Modify the semantics of Insert to allow Prewrite to succeed only when there is no Key
- Develop a unified log format specification with restructured log system to facilitate collection and analysis by tools
- Add performance metrics related to configuration information and key bound crossing
- Support Local Reader in RawKV to improve performance
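
The Titan entry above concerns large values. A common way to relieve write amplification for them is key-value separation: large values go to an append-only blob log, and the sorted index keeps only a small pointer. The sketch below is a toy model of that idea, not Titan's actual design or API. The `KeyValueSeparatedStore` class, the dict standing in for the LSM tree, and the 1024-byte cutoff are assumptions for illustration.

```python
class KeyValueSeparatedStore:
    """Toy store that keeps small values inline and large values in a blob log."""

    def __init__(self, min_blob_size=1024):
        self.min_blob_size = min_blob_size
        self.index = {}              # stand-in for the LSM tree: key -> entry
        self.blob_log = bytearray()  # append-only storage for large values

    def put(self, key, value):
        if len(value) > self.min_blob_size:
            # Large value: append it to the blob log and index only a pointer,
            # so index rewrites move a few bytes instead of the whole value.
            offset = len(self.blob_log)
            self.blob_log += value
            self.index[key] = ("blob", offset, len(value))
        else:
            self.index[key] = ("inline", value)

    def get(self, key):
        entry = self.index[key]
        if entry[0] == "inline":
            return entry[1]
        _, offset, length = entry
        return bytes(self.blob_log[offset:offset + length])


store = KeyValueSeparatedStore()
store.put(b"small", b"x" * 10)
store.put(b"big", b"y" * 2048)
print(store.index[b"small"][0], store.index[b"big"])
```

Overwritten or deleted large values leave dead space in the blob log that a real engine must reclaim; this sketch leaves that out.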
- Engine
- Optimize memory management to reduce memory allocation and copying for Iterator Key Bound Option
- Support block cache sharing among different column families
- Server
- Reduce context switch overhead from batch commands
- Remove txn scheduler
- Add monitoring items related to read index and GC worker
- RaftStore
- Support Hibernate Regions to optimize CPU consumption from RaftStore (Experimental)
- Remove the local reader thread
- Coprocessor
- Refactor the computation framework to implement vector operators, computation using vector expressions, and vector aggregations to improve performance
- Support providing operator execution status for the EXPLAIN ANALYZE statement in TiDB
- Switch to the work-stealing thread pool model to reduce context switch cost
Tools
- TiDB Lightning
- Support redirected replication of data tables
- Support importing CSV files
- Improve performance for conversion from SQL to KV pairs
- Support batch import of single tables to improve performance
- Support separately importing data and indexes for big tables to improve the performance of TiKV-importer
- Support filling the missing column using the row_id or the default column value when column data is missing in the new file
- Support setting a speed limit in TiKV-importer when uploading SST files to TiKV
- TiDB Binlog
- Add the advertise-addr configuration in Drainer to support the bridge mode in the container environment
- Add the GetMvccByEncodeKey function in Pump to speed up querying the transaction status
- Support compressing communication data among components to reduce network resource consumption
- Add the Arbiter tool that supports reading binlog from Kafka and replicating the data into MySQL
- Support filtering out files that don’t require replication via Reparo
- Support replicating generated columns
- Add the syncer.sql-mode configuration item to support using different sql-modes to parse DDL queries
- Add the syncer.ignore-table configuration item to support filtering tables not to be replicated
- sync-diff-inspector
- Support checkpoint to record verification status and continue the verification from last saved point after restarting
- Add the only-use-checksum configuration item to check data consistency by calculating checksum
- Support using TiDB statistics and multiple columns to split chunks for comparison to adapt to more scenarios
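
The checksum and chunk-splitting entries above can be pictured as follows: both data sets are cut into chunks, one checksum is computed per chunk, and only chunks whose checksums differ need a row-by-row look. The sketch below is a simplified model with hypothetical helper names. It uses SHA-256 over Python tuples and fixed-size chunks of sorted rows, which is not necessarily how sync-diff-inspector computes checksums or chooses chunk boundaries.

```python
import hashlib


def chunk_checksum(rows):
    """Return one digest summarizing every row in the chunk, in order."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def split_chunks(rows, chunk_size):
    """Cut an ordered list of rows into consecutive fixed-size chunks."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def mismatched_chunks(source_rows, target_rows, chunk_size):
    """Return the indexes of chunks whose checksums differ between the two sides."""
    source = split_chunks(sorted(source_rows), chunk_size)
    target = split_chunks(sorted(target_rows), chunk_size)
    mismatched = []
    for i in range(max(len(source), len(target))):
        # A chunk missing on one side counts as a mismatch.
        source_sum = chunk_checksum(source[i]) if i < len(source) else None
        target_sum = chunk_checksum(target[i]) if i < len(target) else None
        if source_sum != target_sum:
            mismatched.append(i)
    return mismatched


upstream = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
downstream = [(1, "a"), (2, "b"), (3, "x"), (4, "d")]
print(mismatched_chunks(upstream, downstream, chunk_size=2))
```

Comparing per-chunk checksums keeps the data transferred small, and recording which chunks have already passed is what makes resuming from a checkpoint possible.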
TiDB Ansible
- Upgrade the following monitoring components to a stable version:
- Prometheus from V2.2.1 to V2.8.1
- Pushgateway from V0.4.0 to V0.7.0
- Node_exporter from V0.15.2 to V0.17.0
- Alertmanager from V0.14.0 to V0.17.0
- Grafana from V4.6.3 to V6.1.6
- Ansible from V2.5.14 to V2.7.11
- Add the TiKV summary monitoring dashboard to view cluster status conveniently
- Add the TiKV trouble_shooting monitoring dashboard to remove duplicate items and facilitate troubleshooting
- Add the TiKV details monitoring dashboard to facilitate debugging and troubleshooting
- Add concurrent check for version consistency during rolling updates to improve the update performance
- Support deployment and operations for TiDB Lightning
- Optimize the table-regions.py script to support displaying Leader distribution by tables
- Optimize TiDB monitoring and add latency related monitoring items by SQL categories
- Modify the operating system version limit to only support the CentOS 7.0+ and Red Hat 7.0+ operating systems
- Add the monitoring item to predict the maximum QPS of the cluster (hidden by default)