EXPLAIN ANALYZE

The EXPLAIN ANALYZE statement works similar to EXPLAIN, with the major difference being that it will actually execute the statement. This allows you to compare the estimates used as part of query planning to actual values encountered during execution. If the estimates differ significantly from the actual values, you should consider running ANALYZE TABLE on the affected tables.

Synopsis

ExplainSym
EXPLAINDESCRIBEDESC
ExplainStmt
ExplainSymTableNameColumnNameANALYZEExplainableStmtFORCONNECTIONNUMFORMAT=stringLitExplainFormatTypeFORCONNECTIONNUMExplainableStmt
ExplainableStmt
SelectStmtDeleteFromStmtUpdateStmtInsertIntoStmtReplaceIntoStmtUnionStmt

EXPLAIN ANALYZE output format

Different from EXPLAIN, EXPLAIN ANALYZE executes the corresponding SQL statement, records its runtime information, and returns the information together with the execution plan. Therefore, you can regard EXPLAIN ANALYZE as an extension of the EXPLAIN statement. Compared to EXPLAIN (for debugging query exeuction), the return results of EXPLAIN ANALYZE also include columns of information such as actRows, execution info, memory, and disk. The details of these columns are shown as follows:

attribute namedescription
actRowsNumber of rows output by the operator.
execution infoExecution information of the operator. time represents the total wall time from entering the operator to leaving the operator, including the total execution time of all sub-operators. If the operator is called many times by the parent operator (in loops), then the time refers to the accumulated time. loops is the number of times the current operator is called by the parent operator.
memoryMemory space occupied by the operator.
diskDisk space occupied by the operator.

Examples

CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, c1 INT NOT NULL);
Query OK, 0 rows affected (0.12 sec)
INSERT INTO t1 (c1) VALUES (1), (2), (3);
Query OK, 3 rows affected (0.02 sec) Records: 3 Duplicates: 0 Warnings: 0
EXPLAIN ANALYZE SELECT * FROM t1 WHERE id = 1;
+-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+ | Point_Get_1 | 1.00 | 1 | root | table:t1 | time:757.205µs, loops:2, Get:{num_rpc:1, total_time:697.051µs} | handle:1 | N/A | N/A | +-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+ 1 row in set (0.01 sec)
EXPLAIN ANALYZE SELECT * FROM t1;
+-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+ | TableReader_5 | 10000.00 | 3 | root | | time:278.2µs, loops:2, cop_task: {num: 1, max: 437.6µs, proc_keys: 3, rpc_num: 1, rpc_time: 423.9µs, copr_cache_hit_ratio: 0.00} | data:TableFullScan_4 | 251 Bytes | N/A | | └─TableFullScan_4 | 10000.00 | 3 | cop[tikv] | table:t1 | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 3, total_process_keys_size: 111, total_keys: 4, rocksdb: {delete_skipped_count: 0, key_skipped_count: 3, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}} | keep order:false, stats:pseudo | N/A | N/A | +-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+ 2 rows in set (0.00 sec)

Execution information of operators

In addition to the basic time and loop execution information, execution info also contains operator-specific execution information, which mainly includes the time consumed for the operator to send RPC requests and the duration of other steps.

Point_Get

The execution information from a Point_Get operator will typically contain the following information:

  • Get:{num_rpc:1, total_time:697.051µs}: The number of the Get RPC requests (num_rpc) sent to TiKV and the total duration (total_time) of all RPC requests.
  • ResolveLock:{num_rpc:1, total_time:12.117495ms}: If TiDB encounters a lock when reading data, it has to resolve the lock first, which generally occurs in the scenario of read-write conflict. This information indicates the duration of resolving locks.
  • regionMiss_backoff:{num:11, total_time:2010 ms},tikvRPC_backoff:{num:11, total_time:10691 ms}: When an RPC request fails, TiDB will wait the backoff time before retrying the request. Backoff statistics include the type of backoff (such as regionMiss and tikvRPC), the total waiting time (total_time), and the total number of backoffs (num).

Batch_Point_Get

The execution information of the Batch_Point_Get operator is similar to that of the Point_Get operator, but Batch_Point_Get generally sends BatchGet RPC requests to TiKV to read data.

BatchGet:{num_rpc:2, total_time:83.13µs}: The number of RPC requests (num_rpc) of the BatchGet type sent to TiKV and the total time consumed (total_time) for all RPC requests.

TableReader

The execution information of a TableReader operator is typically as follows:

cop_task: {num: 6, max: 1.07587ms, min: 844.312µs, avg: 919.601µs, p95: 1.07587ms, max_proc_keys: 16, p95_proc_keys: 16, tot_proc: 1ms, tot_wait: 1ms, rpc_num: 6, rpc_time: 5.313996 ms, copr_cache_hit_ratio: 0.00}
  • cop_task: Contains the execution information of cop tasks. For example:
    • num: The number of cop tasks.
    • max, min, avg, p95: The maximum, minimum, average, and P95 values of the execution time consumed for executing cop tasks.
    • max_proc_keys and p95_proc_keys: The maximum and P95 key-values scanned by TiKV in all cop tasks. If the difference between the maximum value and the P95 value is large, the data distribution might be imbalanced.
    • rpc_num, rpc_time: The total number and total time consumed for Cop RPC requests sent to TiKV.
    • copr_cache_hit_ratio: The hit rate of Coprocessor Cache for cop task requests.
  • backoff: Contains different types of backoff and the total waiting time of backoff.

Insert

The execution information of an Insert operator is typically as follows:

prepare:109.616µs, check_insert:{total_time:1.431678ms, mem_insert_time:667.878µs, prefetch:763.8µs, rpc:{BatchGet:{num_rpc:1, total_time:699.166µs},Get:{num_rpc:1, total_time:378.276µs }}}
  • prepare: The time consumed for preparing to write, including expression, default value and auto-increment value calculations.
  • check_insert: This information generally appears in insert ignore and insert on duplicate statements, including conflict checking and the time consumed for writing data to TiDB transaction cache. Note that this time consumption does not include the time consumed for transaction commit. It contains the following information:
    • total_time: The total time spent on the check_insert step.
    • mem_insert_time: The time consumed for writing data to the TiDB transaction cache.
    • prefetch: The duration of retrieving the data that needs to be checked for conflicts from TiKV. This step sends a Batch_Get RPC request to TiKV to retrieve data.
    • rpc: The total time consumed for sending RPC requests to TiKV, which generally includes two types of RPC time, BatchGet and Get, among which:
      • BatchGet RPC request is sent in the prefetch step.
      • Get RPC request is sent when the insert on duplicate statement executes duplicate update.
  • backoff: Contains different types of backoff and the total waiting time of backoff.

IndexJoin

The IndexJoin operator has 1 outer worker and N inner workers for concurrent execution. The join result preserves the order of the outer table. The detailed execution process is as follows:

  1. The outer worker reads N outer rows, then wraps it into a task, and sends it to the result channel and the inner worker channel.
  2. The inner worker receives the task, build key ranges from the task, and fetches inner rows according to the key ranges. It then builds the inner row hash table.
  3. The main IndexJoin thread receives the task from the result channel and waits for the inner worker to finish handling the task.
  4. The main IndexJoin thread joins each outer row by looking up to the inner rows' hash table.

The IndexJoin operator contains the following execution information:

inner:{total:4.297515932s, concurrency:5, task:17, construct:97.96291ms, fetch:4.164310088s, build:35.219574ms}, probe:53.574945ms
  • Inner: The execution information of inner worker:
    • total: The total time consumed by the inner worker.
    • concurrency: The number of concurrent inner workers.
    • task: The total number of tasks processed by the inner worker.
    • construct: The preparation time before the inner worker reads the inner table rows corresponding to the task.
    • fetch: The total time consumed for it takes for the inner worker to read inner table rows.
    • Build: The total time consumed for it takes for the inner worker to construct the hash table of the corresponding inner table rows.
  • probe: The total time consumed by the main IndexJoin thread to perform join operations with the hash table of the outer table rows and the inner table rows.

IndexHashJoin

The execution process of the IndexHashJoin operator is similar to that of the IndexJoin operator. IndexHashJoin operator also has 1 outer worker and N inner workers to execute in parallel, but the output order is not guaranteed to be consistent with that of the outer table. The detailed execution process is as follows:

  1. The outer worker reads N outer rows, builds a task, and sends it to the inner worker channel.
  2. The inner worker receives the tasks from the inner worker channel and performs the following three operations in order for every task: a. Build a hash table from the outer rows b. Build key ranges from outer rows and fetches inner rows c. Probe the hash table and sends the join result to the result channel. Note: step a and step b are running concurrently.
  3. The main thread of IndexHashJoin receives the join results from the result channel.

The IndexHashJoin operator contains the following execution information:

inner:{total:4.429220003s, concurrency:5, task:17, construct:96.207725ms, fetch:4.239324006s, build:24.567801ms, join:93.607362ms}
  • Inner: the execution information of inner worker:
    • total: the total time consumed by the inner worker.
    • concurrency: the number of inner workers.
    • task: The total number of tasks processed by the inner worker.
    • construct: The preparation time before the inner worker reads the inner table rows.
    • fetch: The total time consumed for inner worker to read inner table rows.
    • Build: The total time consumed for inner worker to construct the hash table of the outer table rows.
    • join: The total time consumed for inner worker to do join with the inner table rows and the hash table of outer table rows.

HashJoin

The HashJoin operator has an inner worker, an outer worker, and N join workers. The detailed execution process is as follows:

  1. The inner worker reads inner table rows and constructs a hash table.
  2. The outer worker reads the outer table rows, then wraps it into a task and sends it to the join worker.
  3. The join worker waits for the hash table construction in step 1 to finish.
  4. The join worker uses the outer table rows and hash table in the task to perform join operations, and then sends the join result to the result channel.
  5. The main thread of HashJoin receives the join result from the result channel.

The HashJoin operator contains the following execution information:

build_hash_table:{total:146.071334ms, fetch:110.338509ms, build:35.732825ms}, probe:{concurrency:5, total:857.162518ms, max:171.48271ms, probe:125.341665ms, fetch:731.820853ms}
  • build_hash_table: Reads the data of the inner table and constructs the execution information of the hash table:
    • total: The total time consumption.
    • fetch: The total time spent reading inner table data.
    • build: The total time spent constructing a hash table.
  • probe: The execution information of join workers:
    • concurrency: The number of join workers.
    • total: The total time consumed by all join workers.
    • max: The longest time for a single join worker to execute.
    • probe: The total time consumed for joining with outer table rows and the hash table.
    • fetch: The total time that the join worker waits to read the outer table rows data.

TableFullScan (TiFlash)

The TableFullScan operator executed on a TiFlash node contains the following execution information:

tiflash_scan: { dtfile: { total_scanned_packs: 2, total_skipped_packs: 1, total_scanned_rows: 16000, total_skipped_rows: 8192, total_rough_set_index_load_time: 2ms, total_read_time: 20ms }, total_create_snapshot_time: 1ms }
  • dtfile: the DTFile (DeltaTree File) related information during the table scan, which reflects the data scan status of the TiFlash Stable layer.
    • total_scanned_packs: the total number of packs that have been scanned in the DTFile. A pack is the minimum unit that can be read in the TiFlash DTFile. By default, every 8192 rows constitute a pack.
    • total_skipped_packs: the total number of packs that have been skipped by the scan in the DTFile. When a WHERE clause hits rough set indexes or matches the range filtering of a primary key, the irrelevant packs are skipped.
    • total_scanned_rows: the total number of rows that have been scanned in the DTFile. If there are multiple versions of updates or deletions because of MVCC, each version is counted independently.
    • total_skipped_rows: the total number of rows that are skipped by the scan in the DTFile.
    • total_rs_index_load_time: the total time used to read DTFile rough set indexes.
    • total_read_time: the total time used to read DTFile data.
  • total_create_snapshot_time: the total time used to create snapshots during the table scan.

lock_keys execution information

When a DML statement is executed in a pessimistic transaction, the execution information of the operator might also include the execution information of lock_keys. For example:

lock_keys: {time:94.096168ms, region:6, keys:8, lock_rpc:274.503214ms, rpc_count:6}
  • time: The total duration of executing the lock_keys operation.
  • region: The number of Regions involved in executing the lock_keys operation.
  • keys: The number of Keys that need Lock.
  • lock_rpc: The total time spent sending an RPC request of the Lock type to TiKV. Because multiple RPC requests can be sent in parallel, the total RPC time consumption might be greater than the total time consumption of the lock_keys operation.
  • rpc_count: The total number of RPC requests of the Lock type sent to TiKV.

commit_txn execution information

When a write-type DML statement is executed in a transaction with autocommit=1, the execution information of the write operator will also include the duration information of the transaction commit. For example:

commit_txn: {prewrite:48.564544ms, wait_prewrite_binlog:47.821579, get_commit_ts:4.277455ms, commit:50.431774ms, region_num:7, write_keys:16, write_byte:536}
  • prewrite: The time consumed for the prewrite phase of the 2PC commit of the transaction.
  • wait_prewrite_binlog:: The time consumed for waiting to write the prewrite Binlog.
  • get_commit_ts: The time consumed for getting the transaction commit timestamp.
  • commit: The time consumed for the commit phase during the 2PC commit of the transaction.
  • write_keys: The total keys written in the transaction.
  • write_byte: The total bytes of key-value written in the transaction, and the unit is byte.

RU (Request Unit) consumption

Request Unit (RU) is a unified abstraction unit of system resources, which is defined in TiDB resource control. The execution info of the top-level operator shows the overall RU consumption of this particular SQL statement.

RU:273.842670

You can calculate the RU from the other values in EXPLAIN ANALYZE, specifically the execution info column. For example:

'executeInfo': time:2.55ms, loops:2, RU:0.329460, Get:{ num_rpc:1, total_time:2.13ms }, total_process_time: 231.5µs, total_wait_time: 732.9µs, tikv_wall_time: 995.8µs, scan_detail: { total_process_keys: 1, total_process_keys_size: 150, total_keys: 1, get_snapshot_time: 691.7µs, rocksdb: { block: { cache_hit_count: 2, read_count: 1, read_byte: 8.19 KB, read_time: 10.3µs } } },

The base costs are defined in the tikv/pd source code and the calculations are performed in the model.go file.

If you are using TiDB v7.1, the calculation is the sum of BeforeKVRequest() and AfterKVRequest() in pd/pd-client/model.go, that is:

before key/value request is processed: consumption.RRU += float64(kc.ReadBaseCost) -> kv.ReadBaseCost * rpc_nums after key/value request is processed: consumption.RRU += float64(kc.ReadBytesCost) * readBytes -> kc.ReadBytesCost * total_process_keys_size consumption.RRU += float64(kc.CPUMsCost) * kvCPUMs -> kc.CPUMsCost * total_process_time

For writes and batch gets, the calculation is similar with different base costs.

Other common execution information

The Coprocessor operators usually contain two parts of execution time information: cop_task and tikv_task. cop_task is the time recorded by TiDB, and it is from the moment that the request is sent to the server to the moment that the response is received. tikv_task is the time recorded by TiKV Coprocessor itself. If there is much difference between the two, it might indicate that the time spent waiting for the response is too long, or the time spent on gRPC or network is too long.

MySQL compatibility

EXPLAIN ANALYZE is a feature of MySQL 8.0, but both the output format and the potential execution plans in TiDB differ substantially from MySQL.

See also