Index Selection

Reading data from storage engines is one of the most time-consuming steps during the SQL execution. Currently, TiDB supports reading data from different storage engines and different indexes. Query execution performance depends largely on whether you select a suitable index or not.

This document introduces how to select an index to access a table, and some related ways to control index selection.

Access tables

Before introducing index selection, it is important to understand the ways TiDB accesses tables, what triggers each way, what differences each way makes, and what the pros and cons are.

Operators for accessing tables

OperatorTrigger ConditionsApplicable ScenariosExplanations
PointGet / BatchPointGetWhen accessing tables in one or more single point ranges.Any scenarioIf triggered, it is usually considered as the fastest operator, since it calls the kvget interface directly to perform the calculations rather than calls the coprocessor interface.
TableReaderNoneAny scenarioThis TableReader operator is for TiKV. It is generally considered as the least efficient operator that scans table data directly from the TiKV layer. It can be selected only if there is a range query on the _tidb_rowid column, or if there are no other operators for accessing tables to choose from.
TableReaderA table has a replica on the TiFlash node.There are fewer columns to read, but many rows to evaluate.This TableReader operator is for TiFlash. TiFlash is column-based storage. If you need to calculate a small number of columns and a large number of rows, it is recommended to choose this operator.
IndexReaderA table has one or more indexes, and the columns needed for the calculation are included in the indexes.When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns.When multiple indexes exist, a reasonable index is selected based on the cost estimation.
IndexLookupReaderA table has one or more indexes, and the columns needed for calculation are not completely included in the indexes.Same as IndexReader.Since the index does not completely cover calculated columns, TiDB needs to retrieve rows from a table after reading indexes. There is an extra cost compared to the IndexReader operator.
IndexMergeA table has multiple indexes or a multi-valued index.When a multi-valued index or multiple indexes are used.To use the operator, you can specify the optimizer hints, or let the optimizer to automatically select this operator based on cost estimation. For details, see Explain Statements Using Index Merge.

Index selection rules

TiDB selects indexes based on rules or cost. The based rules include pre-rules and skyline-pruning. When selecting an index, TiDB tries the pre-rule first. If an index satisfies a pre-rule, TiDB directly selects this index. Otherwise, TiDB uses skyline-pruning to exclude unsuitable indexes, and then selects the index with the lowest cost based on the cost estimation of each operator that accesses tables.

Rule-based selection

Pre-rules

TiDB uses the following heuristic pre-rules to select indexes:

  • Rule 1: If an index satisfies "unique index with full match + no need to retrieve rows from a table (which means that the plan generated by the index is the IndexReader operator)", TiDB directly selects this index.

  • Rule 2: If an index satisfies "unique index with full match + the need to retrieve rows from a table (which means that the plan generated by the index is the IndexLookupReader operator)", TiDB selects the index with the smallest number of rows to be retrieved from a table as a candidate index.

  • Rule 3: If an index satisfies "ordinary index + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", TiDB selects the index with the smallest number of rows to be read as a candidate index.

  • Rule 4: If only one candidate index is selected based on rule 2 and 3, select this candidate index. If two candidate indexes are respectively selected based on rule 2 and 3, select the index with the smaller number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table).

The "index with full match" in the above rules means each indexed column has the equal condition. When executing the EXPLAIN FORMAT = 'verbose' ... statement, if the pre-rules match an index, TiDB outputs a NOTE-level warning indicating that the index matches the pre-rule.

In the following example, because the index idx_b meets the condition "unique index with full match + the need to retrieve rows from a table" in rule 2, TiDB selects the index idx_b as the access path, and SHOW WARNING returns a note indicating that the index idx_b matches the pre-rule.

mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, UNIQUE INDEX idx_b(b)); Query OK, 0 rows affected (0.01 sec) mysql> EXPLAIN FORMAT = 'verbose' SELECT b, c FROM t WHERE b = 3 OR b = 6; +-------------------+---------+---------+------+-------------------------+------------------------------+ | id | estRows | estCost | task | access object | operator info | +-------------------+---------+---------+------+-------------------------+------------------------------+ | Batch_Point_Get_5 | 2.00 | 8.80 | root | table:t, index:idx_b(b) | keep order:false, desc:false | +-------------------+---------+---------+------+-------------------------+------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +-------+------+-------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-------------------------------------------------------------------------------------------+ | Note | 1105 | unique index idx_b of t is selected since the path only has point ranges with double scan | +-------+------+-------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

Skyline-pruning

Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the probability of wrong index selection caused by wrong estimation. To judge an index, the following three dimensions are needed:

  • How many access conditions are covered by the indexed columns. An "access condition" is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension.

  • Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. If both indexes need TiDB to retrieve rows from the table, compare how many filtering conditions are covered by the indexed columns. Filtering conditions mean the where condition that can be judged based on the index. If the column set of an index covers more access conditions, the smaller the number of retrieved rows from a table, and the better the index is in this dimension.

  • Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension.

For these three dimensions above, if the index idx_a performs no worse than the index idx_b in all three dimensions and performs better than idx_b in one dimension, then idx_a is preferred. When executing the EXPLAIN FORMAT = 'verbose' ... statement, if skyline-pruning excludes some indexes, TiDB outputs a NOTE-level warning listing the remaining indexes after the skyline-pruning exclusion.

In the following example, the indexes idx_b and idx_e are both inferior to idx_b_c, so they are excluded by skyline-pruning. The returned result of SHOW WARNING displays the remaining indexes after skyline-pruning.

mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e)); Query OK, 0 rows affected (0.01 sec) mysql> EXPLAIN FORMAT = 'verbose' SELECT * FROM t WHERE b = 2 AND c > 4; +-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ | id | estRows | estCost | task | access object | operator info | +-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ | IndexLookUp_10 | 33.33 | 738.29 | root | | | | ├─IndexRangeScan_8(Build) | 33.33 | 2370.00 | cop[tikv] | table:t, index:idx_b_c(b, c) | range:(2 4,2 +inf], keep order:false, stats:pseudo | | └─TableRowIDScan_9(Probe) | 33.33 | 2370.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | +-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ 3 rows in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +-------+------+------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+------------------------------------------------------------------------------------------+ | Note | 1105 | [t,idx_b_c] remain after pruning paths for t given Prop{SortItems: [], TaskTp: rootTask} | +-------+------+------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

Cost estimation-based selection

After using the skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations:

  • The average length of each row of the indexed data in the storage engine.
  • The number of rows in the query range generated by the index.
  • The cost for retrieving rows from a table.
  • The number of ranges generated by index during the query execution.

According to these factors and the cost model, the optimizer selects an index with the lowest cost to access the table.

Common tuning problems with cost estimation based selection

  1. The estimated number of rows is not accurate?

    This is usually due to stale or inaccurate statistics. You can re-execute the ANALYZE TABLE statement or modify the parameters of the ANALYZE TABLE statement.

  2. Statistics are accurate, and reading from TiFlash is faster, but why does the optimizer choose to read from TiKV?

    At present, the cost model of distinguishing TiFlash from TiKV is still rough. You can decrease the value of tidb_opt_seek_factor parameter, then the optimizer prefers to choose TiFlash.

  3. The statistics are accurate. Index A needs to retrieve rows from tables, but it actually executes faster than Index B that does not retrieve rows from tables. Why does the optimizer choose Index B?

    In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of tidb_opt_network_factor parameter to reduce the cost of retrieving rows from tables.

Control index selection

The index selection can be controlled by a single query through Optimizer Hints.

  • USE_INDEX / IGNORE_INDEX can force the optimizer to use / not use certain indexes. FORCE_INDEX and USE_INDEX have the same effect.

  • READ_FROM_STORAGE can force the optimizer to choose the TiKV / TiFlash storage engine for certain tables to execute queries.

Use multi-valued indexes

Multi-valued indexes are different from normal indexes. TiDB currently only uses IndexMerge to access multi-valued indexes. Therefore, to use multi-valued indexes for data access, make sure that the value of the system variable tidb_enable_index_merge is set to ON.

For the limitations of multi-valued indexes, refer to CREATE INDEX.

Supported scenarios

Currently, TiDB supports accessing multi-valued indexes using IndexMerge that is automatically converted from json_member_of, json_contains, and json_overlaps conditions. You can either rely on the optimizer to automatically select IndexMerge based on cost, or specify the selection of multi-valued indexes through the optimizer hint use_index_merge or use_index. See the following examples:

mysql> CREATE TABLE t1 (j JSON, INDEX idx((CAST(j->'$.path' AS SIGNED ARRAY)))); -- Uses '$.path' as the path to create a multi-valued index Query OK, 0 rows affected (0.04 sec) mysql> EXPLAIN SELECT /*+ use_index_merge(t1, idx) */ * FROM t1 WHERE (1 MEMBER OF (j->'$.path')); +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------+ | Selection_5 | 8000.00 | root | | json_memberof(cast(1, json BINARY), json_extract(test.t1.j, "$.path")) | | └─IndexMerge_8 | 10.00 | root | | type: union | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t1, index:idx(cast(json_extract(`j`, _utf8'$.path') as signed array)) | range:[1,1], keep order:false, stats:pseudo | | └─TableRowIDScan_7(Probe) | 10.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------+ 4 rows in set, 1 warning (0.00 sec) mysql> EXPLAIN SELECT /*+ use_index_merge(t1, idx) */ * FROM t1 WHERE JSON_CONTAINS((j->'$.path'), '[1, 2, 3]'); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_9 | 10.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t1, index:idx(cast(json_extract(`j`, _utf8'$.path') as signed array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t1, index:idx(cast(json_extract(`j`, _utf8'$.path') as signed array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t1, index:idx(cast(json_extract(`j`, _utf8'$.path') as signed array)) | range:[3,3], keep order:false, stats:pseudo | | └─TableRowIDScan_8(Probe) | 10.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ 5 rows in set (0.00 sec) mysql> EXPLAIN SELECT /*+ use_index_merge(t1, idx) */ * FROM t1 WHERE JSON_OVERLAPS((j->'$.path'), '[1, 2, 3]'); +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------+ | Selection_5 | 8000.00 | root | | json_overlaps(json_extract(test.t1.j, "$.path"), cast("[1, 2, 3]", json BINARY)) | | └─IndexMerge_10 | 10.00 | root | | type: union | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t1, index:idx(cast(json_extract(`j`, _utf8'$.path') as signed array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t1, index:idx(cast(json_extract(`j`, _utf8'$.path') as signed array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_8(Build) | 10.00 | cop[tikv] | table:t1, index:idx(cast(json_extract(`j`, _utf8'$.path') as signed array)) | range:[3,3], keep order:false, stats:pseudo | | └─TableRowIDScan_9(Probe) | 10.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------+ 6 rows in set, 1 warning (0.00 sec)

The composite multi-valued index can also be accessed through IndexMerge:

CREATE TABLE t2 (a INT, j JSON, b INT, k JSON, INDEX idx(a, (CAST(j->'$.path' AS SIGNED ARRAY)), b), INDEX idx2(b, (CAST(k->'$.path' AS SIGNED ARRAY)))); EXPLAIN SELECT /*+ use_index_merge(t2, idx) */ * FROM t2 WHERE a=1 AND (1 MEMBER OF (j->'$.path')) AND b=2; EXPLAIN SELECT /*+ use_index_merge(t2, idx) */ * FROM t2 WHERE a=1 AND JSON_CONTAINS((j->'$.path'), '[1, 2, 3]'); EXPLAIN SELECT /*+ use_index_merge(t2, idx) */ * FROM t2 WHERE a=1 AND JSON_OVERLAPS((j->'$.path'), '[1, 2, 3]'); EXPLAIN SELECT /*+ use_index_merge(t2, idx, idx2) */ * FROM t2 WHERE (a=1 AND 1 member of (j->'$.path')) AND (b=1 AND 2 member of (k->'$.path'));
> EXPLAIN SELECT /*+ use_index_merge(t2, idx) */ * FROM t2 WHERE a=1 AND (1 MEMBER OF (j->'$.path')) AND b=2; +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-----------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-----------------------------------------------------+ | IndexMerge_7 | 0.00 | root | | type: union | | ├─IndexRangeScan_5(Build) | 0.00 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 1 2,1 1 2], keep order:false, stats:pseudo | | └─TableRowIDScan_6(Probe) | 0.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-----------------------------------------------------+ > EXPLAIN SELECT /*+ use_index_merge(t2, idx) */ * FROM t2 WHERE a=1 AND JSON_CONTAINS((j->'$.path'), '[1, 2, 3]'); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-------------------------------------------------+ | IndexMerge_9 | 0.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 0.10 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 1,1 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 0.10 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 2,1 2], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 0.10 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 3,1 3], keep order:false, stats:pseudo | | └─TableRowIDScan_8(Probe) | 0.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-------------------------------------------------+ > EXPLAIN SELECT /*+ use_index_merge(t2, idx) */ * FROM t2 WHERE a=1 AND JSON_OVERLAPS((j->'$.path'), '[1, 2, 3]'); +---------------------------------+---------+-----------+-----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+ | Selection_5 | 0.24 | root | | json_overlaps(json_extract(test.t2.j, "$.path"), cast("[1, 2, 3]", json BINARY)) | | └─IndexMerge_10 | 0.30 | root | | type: union | | ├─IndexRangeScan_6(Build) | 0.10 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 1,1 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 0.10 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 2,1 2], keep order:false, stats:pseudo | | ├─IndexRangeScan_8(Build) | 0.10 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 3,1 3], keep order:false, stats:pseudo | | └─TableRowIDScan_9(Probe) | 0.30 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+ > EXPLAIN SELECT /*+ use_index_merge(t2, idx, idx2) */ * FROM t2 WHERE (a=1 AND 1 member of (j->'$.path')) AND (b=1 AND 2 member of (k->'$.path')); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-----------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-----------------------------------------------------+ | IndexMerge_8 | 0.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 0.00 | cop[tikv] | table:t2, index:idx(a, cast(json_extract(`j`, _utf8'$.path') as signed array), b) | range:[1 1 1,1 1 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 0.10 | cop[tikv] | table:t2, index:idx2(b, cast(json_extract(`k`, _utf8'$.path') as signed array)) | range:[1 2,1 2], keep order:false, stats:pseudo | | └─TableRowIDScan_7(Probe) | 0.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------------+-----------------------------------------------------+

TiDB can also use IndexMerge to access both multi-valued indexes and normal indexes. For example:

CREATE TABLE t3(j1 JSON, j2 JSON, a INT, INDEX k1((CAST(j1->'$.path' AS SIGNED ARRAY))), INDEX k2((CAST(j2->'$.path' AS SIGNED ARRAY))), INDEX ka(a)); EXPLAIN SELECT /*+ use_index_merge(t3, k1, k2, ka) */ * FROM t3 WHERE 1 member of (j1->'$.path') OR a = 3; EXPLAIN SELECT /*+ use_index_merge(t3, k1, k2, ka) */ * FROM t3 WHERE 1 member of (j1->'$.path') AND 2 member of (j2->'$.path') AND (a = 3);
> EXPLAIN SELECT /*+ use_index_merge(t3, k1, k2, ka) */ * FROM t3 WHERE 1 member of (j1->'$.path') OR a = 3; +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_8 | 19.99 | root | | type: union | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t3, index:k1(cast(json_extract(`j1`, _utf8'$.path') as signed array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t3, index:ka(a) | range:[3,3], keep order:false, stats:pseudo | | └─TableRowIDScan_7(Probe) | 19.99 | cop[tikv] | table:t3 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ > EXPLAIN SELECT /*+ use_index_merge(t3, k1, k2, ka) */ * FROM t3 WHERE 1 member of (j1->'$.path') AND 2 member of (j2->'$.path') AND (a = 3); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_9 | 0.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t3, index:ka(a) | range:[3,3], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t3, index:k1(cast(json_extract(`j1`, _utf8'$.path') as signed array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t3, index:k2(cast(json_extract(`j2`, _utf8'$.path') as signed array)) | range:[2,2], keep order:false, stats:pseudo | | └─TableRowIDScan_8(Probe) | 0.00 | cop[tikv] | table:t3 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+

If several json_member_of, json_contains or json_overlaps conditions are connected with OR or AND, they need to meet the following requirements to access multi-valued indexes with IndexMerge:

CREATE TABLE t4(a INT, j JSON, INDEX mvi1((CAST(j->'$.a' AS UNSIGNED ARRAY))), INDEX mvi2((CAST(j->'$.b' AS UNSIGNED ARRAY))));
  • For conditions connected with OR, each of them needs to be able to be accessed with IndexMerge respectively. For example:

    EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1, 2]') OR json_overlaps(j->'$.a', '[3, 4]'); EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1, 2]') OR json_length(j->'$.a') = 3; SHOW WARNINGS;
    > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1, 2]') OR json_overlaps(j->'$.a', '[3, 4]'); +----------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +----------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Selection_5 | 31.95 | root | | or(json_overlaps(json_extract(test.t4.j, "$.a"), cast("[1, 2]", json BINARY)), json_overlaps(json_extract(test.t4.j, "$.a"), cast("[3, 4]", json BINARY))) | | └─IndexMerge_11 | 39.94 | root | | type: union | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_8(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | ├─IndexRangeScan_9(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[4,4], keep order:false, stats:pseudo | | └─TableRowIDScan_10(Probe) | 39.94 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +----------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ -- json_length(j->'$.a') = 3 cannot be accessed with IndexMerge directly, so TiDB cannot use IndexMerge for this SQL statement. > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1, 2]') OR json_length(j->'$.a') = 3; +-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------+ | Selection_5 | 8000.00 | root | | or(json_overlaps(json_extract(test.t4.j, "$.a"), cast("[1, 2]", json BINARY)), eq(json_length(json_extract(test.t4.j, "$.a")), 3)) | | └─TableReader_7 | 10000.00 | root | | data:TableFullScan_6 | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------+ > SHOW WARNINGS; +---------+------+----------------------------+ | Level | Code | Message | +---------+------+----------------------------+ | Warning | 1105 | IndexMerge is inapplicable | +---------+------+----------------------------+
  • For conditions connected with AND, some of them need to be able to be accessed with IndexMerge respectively. TiDB can only use IndexMerge to access multi-valued indexes with these conditions. For example:

    EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_contains(j->'$.a', '[1, 2]') AND json_contains(j->'$.a', '[3, 4]'); EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_contains(j->'$.a', '[1, 2]') AND json_contains(j->'$.a', '[3, 4]') AND json_length(j->'$.a') = 2;
    > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_contains(j->'$.a', '[1, 2]') AND json_contains(j->'$.a', '[3, 4]'); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_10 | 0.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | ├─IndexRangeScan_8(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[4,4], keep order:false, stats:pseudo | | └─TableRowIDScan_9(Probe) | 0.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ -- json_length(j->'$.a') = 3 cannot be accessed with IndexMerge directly, so TiDB uses IndexMerge to access the other two json_contains conditions, and json_length(j->'$.a') = 3 becomes a Selection operator. > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1) */ * FROM t4 WHERE json_contains(j->'$.a', '[1, 2]') AND json_contains(j->'$.a', '[3, 4]') AND json_length(j->'$.a') = 2; +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+----------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+----------------------------------------------------+ | IndexMerge_11 | 0.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | ├─IndexRangeScan_8(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[4,4], keep order:false, stats:pseudo | | └─Selection_10(Probe) | 0.00 | cop[tikv] | | eq(json_length(json_extract(test.t4.j, "$.a")), 2) | | └─TableRowIDScan_9 | 0.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+----------------------------------------------------+
  • All the conditions that are used for the IndexMerge must match the semantics of OR or AND that connects them.

    • If json_contains is connected with AND, it matches the semantics. For example:

      EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_contains(j->'$.a', '[1]') AND json_contains(j->'$.b', '[2, 3]'); EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_contains(j->'$.a', '[1]') OR json_contains(j->'$.b', '[2, 3]');
      > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_contains(j->'$.a', '[1]') AND json_contains(j->'$.b', '[2, 3]'); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_9 | 0.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | └─TableRowIDScan_8(Probe) | 0.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ -- The conditions do not match the semantics, so TiDB cannot use IndexMerge for this SQL statement as explained above. > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_contains(j->'$.a', '[1]') OR json_contains(j->'$.b', '[2, 3]'); +-------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | TableReader_7 | 10.01 | root | | data:Selection_6 | | └─Selection_6 | 10.01 | cop[tikv] | | or(json_contains(json_extract(test.t4.j, "$.a"), cast("[1]", json BINARY)), json_contains(json_extract(test.t4.j, "$.b"), cast("[2, 3]", json BINARY))) | | └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
    • If json_overlaps is connected with OR, it matches the semantics. For example:

      EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1]') OR json_overlaps(j->'$.b', '[2, 3]'); EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1]') AND json_overlaps(j->'$.b', '[2, 3]');
      > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1]') OR json_overlaps(j->'$.b', '[2, 3]'); +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | Selection_5 | 23.98 | root | | or(json_overlaps(json_extract(test.t4.j, "$.a"), cast("[1]", json BINARY)), json_overlaps(json_extract(test.t4.j, "$.b"), cast("[2, 3]", json BINARY))) | | └─IndexMerge_10 | 29.97 | root | | type: union | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_8(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | └─TableRowIDScan_9(Probe) | 29.97 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ -- The conditions do not match the semantics, so TiDB can only use IndexMerge for part of the conditions of this SQL statement as explained above. > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1]') AND json_overlaps(j->'$.b', '[2, 3]'); +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ | Selection_5 | 15.99 | root | | json_overlaps(json_extract(test.t4.j, "$.a"), cast("[1]", json BINARY)), json_overlaps(json_extract(test.t4.j, "$.b"), cast("[2, 3]", json BINARY)) | | └─IndexMerge_8 | 10.00 | root | | type: union | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | └─TableRowIDScan_7(Probe) | 10.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
    • If json_member_of is connected with OR or AND, it matches the semantics. For example:

      EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') AND 2 member of (j->'$.b') AND 3 member of (j->'$.a'); EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') OR 2 member of (j->'$.b') OR 3 member of (j->'$.a');
      > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') AND 2 member of (j->'$.b') AND 3 member of (j->'$.a'); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_9 | 0.00 | root | | type: intersection | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | └─TableRowIDScan_8(Probe) | 0.00 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') OR 2 member of (j->'$.b') OR 3 member of (j->'$.a'); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_9 | 29.97 | root | | type: union | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | └─TableRowIDScan_8(Probe) | 29.97 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+
    • If json_contains conditions that contain multiple values are connected with OR, or json_overlaps conditions that contain multiple values are connected with AND, they do not match the semantics, but they match the semantics if they only contain one value. For example:

      -- Refer to the preceding examples for conditions that do not match the semantics. The following only provides examples of conditions that match the semantics. EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1]') AND json_overlaps(j->'$.b', '[2]'); EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_contains(j->'$.a', '[1]') OR json_contains(j->'$.b', '[2]');
      > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_overlaps(j->'$.a', '[1]') AND json_overlaps(j->'$.b', '[2]'); +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | Selection_5 | 8.00 | root | | json_overlaps(json_extract(test.t4.j, "$.a"), cast("[1]", json BINARY)), json_overlaps(json_extract(test.t4.j, "$.b"), cast("[2]", json BINARY)) | | └─IndexMerge_9 | 0.01 | root | | type: intersection | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | └─TableRowIDScan_8(Probe) | 0.01 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +---------------------------------+---------+-----------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE json_contains(j->'$.a', '[1]') OR json_contains(j->'$.b', '[2]'); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+ | IndexMerge_8 | 19.99 | root | | type: union | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | └─TableRowIDScan_7(Probe) | 19.99 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+---------------------------------------------+
    • When both OR and AND are used to connect conditions (essentially nested OR and AND), the conditions that constitute an IndexMerge must either all match the semantics of OR or all match the semantics of AND, not partially match the semantics of OR and partially match the semantics of AND. For example:

      EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') AND (2 member of (j->'$.b') OR 3 member of (j->'$.a')); EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') OR (2 member of (j->'$.b') AND 3 member of (j->'$.a'));
      -- Only 2 member of (j->'$.b') and 3 member of (j->'$.a') that match the semantics of OR constitute the IndexMerge. 1 member of (j->'$.a') that matches the semantics of AND is not included. > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') AND (2 member of (j->'$.b') OR 3 member of (j->'$.a')); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | IndexMerge_9 | 0.00 | root | | type: union | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi2(cast(json_extract(`j`, _utf8'$.b') as unsigned array)) | range:[2,2], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | └─Selection_8(Probe) | 0.00 | cop[tikv] | | json_memberof(cast(1, json BINARY), json_extract(test.t4.j, "$.a")), or(json_memberof(cast(2, json BINARY), json_extract(test.t4.j, "$.b")), json_memberof(cast(3, json BINARY), json_extract(test.t4.j, "$.a"))) | | └─TableRowIDScan_7 | 19.99 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -- Only 1 member of (j->'$.a') and 2 member of (j->'$.a') that match the semantics of OR constitute the IndexMerge. 2 member of (j->'$.b') that matches the semantics of AND is not included. > EXPLAIN SELECT /*+ use_index_merge(t4, mvi1, mvi2) */ * FROM t4 WHERE 1 member of (j->'$.a') OR (2 member of (j->'$.b') AND 3 member of (j->'$.a')); +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | IndexMerge_9 | 0.02 | root | | type: union | | ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[1,1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 10.00 | cop[tikv] | table:t4, index:mvi1(cast(json_extract(`j`, _utf8'$.a') as unsigned array)) | range:[3,3], keep order:false, stats:pseudo | | └─Selection_8(Probe) | 0.02 | cop[tikv] | | or(json_memberof(cast(1, json BINARY), json_extract(test.t4.j, "$.a")), and(json_memberof(cast(2, json BINARY), json_extract(test.t4.j, "$.b")), json_memberof(cast(3, json BINARY), json_extract(test.t4.j, "$.a")))) | | └─TableRowIDScan_7 | 19.99 | cop[tikv] | table:t4 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-----------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

If conditions contain nested OR/AND, or if conditions correspond to only indexed columns after transformations, such as expansion, TiDB might not be able to use IndexMerge or make full use of all conditions. It is recommended to verify the behavior of each specific case.

The following are some examples:

CREATE TABLE t5 (a INT, j JSON, b INT, k JSON, INDEX idx(a, (CAST(j AS SIGNED ARRAY))), INDEX idx2(b, (CAST(k as SIGNED ARRAY)))); CREATE TABLE t6 (a INT, j JSON, b INT, k JSON, INDEX idx(a, (CAST(j AS SIGNED ARRAY)), b), INDEX idx2(a, (CAST(k as SIGNED ARRAY)), b));

If AND is nested in conditions connected by OR and subconditions connected by AND correspond to the exact columns of a multi-column index, TiDB can usually make full use of the conditions. For example:

EXPLAIN SELECT /*+ use_index_merge(t5, idx, idx2) */ * FROM t5 WHERE (a=1 AND 1 member of (j)) OR (b=2 AND 2 member of (k));
> EXPLAIN SELECT /*+ use_index_merge(t5, idx, idx2) */ * FROM t5 WHERE (a=1 AND 1 member of (j)) OR (b=2 AND 2 member of (k)); +-------------------------------+---------+-----------+----------------------------------------------------+-------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+----------------------------------------------------+-------------------------------------------------+ | IndexMerge_8 | 0.20 | root | | type: union | | ├─IndexRangeScan_5(Build) | 0.10 | cop[tikv] | table:t5, index:idx(a, cast(`j` as signed array)) | range:[1 1,1 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 0.10 | cop[tikv] | table:t5, index:idx2(b, cast(`k` as signed array)) | range:[2 2,2 2], keep order:false, stats:pseudo | | └─TableRowIDScan_7(Probe) | 0.20 | cop[tikv] | table:t5 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+----------------------------------------------------+-------------------------------------------------+

If a single OR is nested in conditions connected by AND and subconditions connected by OR correspond to the indexed columns after expansion, TiDB can usually make full use of the conditions. For example:

EXPLAIN SELECT /*+ use_index_merge(t6, idx, idx2) */ * FROM t6 WHERE a=1 AND (1 member of (j) OR 2 member of (k));
+-------------------------------+---------+-----------+-------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ | IndexMerge_9 | 0.20 | root | | type: union | | ├─IndexRangeScan_5(Build) | 0.10 | cop[tikv] | table:t6, index:idx(a, cast(`j` as signed array), b) | range:[1 1,1 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 0.10 | cop[tikv] | table:t6, index:idx2(a, cast(`k` as signed array), b) | range:[1 2,1 2], keep order:false, stats:pseudo | | └─Selection_8(Probe) | 0.20 | cop[tikv] | | eq(test2.t6.a, 1), or(json_memberof(cast(1, json BINARY), test2.t6.j), json_memberof(cast(2, json BINARY), test2.t6.k)) | | └─TableRowIDScan_7 | 0.20 | cop[tikv] | table:t6 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+

If multiple OR are nested in conditions connected by AND and subconditions connected by OR need expansion to correspond to the index columns, TiDB might not be able to make full use of all conditions. For example:

EXPLAIN SELECT /*+ use_index_merge(t6, idx, idx2) */ * FROM t6 WHERE a=1 AND (1 member of (j) OR 2 member of (k)) and (b = 1 OR b = 2); EXPLAIN SELECT /*+ use_index_merge(t6, idx, idx2) */ * FROM t6 WHERE a=1 AND ((1 member of (j) AND b = 1) OR (1 member of (j) AND b = 2) OR (2 member of (k) AND b = 1) OR (2 member of (k) AND b = 2));
-- Due to current implementation limitations, (b = 1 or b = 2) does not constitute the IndexMerge, but becomes a Selection operator > EXPLAIN SELECT /*+ use_index_merge(t6, idx, idx2) */ * FROM t6 WHERE a=1 AND (1 member of (j) OR 2 member of (k)) AND (b = 1 OR b = 2); +-------------------------------+---------+-----------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | IndexMerge_9 | 0.20 | root | | type: union | | ├─IndexRangeScan_5(Build) | 0.10 | cop[tikv] | table:t6, index:idx(a, cast(`j` as signed array), b) | range:[1 1,1 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 0.10 | cop[tikv] | table:t6, index:idx2(a, cast(`k` as signed array), b) | range:[1 2,1 2], keep order:false, stats:pseudo | | └─Selection_8(Probe) | 0.20 | cop[tikv] | | eq(test.t6.a, 1), or(eq(test.t6.b, 1), eq(test.t6.b, 2)), or(json_memberof(cast(1, json BINARY), test.t6.j), json_memberof(cast(2, json BINARY), test.t6.k)) | | └─TableRowIDScan_7 | 0.20 | cop[tikv] | table:t6 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ -- If you manually expand the two OR conditions connected with AND, TiDB can make full use of these conditions > EXPLAIN SELECT /*+ use_index_merge(t6, idx, idx2) */ * FROM t6 WHERE a=1 AND ((1 member of (j) AND b = 1) OR (1 member of (j) AND b = 2) OR (2 member of (k) AND b = 1) OR (2 member of (k) AND b = 2)); +-------------------------------+---------+-----------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------------+---------+-----------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | IndexMerge_11 | 0.00 | root | | type: union | | ├─IndexRangeScan_5(Build) | 0.00 | cop[tikv] | table:t6, index:idx(a, cast(`j` as signed array), b) | range:[1 1 1,1 1 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_6(Build) | 0.00 | cop[tikv] | table:t6, index:idx(a, cast(`j` as signed array), b) | range:[1 1 2,1 1 2], keep order:false, stats:pseudo | | ├─IndexRangeScan_7(Build) | 0.00 | cop[tikv] | table:t6, index:idx2(a, cast(`k` as signed array), b) | range:[1 2 1,1 2 1], keep order:false, stats:pseudo | | ├─IndexRangeScan_8(Build) | 0.00 | cop[tikv] | table:t6, index:idx2(a, cast(`k` as signed array), b) | range:[1 2 2,1 2 2], keep order:false, stats:pseudo | | └─Selection_10(Probe) | 0.00 | cop[tikv] | | eq(test.t6.a, 1), or(or(and(json_memberof(cast(1, json BINARY), test.t6.j), eq(test.t6.b, 1)), and(json_memberof(cast(1, json BINARY), test.t6.j), eq(test.t6.b, 2))), or(and(json_memberof(cast(2, json BINARY), test.t6.k), eq(test.t6.b, 1)), and(json_memberof(cast(2, json BINARY), test.t6.k), eq(test.t6.b, 2)))) | | └─TableRowIDScan_9 | 0.00 | cop[tikv] | table:t6 | keep order:false, stats:pseudo | +-------------------------------+---------+-----------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Limited by the current implementation of multi-valued indexes, using use_index might return the Can't find a proper physical plan for this query error while using use_index_merge will not return such an error. Therefore, it is recommended to use use_index_merge if you want to use multi-valued indexes.

mysql> EXPLAIN SELECT /*+ use_index(t3, idx) */ * FROM t3 WHERE ((1 member of (j)) AND (2 member of (j))) OR ((3 member of (j)) AND (4 member of (j))); ERROR 1815 (HY000): Internal : Cant find a proper physical plan for this query mysql> EXPLAIN SELECT /*+ use_index_merge(t3, idx) */ * FROM t3 WHERE ((1 member of (j)) AND (2 member of (j))) OR ((3 member of (j)) AND (4 member of (j))); +-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Selection_5 | 8000.00 | root | | or(and(json_memberof(cast(1, json BINARY), test.t3.j), json_memberof(cast(2, json BINARY), test.t3.j)), and(json_memberof(cast(3, json BINARY), test.t3.j), json_memberof(cast(4, json BINARY), test.t3.j))) | | └─TableReader_7 | 10000.00 | root | | data:TableFullScan_6 | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t3 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3 rows in set, 2 warnings (0.00 sec)

Multi-valued indexes and plan cache

A query plan that uses member of to choose multi-valued indexes can be cached. A query plan that uses the JSON_CONTAINS() or JSON_OVERLAPS() function to choose multi-valued indexes cannot be cached.

The following are some examples that query plans can be cached:

mysql> CREATE TABLE t5 (j1 JSON, j2 JSON, INDEX idx1((CAST(j1 AS SIGNED ARRAY)))); Query OK, 0 rows affected (0.04 sec) mysql> PREPARE st FROM 'SELECT /*+ use_index(t5, idx1) */ * FROM t5 WHERE (? member of (j1))'; Query OK, 0 rows affected (0.00 sec) mysql> SET @a=1; Query OK, 0 rows affected (0.00 sec) mysql> EXECUTE st USING @a; Empty set (0.01 sec) mysql> EXECUTE st USING @a; Empty set (0.00 sec) mysql> SELECT @@last_plan_from_cache; +------------------------+ | @@last_plan_from_cache | +------------------------+ | 1 | +------------------------+ 1 row in set (0.00 sec) mysql> PREPARE st FROM 'SELECT /*+ use_index(t5, idx1) */ * FROM t5 WHERE (? member of (j1)) AND JSON_CONTAINS(j2, ?)'; Query OK, 0 rows affected (0.00 sec) mysql> SET @a=1, @b='[1,2]'; Query OK, 0 rows affected (0.00 sec) mysql> EXECUTE st USING @a, @b; Empty set (0.00 sec) mysql> EXECUTE st USING @a, @b; Empty set (0.00 sec) mysql> SELECT @@LAST_PLAN_FROM_CACHE; -- can hit plan cache if the JSON_CONTAINS doesn't impact index selection +------------------------+ | @@LAST_PLAN_FROM_CACHE | +------------------------+ | 1 | +------------------------+ 1 row in set (0.00 sec)

The following are some examples that query plans cannot be cached:

mysql> PREPARE st2 FROM 'SELECT /*+ use_index(t5, idx1) */ * FROM t5 WHERE JSON_CONTAINS(j1, ?)'; Query OK, 0 rows affected (0.00 sec) mysql> SET @a='[1,2]'; Query OK, 0 rows affected (0.01 sec) mysql> EXECUTE st2 USING @a; Empty set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; -- cannot hit plan cache since the JSON_CONTAINS predicate might affect index selection +---------+------+-------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------------------------------------------------------------+ | Warning | 1105 | skip prepared plan-cache: json_contains function with immutable parameters can affect index selection | +---------+------+-------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec)