Optimizer Hints

TiDB supports optimizer hints, which are based on the comment-like syntax introduced in MySQL 5.7. For example, one of the common syntaxes is /*+ HINT_NAME([t1_name [, t2_name] ...]) */. Use of optimizer hints is recommended in cases where the TiDB optimizer selects a less optimal query plan.

Syntax

Optimizer hints are case insensitive and specified within /*+ ... */ comments following the SELECT, UPDATE or DELETE keyword in a SQL statement. Optimizer hints are not currently supported for INSERT statements.

Multiple hints can be specified by separating with commas. For example, the following query uses three different hints:

SELECT /*+ USE_INDEX(t1, idx1), HASH_AGG(), HASH_JOIN(t1) */ count(*) FROM t t1, t t2 WHERE t1.a = t2.b;

How optimizer hints affect query execution plans can be observed in the output of EXPLAIN and EXPLAIN ANALYZE.

An incorrect or incomplete hint will not result in a statement error. This is because hints are intended to have only a hint (suggestion) semantic to query execution. Similarly, TiDB will at most return a warning if a hint is not applicable.

Currently, TiDB supports two categories of hints, which are different in scope. The first category of hints takes effect in the scope of query blocks, such as /*+ HASH_AGG() */; the second category of hints takes effect in the whole query, such as /*+ MEMORY_QUOTA(1024 MB)*/.

Each query or sub-query in a statement corresponds to a different query block, and each query block has its own name. For example:

SELECT * FROM (SELECT * FROM t) t1, (SELECT * FROM t) t2;

The above query statement has three query blocks: the outermost SELECT corresponds to the first query block, whose name is sel_1; the two SELECT sub-queries correspond to the second and the third query block, whose names are sel_2 and sel_3, respectively. The sequence of the numbers is based on the appearance of SELECT from left to right. If you replace the first SELECT with DELETE or UPDATE, then the corresponding query block names are del_1 or upd_1.

Hints that take effect in query blocks

This category of hints can follow behind any SELECT, UPDATE or DELETE keywords. To control the effective scope of the hint, use the name of the query block in the hint. You can make the hint parameters clear by accurately identifying each table in the query (in case of duplicated table names or aliases). If no query block is specified in the hint, the hint takes effect in the current block by default.

For example:

SELECT /*+ HASH_JOIN(@sel_1 t1@sel_1, t3) */ * FROM (SELECT t1.a, t1.b FROM t t1, t t2 WHERE t1.a = t2.a) t1, t t3 WHERE t1.b = t3.b;

This hint takes effect in the sel_1 query block, and its parameters are the t1 and t3 tables in sel_1 (sel_2 also contains a t1 table).

As described above, you can specify the name of the query block in the hint in the following ways:

  • Set the query block name as the first parameter of the hint, and separate it from other parameters with a space. In addition to QB_NAME, all the hints listed in this section also have another optional hidden parameter @QB_NAME. By using this parameter, you can specify the effective scope of this hint.
  • Append @QB_NAME to a table name in the parameter to explicitly specify which query block this table belongs to.

QB_NAME

If the query statement is a complicated statement that includes multiple nested queries, the ID and name of a certain query block might be mistakenly identified. The hint QB_NAME can help us in this regard.

QB_NAME means Query Block Name. You can specify a new name to a query block. The specified QB_NAME and the previous default name are both valid. For example:

SELECT /*+ QB_NAME(QB1) */ * FROM (SELECT * FROM t) t1, (SELECT * FROM t) t2;

This hint specifies the outer SELECT query block's name to QB1, which makes QB1 and the default name sel_1 both valid for the query block.

SM_JOIN(t1_name [, tl_name ...])

The SM_JOIN(t1_name [, tl_name ...]) hint tells the optimizer to use the sort-merge join algorithm for the given table(s). Generally, this algorithm consumes less memory but takes longer processing time. If there is a very large data volume or insufficient system memory, it is recommended to use this hint. For example:

select /*+ SM_JOIN(t1, t2) */ * from t1,t2 where t1.id = t2.id;

INL_JOIN(t1_name [, tl_name ...])

The INL_JOIN(t1_name [, tl_name ...]) hint tells the optimizer to use the index nested loop join algorithm for the given table(s). This algorithm might consume less system resources and take shorter processing time in some scenarios and might produce an opposite result in other scenarios. If the result set is less than 10,000 rows after the outer table is filtered by the WHERE condition, it is recommended to use this hint. For example:

select /*+ INL_JOIN(t1, t2) */ * from t1,t2 where t1.id = t2.id;

The parameter(s) given in INL_JOIN() is the candidate table for the inner table when you create the query plan. For example, INL_JOIN(t1) means that TiDB only considers using t1 as the inner table to create a query plan. If the candidate table has an alias, you must use the alias as the parameter in INL_JOIN(); if it does not has an alias, use the table's original name as the parameter. For example, in the select /*+ INL_JOIN(t1) */ * from t t1, t t2 where t1.a = t2.b; query, you must use the t table's alias t1 or t2 rather than t as INL_JOIN()'s parameter.

HASH_JOIN(t1_name [, tl_name ...])

The HASH_JOIN(t1_name [, tl_name ...]) hint tells the optimizer to use the hash join algorithm for the given table(s). This algorithm allows the query to be executed concurrently with multiple threads, which achieves a higher processing speed but consumes more memory. For example:

select /*+ HASH_JOIN(t1, t2) */ * from t1,t2 where t1.id = t2.id;

HASH_AGG()

The HASH_AGG() hint tells the optimizer to use the hash aggregation algorithm in all the aggregate functions in the specified query block. This algorithm allows the query to be executed concurrently with multiple threads, which achieves a higher processing speed but consumes more memory. For example:

select /*+ HASH_AGG() */ count(*) from t1,t2 where t1.a > 10 group by t1.id;

STREAM_AGG()

The STREAM_AGG() hint tells the optimizer to use the stream aggregation algorithm in all the aggregate functions in the specified query block. Generally, this algorithm consumes less memory but takes longer processing time. If there is a very large data volume or insufficient system memory, it is recommended to use this hint. For example:

select /*+ STREAM_AGG() */ count(*) from t1,t2 where t1.a > 10 group by t1.id;

USE_INDEX(t1_name, idx1_name [, idx2_name ...])

The USE_INDEX(t1_name, idx1_name [, idx2_name ...]) hint tells the optimizer to use only the given index(es) for a specified t1_name table. For example, applying the following hint has the same effect as executing the select * from t t1 use index(idx1, idx2); statement.

SELECT /*+ USE_INDEX(t1, idx1, idx2) */ * FROM t1;

IGNORE_INDEX(t1_name, idx1_name [, idx2_name ...])

The IGNORE_INDEX(t1_name, idx1_name [, idx2_name ...]) hint tells the optimizer to ignore the given index(es) for a specified t1_name table. For example, applying the following hint has the same effect as executing the select * from t t1 ignore index(idx1, idx2); statement.

select /*+ IGNORE_INDEX(t1, idx1, idx2) */ * from t t1;

AGG_TO_COP()

The AGG_TO_COP() hint tells the optimizer to push down the aggregate operation in the specified query block to the coprocessor. If the optimizer does not push down some aggregate function that is suitable for pushdown, then it is recommended to use this hint. For example:

select /*+ AGG_TO_COP() */ sum(t1.a) from t t1;

READ_FROM_STORAGE(TIFLASH[t1_name [, tl_name ...]], TIKV[t2_name [, tl_name ...]])

The READ_FROM_STORAGE(TIFLASH[t1_name [, tl_name ...]], TIKV[t2_name [, tl_name ...]]) hint tells the optimizer to read specific table(s) from specific storage engine(s). Currently, this hint supports two storage engine parameters - TIKV and TIFLASH. For example:

select /*+ READ_FROM_STORAGE(TIFLASH[t1], TIKV[t2]) */ t1.a from t t1, t t2 where t1.a = t2.a;

USE_TOJA(boolean_value)

The boolean_value parameter can be TRUE or FALSE. The USE_TOJA(TRUE) hint enables the optimizer to convert an in condition (containing a sub-query) to join and aggregation operations. Comparatively, the USE_TOJA(FALSE) hint disables this feature.

For example, the following query will convert in (select t2.a from t2) subq to corresponding join and aggregation operations:

select /*+ USE_TOJA(TRUE) */ t1.a, t1.b from t1 where t1.a in (select t2.a from t2) subq;

In addition to this hint, setting the tidb_opt_insubq_to_join_and_agg system variable also controls whether to enable this feature.

MAX_EXECUTION_TIME(N)

The MAX_EXECUTION_TIME(N) hint places a limit N (a timeout value in milliseconds) on how long a statement is permitted to execute before the server terminates it. In the following hint, MAX_EXECUTION_TIME(1000) means that the timeout is 1000 milliseconds (that is, 1 second):

select /*+ MAX_EXECUTION_TIME(1000) */ * from t1 inner join t2 where t1.id = t2.id;

In addition to this hint, the global.max_execution_time system variable can also limit the execution time of a statement.

MEMORY_QUOTA(N)

The MEMORY_QUOTA(N) hint places a limit N (a threshold value in MB or GB) on how much memory a statement is permitted to use. When a statement's memory usage exceeds this limit, TiDB produces a log message based on the statement's over-limit behavior or just terminates it.

In the following hint, MEMORY_QUOTA(1024 MB) means that the memory usage is limited to 1024 MB:

select /*+ MEMORY_QUOTA(1024 MB) */ * from t;

In addition to this hint, the tidb_mem_quota_query system variable can also limit the memory usage of a statement.

READ_CONSISTENT_REPLICA()

The READ_CONSISTENT_REPLICA() hint enables the feature of reading consistent data from the TiKV follower node. For example:

select /*+ READ_CONSISTENT_REPLICA() */ * from t;

In addition to this hint, setting the tidb_replica_read environment variable to 'follower' or 'leader' also controls whether to enable this feature.