SQL Plan Management (SPM)

SQL Plan Management is a set of functions that execute SQL bindings to manually interfere with SQL execution plans. These functions include SQL binding, baseline capturing, and baseline evolution.

SQL binding

An SQL binding is the basis of SPM. The Optimizer Hints document introduces how to select a specific execution plan using hints. However, sometimes you need to interfere with execution selection without modifying SQL statements. With SQL bindings, you can select a specified execution plan without modifying SQL statements.

Create a binding

CREATE [GLOBAL | SESSION] BINDING FOR BindableStmt USING BindableStmt

This statement binds SQL execution plans at the GLOBAL or SESSION level. Currently, supported bindable SQL statements (BindableStmt) in TiDB include SELECT, DELETE, UPDATE, and INSERT / REPLACE with SELECT subqueries.

Specifically, two types of these statements cannot be bound to execution plans due to syntax conflicts. See the following examples:

-- Type one: Statements that get the Cartesian product by using the `join` keyword and not specifying the associated columns with the `using` keyword. create global binding for select * from t t1 join t t2 using select * from t t1 join t t2; -- Type two: `DELETE` statements that contain the `using` keyword. create global binding for delete from t1 using t1 join t2 on t1.a = t2.a using delete from t1 using t1 join t2 on t1.a = t2.a;

You can bypass syntax conflicts by using equivalent statements. For example, you can rewrite the above statements in the following ways:

-- First rewrite of type one statements: Add a `using` clause for the `join` keyword. create global binding for select * from t t1 join t t2 using (a) using select * from t t1 join t t2 using (a); -- Second rewrite of type one statements: Delete the `join` keyword. create global binding for select * from t t1, t t2 using select * from t t1, t t2; -- Rewrite of type two statements: Remove the `using` keyword from the `delete` statement. create global binding for delete t1 from t1 join t2 on t1.a = t2.a using delete t1 from t1 join t2 on t1.a = t2.a;

Here are two examples:

-- The hint takes effect in the following statement. create global binding for insert into t1 select * from t2 where a > 1 and b = 1 using insert into t1 select /*+ use_index(@sel_1 t2, a) */ * from t2 where a > 1 and b = 1; -- The hint cannot take effect in the following statement. create global binding for insert into t1 select * from t2 where a > 1 and b = 1 using insert /*+ use_index(@sel_1 t2, a) */ into t1 select * from t2 where a > 1 and b = 1;

If you do not specify the scope when creating an execution plan binding, the default scope is SESSION. The TiDB optimizer normalizes bound SQL statements and stores them in the system table. When processing SQL queries, if a normalized statement matches one of the bound SQL statements in the system table and the system variable tidb_use_plan_baselines is set to on (the default value is on), TiDB then uses the corresponding optimizer hint for this statement. If there are multiple matchable execution plans, the optimizer chooses the least costly one to bind.

Normalization is a process that converts a constant in an SQL statement to a variable parameter and explicitly specifies the database for tables referenced in the query, with standardized processing on the spaces and line breaks in the SQL statement. See the following example:

select * from t where a > 1 -- After normalization, the above statement is as follows: select * from test . t where a > ?

When a SQL statement has bound execution plans in both GLOBAL and SESSION scopes, because the optimizer ignores the bound execution plan in the GLOBAL scope when it encounters the SESSION binding, the bound execution plan of this statement in the SESSION scope shields the execution plan in the GLOBAL scope.

For example:

-- Creates a GLOBAL binding and specifies using `sort merge join` in this binding. create global binding for select * from t1, t2 where t1.id = t2.id using select /*+ merge_join(t1, t2) */ * from t1, t2 where t1.id = t2.id; -- The execution plan of this SQL statement uses the `sort merge join` specified in the GLOBAL binding. explain select * from t1, t2 where t1.id = t2.id; -- Creates another SESSION binding and specifies using `hash join` in this binding. create binding for select * from t1, t2 where t1.id = t2.id using select /*+ hash_join(t1, t2) */ * from t1, t2 where t1.id = t2.id; -- In the execution plan of this statement, `hash join` specified in the SESSION binding is used, instead of `sort merge join` specified in the GLOBAL binding. explain select * from t1, t2 where t1.id = t2.id;

When the first select statement is being executed, the optimizer adds the sm_join(t1, t2) hint to the statement through the binding in the GLOBAL scope. The top node of the execution plan in the explain result is MergeJoin. When the second select statement is being executed, the optimizer uses the binding in the SESSION scope instead of the binding in the GLOBAL scope and adds the hash_join(t1, t2) hint to the statement. The top node of the execution plan in the explain result is HashJoin.

Each standardized SQL statement can have only one binding created using CREATE BINDING at a time. When multiple bindings are created for the same standardized SQL statement, the last created binding is retained, and all previous bindings (created and evolved) are marked as deleted. But session bindings and global bindings can coexist and are not affected by this logic.

In addition, when you create a binding, TiDB requires that the session is in a database context, which means that a database is specified when the client is connected or use ${database} is executed.

The original SQL statement and the bound statement must have the same text after normalization and hint removal, or the binding will fail. Take the following examples:

  • This binding can be created successfully because the texts before and after parameterization and hint removal are the same: select * from test . t where a > ?

    CREATE BINDING FOR SELECT * FROM t WHERE a > 1 USING SELECT * FROM t use index (idx) WHERE a > 2
  • This binding will fail because the original SQL statement is processed as select * from test . t where a > ?, while the bound SQL statement is processed differently as select * from test . t where b > ?.

    CREATE BINDING FOR SELECT * FROM t WHERE a > 1 USING SELECT * FROM t use index(idx) WHERE b > 2

Remove binding

DROP [GLOBAL | SESSION] BINDING FOR BindableStmt;

This statement removes a specified execution plan binding at the GLOBAL or SESSION level. The default scope is SESSION.

Generally, the binding in the SESSION scope is mainly used for test or in special situations. For a binding to take effect in all TiDB processes, you need to use the GLOBAL binding. A created SESSION binding shields the corresponding GLOBAL binding until the end of the SESSION, even if the SESSION binding is dropped before the session closes. In this case, no binding takes effect and the plan is selected by the optimizer.

The following example is based on the example in create binding in which the SESSION binding shields the GLOBAL binding:

-- Drops the binding created in the SESSION scope. drop session binding for select * from t1, t2 where t1.id = t2.id; -- Views the SQL execution plan again. explain select * from t1,t2 where t1.id = t2.id;

In the example above, the dropped binding in the SESSION scope shields the corresponding binding in the GLOBAL scope. The optimizer does not add the sm_join(t1, t2) hint to the statement. The top node of the execution plan in the explain result is not fixed to MergeJoin by this hint. Instead, the top node is independently selected by the optimizer according to the cost estimation.

View binding

SHOW [GLOBAL | SESSION] BINDINGS [ShowLikeOrWhere]

This statement outputs the execution plan bindings at the GLOBAL or SESSION level. The default scope is SESSION. Currently SHOW BINDINGS outputs eight columns, as shown below:

Column NameNote
original_sqlOriginal SQL statement after parameterization
bind_sqlBound SQL statement with hints
default_dbDefault database
statusStatus including using, deleted, invalid, rejected, and pending verify
create_timeCreating time
update_timeUpdating time
charsetCharacter set
collationOrdering rule
sourceThe way in which a binding is created, including manual (created by the create [global] binding SQL statement), capture (captured automatically by TiDB), and evolve (evolved automatically by TiDB)

Baseline capturing

To enable baseline capturing, set tidb_capture_plan_baselines to on. The default value is off.

After automatic binding creation is enabled, the historical SQL statements in the Statement Summary are traversed every bind-info-lease (the default value is 3s), and a binding is automatically created for SQL statements that appear at least twice. For these SQL statements, TiDB automatically binds the execution plan recorded in Statement Summary.

However, TiDB does not automatically capture bindings for the following types of SQL statements:

  • EXPLAIN and EXPLAIN ANALYZE statements.
  • SQL statements executed internally in TiDB, such as SELECT queries used for automatically loading statistical information.
  • SQL statements that are bound to a manually created execution plan.

For PREPARE / EXECUTE statements and for queries executed with binary protocols, TiDB automatically captures bindings for the real query statements, not for the PREPARE / EXECUTE statements.

Baseline evolution

Baseline evolution is an important feature of SPM introduced in TiDB v4.0.

As data updates, the previously bound execution plan might no longer be optimal. The baseline evolution feature can automatically optimize the bound execution plan.

In addition, baseline evolution, to a certain extent, can also avoid the jitter brought to the execution plan caused by the change of statistical information.

Usage

Use the following statement to enable automatic binding evolution:

set global tidb_evolve_plan_baselines = on;

The default value of tidb_evolve_plan_baselines is off.

After the automatic binding evolution feature is enabled, if the optimal execution plan selected by the optimizer is not among the binding execution plans, the optimizer marks the plan as an execution plan that waits for verification. At every bind-info-lease (the default value is 3s) interval, an execution plan to be verified is selected and compared with the binding execution plan that has the least cost in terms of the actual execution time. If the plan to be verified has shorter execution time (the current criterion for the comparison is that the execution time of the plan to be verified is no longer than 2/3 that of the binding execution plan), this plan is marked as a usable binding. The following example describes the process above.

Assume that table t is defined as follows:

create table t(a int, b int, key(a), key(b));

Perform the following query on table t:

select * from t where a < 100 and b < 100;

In the table defined above, few rows meet the a < 100 condition. But for some reason, the optimizer mistakenly selects the full table scan instead of the optimal execution plan that uses index a. You can first use the following statement to create a binding:

create global binding for select * from t where a < 100 and b < 100 using select * from t use index(a) where a < 100 and b < 100;

When the query above is executed again, the optimizer selects index a (influenced by the binding created above) to reduce the query time.

Assuming that as insertions and deletions are performed on table t, an increasing number of rows meet the a < 100 condition and a decreasing number of rows meet the b < 100 condition. At this time, using index a under the binding might no longer be the optimal plan.

The binding evolution can address this kind of issues. When the optimizer recognizes data change in a table, it generates an execution plan for the query that uses index b. However, because the binding of the current plan exists, this query plan is not adopted and executed. Instead, this plan is stored in the backend evolution list. During the evolution process, if this plan is verified to have an obviously shorter execution time than that of the current execution plan that uses index a, index b is added into the available binding list. After this, when the query is executed again, the optimizer first generates the execution plan that uses index b and makes sure that this plan is in the binding list. Then the optimizer adopts and executes this plan to reduce the query time after data changes.

To reduce the impact that the automatic evolution has on clusters, use the following configurations:

  • Set tidb_evolve_plan_task_max_time to limit the maximum execution time of each execution plan. The default value is 600s. In the actual verification process, the maximum execution time is also limited to no more than twice the time of the verified execution plan.
  • Set tidb_evolve_plan_task_start_time (00:00 +0000 by default) and tidb_evolve_plan_task_end_time (23:59 +0000 by default) to limit the time window.

Notes

Because the baseline evolution automatically creates a new binding, when the query environment changes, the automatically created binding might have multiple behavior choices. Pay attention to the following notes:

  • Baseline evolution only evolves standardized SQL statements that have at least one global binding.

  • Because creating a new binding deletes all previous bindings (for a standardized SQL statement), the automatically evolved binding will be deleted after manually creating a new binding.

  • All hints related to the calculation process are retained during the evolution. These hints are as follows:

    HintDescription
    memory_quotaThe maximum memory that can be used for a query.
    use_tojaWhether the optimizer transforms sub-queries to Join.
    use_cascadesWhether to use the cascades optimizer.
    no_index_mergeWhether the optimizer uses Index Merge as an option for reading tables.
    read_consistent_replicaWhether to forcibly enable Follower Read when reading tables.
    max_execution_timeThe longest duration for a query.
  • read_from_storage is a special hint in that it specifies whether to read data from TiKV or from TiFlash when reading tables. Because TiDB provides isolation reads, when the isolation condition changes, this hint has a great influence on the evolved execution plan. Therefore, when this hint exists in the initially created binding, TiDB ignores all its evolved bindings.