Eliminate Max/Min
When a SQL statement contains max
/min
functions, the query optimizer tries to convert the max
/min
aggregate functions to the TopN operator by applying the max
/min
optimization rule. In this way, TiDB can perform the query more efficiently through indexes.
This optimization rule is divided into the following two types according to the number of max
/min
functions in the select
statement:
One max
/min
function
When a SQL statement meets the following conditions, this rule is applied:
- The statement contains only one aggregate function, which is
max
ormin
. - The aggregate function has no related
group by
clause.
For example:
select max(a) from t
The optimization rule rewrites the statement as follows:
select max(a) from (select a from t where a is not null order by a desc limit 1) t
When column a
has an index, or when column a
is the prefix of some composite index, with the help of index, the new SQL statement can find the maximum or minimum value by scanning only one row of data. This optimization avoids full table scan.
The example statement has the following execution plan:
mysql> explain select max(a) from t;
+------------------------------+---------+-----------+-------------------------+-------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+-------------------------+-------------------------------------+
| StreamAgg_13 | 1.00 | root | | funcs:max(test.t.a)->Column#4 |
| └─Limit_17 | 1.00 | root | | offset:0, count:1 |
| └─IndexReader_27 | 1.00 | root | | index:Limit_26 |
| └─Limit_26 | 1.00 | cop[tikv] | | offset:0, count:1 |
| └─IndexFullScan_25 | 1.00 | cop[tikv] | table:t, index:idx_a(a) | keep order:true, desc, stats:pseudo |
+------------------------------+---------+-----------+-------------------------+-------------------------------------+
5 rows in set (0.00 sec)
Multiple max
/min
functions
When a SQL statement meets the following conditions, this rule is applied:
- The statement contains multiple aggregate functions, which are all
max
ormin
functions. - None of the aggregate functions has a related
group by
clause. - The columns in each
max
/min
function has indexes to preserve the order.
For example:
select max(a) - min(a) from t
The optimization rule first checks whether column a
has an index to preserve its order. If yes, the SQL statement is rewritten as the Cartesian product of two subqueries:
select max_a - min_a
from
(select max(a) as max_a from t) t1,
(select min(a) as min_a from t) t2
Through the rewrite, the optimizer can apply the rule for statements with only one max
/min
function to the two subqueries respectively. The statement is then rewritten as follows:
select max_a - min_a
from
(select max(a) as max_a from (select a from t where a is not null order by a desc limit 1) t) t1,
(select min(a) as min_a from (select a from t where a is not null order by a asc limit 1) t) t2
Similarly, if column a
has an index to preserve its order, the optimized execution only scans two rows of data instead of the whole table. However, if column a
does not have an index to preserve its order, this rule results in two full table scans, but the execution only needs one full table scan if it is not rewritten. Therefore, in such cases, this rule is not applied.
The final execution plan is as follows:
mysql> explain select max(a)-min(a) from t;
+------------------------------------+---------+-----------+-------------------------+-------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+---------+-----------+-------------------------+-------------------------------------+
| Projection_17 | 1.00 | root | | minus(Column#4, Column#5)->Column#6 |
| └─HashJoin_18 | 1.00 | root | | CARTESIAN inner join |
| ├─StreamAgg_45(Build) | 1.00 | root | | funcs:min(test.t.a)->Column#5 |
| │ └─Limit_49 | 1.00 | root | | offset:0, count:1 |
| │ └─IndexReader_59 | 1.00 | root | | index:Limit_58 |
| │ └─Limit_58 | 1.00 | cop[tikv] | | offset:0, count:1 |
| │ └─IndexFullScan_57 | 1.00 | cop[tikv] | table:t, index:idx_a(a) | keep order:true, stats:pseudo |
| └─StreamAgg_24(Probe) | 1.00 | root | | funcs:max(test.t.a)->Column#4 |
| └─Limit_28 | 1.00 | root | | offset:0, count:1 |
| └─IndexReader_38 | 1.00 | root | | index:Limit_37 |
| └─Limit_37 | 1.00 | cop[tikv] | | offset:0, count:1 |
| └─IndexFullScan_36 | 1.00 | cop[tikv] | table:t, index:idx_a(a) | keep order:true, desc, stats:pseudo |
+------------------------------------+---------+-----------+-------------------------+-------------------------------------+
12 rows in set (0.01 sec)