分区裁剪

分区裁剪是只有当目标表为分区表时,才可以进行的一种优化方式。分区裁剪通过分析查询语句中的过滤条件,只选择可能满足条件的分区,不扫描匹配不上的分区,进而显著地减少计算的数据量。

例如:

CREATE TABLE t1 ( id INT NOT NULL PRIMARY KEY, pad VARCHAR(100) ) PARTITION BY RANGE COLUMNS(id) ( PARTITION p0 VALUES LESS THAN (100), PARTITION p1 VALUES LESS THAN (200), PARTITION p2 VALUES LESS THAN (MAXVALUE) ); INSERT INTO t1 VALUES (1, 'test1'),(101, 'test2'), (201, 'test3'); EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120;
+----------------------------+---------+-----------+------------------------+------------------------------------------------+ | id | estRows | task | access object | operator info | +----------------------------+---------+-----------+------------------------+------------------------------------------------+ | PartitionUnion_8 | 80.00 | root | | | | ├─TableReader_10 | 40.00 | root | | data:TableRangeScan_9 | | │ └─TableRangeScan_9 | 40.00 | cop[tikv] | table:t1, partition:p0 | range:[80,120], keep order:false, stats:pseudo | | └─TableReader_12 | 40.00 | root | | data:TableRangeScan_11 | | └─TableRangeScan_11 | 40.00 | cop[tikv] | table:t1, partition:p1 | range:[80,120], keep order:false, stats:pseudo | +----------------------------+---------+-----------+------------------------+------------------------------------------------+ 5 rows in set (0.00 sec)

分区裁剪的使用场景

分区表有 Range 分区和 hash 分区两种形式,分区裁剪对两种分区表也有不同的使用场景。

分区裁剪在 Hash 分区表上的应用

Hash 分区表上可以使用分区裁剪的场景

只有等值比较的查询条件能够支持 Hash 分区表的裁剪。

create table t (x int) partition by hash(x) partitions 4; explain select * from t where x = 1;
+-------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+-----------------------+--------------------------------+ | TableReader_8 | 10.00 | root | | data:Selection_7 | | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 1) | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+-----------------------+--------------------------------+

在这条 SQL 中,由条件 x = 1 可以知道所有结果均在一个分区上。数值 1 在经过 Hash 后,可以确定其在分区 p1 中。因此只需要扫描分区 p1,而无需访问一定不会出现相关结果的 p2p3p4 分区。从执行计划来看,其中只出现了一个 TableFullScan 算子,且在 access object 中指定了 p1 分区,确认 partition pruning 生效了。

Hash 分区表上不能使用分区裁剪的场景

场景一

不能确定查询结果只在一个分区上的条件:如 in, between, > < >= <= 等查询条件,不能使用分区裁剪的优化。

create table t (x int) partition by hash(x) partitions 4; explain select * from t where x > 2;
+------------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +------------------------------+----------+-----------+-----------------------+--------------------------------+ | Union_10 | 13333.33 | root | | | | ├─TableReader_13 | 3333.33 | root | | data:Selection_12 | | │ └─Selection_12 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | │ └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | | ├─TableReader_16 | 3333.33 | root | | data:Selection_15 | | │ └─Selection_15 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | │ └─TableFullScan_14 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | | ├─TableReader_19 | 3333.33 | root | | data:Selection_18 | | │ └─Selection_18 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | │ └─TableFullScan_17 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | | └─TableReader_22 | 3333.33 | root | | data:Selection_21 | | └─Selection_21 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | └─TableFullScan_20 | 10000.00 | cop[tikv] | table:t, partition:p3 | keep order:false, stats:pseudo | +------------------------------+----------+-----------+-----------------------+--------------------------------+

在这条 SQL 中,x > 2 条件无法确定对应的 Hash Partition,所以不能使用分区裁剪。

场景二

由于分区裁剪的规则优化是在查询计划的生成阶段,对于执行阶段才能获取到过滤条件的场景,无法利用分区裁剪的优化。

create table t (x int) partition by hash(x) partitions 4; explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x < 2);
+--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ | id | estRows | task | access object | operator info | +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ | Projection_13 | 9990.00 | root | | test.t2.x | | └─Apply_15 | 9990.00 | root | | inner join, equal:[eq(test.t2.x, test.t1.x)] | | ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 | | │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) | | │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | | └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) | | └─MaxOneRow_20 | 1.00 | root | | | | └─Union_21 | 2.00 | root | | | | ├─TableReader_24 | 2.00 | root | | data:Selection_23 | | │ └─Selection_23 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) | | │ └─TableFullScan_22 | 2500.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo | | └─TableReader_27 | 2.00 | root | | data:Selection_26 | | └─Selection_26 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) | | └─TableFullScan_25 | 2500.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo | +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+

这个查询每从 t2 读取一行,都会去分区表 t1 上进行查询,理论上这时会满足 t1.x = val 的过滤条件,但实际上由于分区裁剪只作用于查询计划生成阶段,而不是执行阶段,因而不会做裁剪。

分区裁剪在 Range 分区表上的应用

Range 分区表上可以使用分区裁剪的场景

场景一

等值比较的查询条件可以使用分区裁剪。

create table t (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10), partition p2 values less than (15) ); explain select * from t where x = 3;
+-------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+-----------------------+--------------------------------+ | TableReader_8 | 10.00 | root | | data:Selection_7 | | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 3) | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+-----------------------+--------------------------------+

使用 in 条件的等值比较查询条件也可以使用分区裁剪。

create table t (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10), partition p2 values less than (15) ); explain select * from t where x in(1,13);
+-----------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +-----------------------------+----------+-----------+-----------------------+--------------------------------+ | Union_8 | 40.00 | root | | | | ├─TableReader_11 | 20.00 | root | | data:Selection_10 | | │ └─Selection_10 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) | | │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | | └─TableReader_14 | 20.00 | root | | data:Selection_13 | | └─Selection_13 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) | | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | +-----------------------------+----------+-----------+-----------------------+--------------------------------+

在这条 SQL 中,由条件 x in(1,13) 可以知道所有结果只会分布在几个分区上。经过分析,发现所有 x = 1 的记录都在分区 p0 上,所有 x = 13 的记录都在分区 p2 上,因此只需要访问 p0p2 这两个分区,

场景二

区间比较的查询条件如 between, > < = >= <= 可以使用分区裁剪。

create table t (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10), partition p2 values less than (15) ); explain select * from t where x between 7 and 14;
+-----------------------------+----------+-----------+-----------------------+-----------------------------------+ | id | estRows | task | access object | operator info | +-----------------------------+----------+-----------+-----------------------+-----------------------------------+ | Union_8 | 500.00 | root | | | | ├─TableReader_11 | 250.00 | root | | data:Selection_10 | | │ └─Selection_10 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) | | │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | | └─TableReader_14 | 250.00 | root | | data:Selection_13 | | └─Selection_13 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) | | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | +-----------------------------+----------+-----------+-----------------------+-----------------------------------+
场景三

分区表达式为 fn(col) 的简单形式,查询条件是 > < = >= <= 之一,且 fn 是单调函数,可以使用分区裁剪。

关于 fn 函数,对于任意 x y,如果 x > y,则 fn(x) > fn(y),那么这种是严格递增的单调函数。非严格递增的单调函数也可以符合分区裁剪要求,只要函数 fn 满足:对于任意 x y,如果 x > y,则 fn(x) >= fn(y)。理论上,所有满足单调条件(严格或者非严格)的函数都支持分区裁剪。目前,TiDB 支持的单调函数如下:

unix_timestamp to_days

例如,分区表达式是 fn(col) 形式,fn 为我们支持的单调函数 to_days,就可以使用分区裁剪:

create table t (id datetime) partition by range (to_days(id)) ( partition p0 values less than (to_days('2020-04-01')), partition p1 values less than (to_days('2020-05-01'))); explain select * from t where id > '2020-04-18';
+-------------------------+----------+-----------+-----------------------+-------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+-----------------------+-------------------------------------------+ | TableReader_8 | 3333.33 | root | | data:Selection_7 | | └─Selection_7 | 3333.33 | cop[tikv] | | gt(test.t.id, 2020-04-18 00:00:00.000000) | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+-----------------------+-------------------------------------------+

Range 分区表上不能使用分区裁剪的场景

由于分区裁剪的规则优化是在查询计划的生成阶段,对于执行阶段才能获取到过滤条件的场景,无法利用分区裁剪的优化。

create table t1 (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10)); create table t2 (x int); explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x < 2);
+--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ | id | estRows | task | access object | operator info | +--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ | Projection_13 | 9990.00 | root | | test.t2.x | | └─Apply_15 | 9990.00 | root | | CARTESIAN inner join, other cond:lt(test.t2.x, test.t1.x) | | ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 | | │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) | | │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | | └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) | | └─MaxOneRow_20 | 1.00 | root | | | | └─Union_21 | 2.00 | root | | | | ├─TableReader_24 | 2.00 | root | | data:Selection_23 | | │ └─Selection_23 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) | | │ └─TableFullScan_22 | 2.50 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo | | └─TableReader_27 | 2.00 | root | | data:Selection_26 | | └─Selection_26 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) | | └─TableFullScan_25 | 2.50 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo | +--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ 14 rows in set (0.00 sec)

这个查询每从 t2 读取一行,都会去分区表 t1 上进行查询,理论上这时会满足 t1.x > val 的过滤条件,但实际上由于分区裁剪只作用于查询计划生成阶段,而不是执行阶段,因而不会做裁剪。