Merge and Migrate Data from Sharded Tables in Optimistic Mode
This document introduces the sharding support feature provided by Data Migration (DM) in the optimistic mode. This feature allows you to merge and migrate the data of tables with the same or different table schema(s) in the upstream MySQL or MariaDB instances into one same table in the downstream TiDB.
DM supports executing DDL statements on sharded tables online, which is called sharding DDL, and uses the "pessimistic mode" by default. In this mode, when a DDL statement is executed in an upstream sharded table, data migration of this table is paused until the same DDL statement is executed in all other sharded tables. Only by then this DDL statement is executed in the downstream and data migration resumes.
The pessimistic mode guarantees that the data migrated to the downstream is always correct, but it pauses the data migration, which is bad for making A/B changes in the upstream. In some cases, users might spend a long time executing DDL statements in a single sharded table and change the schemas of other sharded tables only after a period of validation. In the pessimistic mode, these DDL statements block data migration and cause many binlog events to pile up.
Therefore, an "optimistic mode" is needed. In this mode, a DDL statement executed on a sharded table is automatically converted to a statement that is compatible with other sharded tables, and then immediately migrated to the downstream. In this way, the DDL statement does not block any sharded table from executing DML migration.
Configuration of the optimistic mode
To use the optimistic mode, specify the
shard-mode item in the task configuration file as
optimistic. For the detailed sample configuration file, see DM Advanced Task Configuration File.
It takes some risks to use the optimistic mode. Follow these rules when you use it:
Ensure that the schema of every sharded table is consistent with each other before and after you execute a batch of DDL statements.
If you perform an A/B test, perform the test ONLY on one sharded table.
After the A/B test is finished, migrate only the most direct DDL statement(s) to the final schema. Do not re-execute every right or wrong step of the test.
For example, if you have executed
ADD COLUMN A INT; DROP COLUMN A; ADD COLUMN A FLOAT;in a sharded table, you only need to execute
ADD COLUMN A FLOATin other sharded tables. You do not need to executed all of the three DDL statements again.
Observe the status of the DM migration when executing the DDL statement. When an error is reported, you need to determine whether this batch of DDL statements will cause data inconsistency.
Currently, the following statements are not supported in the optimistic mode:
ALTER TABLE table_name ADD COLUMN column_name datatype NOT NULL(To add a
NOT NULLcolumn without a default value).
ALTER TABLE table_name ADD COLUMN column_name datetime DEFAULT NOW()(To add a column with a varying value).
ALTER TABLE table_name ADD COLUMN col1 INT, DROP COLUMN col2(Contains both
DROP COLUMNin one DDL statement).
ALTER TABLE table_name RENAME COLUMN column_1 TO column_2;(To rename a column).
ALTER TABLE table_name RENAME INDEX index_1 TO index_2;(To rename an index).
In addition, the following restrictions apply to both the optimistic mode and the pessimistic mode:
- In an incremental replication task, ensure that each sharded table's schema that corresponds to the binlog position at the start of the task is consistent with each other.
- The new table added to a sharding group must have a consistent table schema with that of other members. The
CREATE/RENAME TABLEstatement is forbidden when a batch of DDL statements is being executed.
DROP DATABASEis not supported.
TRUNCATE TABLEis not supported.
- Each DDL statement must involve operations on only one table.
- The DDL statement that is not supported in TiDB is also not supported in DM.
- The default value of a newly added column must not contain
uuid(); otherwise, data inconsistency between the upstream and the downstream might occur.
When you use the optimistic mode for a migration task, a DDL statement is migrated to the downstream immediately. If this mode is misused, data inconsistency between the upstream and the downstream might occur.
Operations that cause data inconsistency
The schema of each sharded table is incompatible with each other. For example:
- Two columns of the same name are added to two sharded tables respectively, but the columns are of different types.
- Two columns of the same name are added to two sharded tables respectively, but the columns have different default values.
- Two generated columns of the same name are added to two sharded tables respectively, but the columns are generated using different expressions.
- Two indexes of the same name are added to two sharded tables respectively, but the keys are different.
- Other different table schemas with the same name.
Execute the DDL statement that can corrupt data in the sharded table and then try to roll back.
For example, drop a column
Xand then add this column back.
Merge and migrate the following three sharded tables to TiDB:
Add a new column
tbl01 and set the default value of the column to
ALTER TABLE `tbl01` ADD COLUMN `Age` INT DEFAULT 0;
Add a new column
tbl00 and set the default value of the column to
ALTER TABLE `tbl00` ADD COLUMN `Age` INT DEFAULT -1;
By then, the
Age column of
tbl00 is inconsistent because
DEFAULT 0 and
DEFAULT -1 are incompatible with each other. In this situation, DM will report the error, but you have to manually fix the data inconsistency.
In the optimistic mode, after DM-worker receives the DDL statement from the upstream, it forwards the updated table schema to DM-master. DM-worker tracks the current schema of each sharded table, and DM-master merges these schemas into a composite schema that is compatible with DML statements of every sharded table. Then DM-master migrates the corresponding DDL statement to the downstream. DML statements are directly migrated to the downstream.
Assume the upstream MySQL has three sharded tables (
tbl02). Merge and migrate these sharded tables to the
tbl table in the downstream TiDB. See the following image:
Level column in the upstream:
ALTER TABLE `tbl00` ADD COLUMN `Level` INT;
Then TiDB will receive the DML statement from
tbl00 (with the
Level column) and the DML statement from the
tbl02 tables (without the
The following DML statements can be migrated to the downstream without any modification:
UPDATE `tbl00` SET `Level` = 9 WHERE `ID` = 1; INSERT INTO `tbl02` (`ID`, `Name`) VALUES (27, 'Tony');
Also add a
Level column in
ALTER TABLE `tbl01` ADD COLUMN `Level` INT;
At this time, the downstream already have had the same
Level column, so DM-master performs no operation after comparing the table schemas.
Name column in
ALTER TABLE `tbl01` DROP COLUMN `Name`;
Then the downstream will receive the DML statements from
tbl02 with the
Name column, so this column is not immediately dropped.
In the same way, all DML statements can still be migrated to the downstream:
INSERT INTO `tbl01` (`ID`, `Level`) VALUES (15, 7); UPDATE `tbl00` SET `Level` = 5 WHERE `ID` = 5;
Level column in
ALTER TABLE `tbl02` ADD COLUMN `Level` INT;
By then, all sharded tables have the
Name columns in
ALTER TABLE `tbl00` DROP COLUMN `Name`; ALTER TABLE `tbl02` DROP COLUMN `Name`;
By then, the
Name columns are dropped from all sharded tables and can be safely dropped in the downstream:
ALTER TABLE `tbl` DROP COLUMN `Name`;