Handle Failed DDL Statements

This document introduces how to handle failed DDL statements when you're using the TiDB Data Migration (DM) tool to migrate data.

Currently, TiDB is not completely compatible with all MySQL syntax (see the DDL statements supported by TiDB). Therefore, when DM is migrating data from MySQL to TiDB and TiDB does not support the corresponding DDL statement, an error might occur and break the migration process. In this case, you can use the handle-error command of DM to resume the migration.

Restrictions

If it is unacceptable in the actual production environment that the failed DDL statement is skipped in the downstream TiDB and it cannot be replaced with other DDL statements, then do not use this command.

For example, DROP PRIMARY KEY. In this scenario, you can only create a new table in the downstream with the new table schema (after executing the DDL statement), and re-import all the data into this new table.

Supported scenarios

During the migration, the DDL statement unsupported by TiDB is executed in the upstream and migrated to the downstream, and as a result, the migration task gets interrupted.

  • If it is acceptable that this DDL statement is skipped in the downstream TiDB, then you can use handle-error <task-name> skip to skip migrating this DDL statement and resume the migration.
  • If it is acceptable that this DDL statement is replaced with other DDL statements, then you can use handle-error <task-name> replace to replace this DDL statement and resume the migration.

Command

When you use dmctl to manually handle the failed DDL statements, the commonly used commands include query-status and handle-error.

query-status

The query-status command is used to query the current status of items such as the subtask and the relay unit in each MySQL instance. For details, see query status.

handle-error

The handle-error command is used to handle the failed DDL statements.

Command usage

» handle-error -h
Usage: dmctl handle-error <task-name | task-file> [-s source ...] [-b binlog-pos] <skip/replace/revert> [replace-sql1;replace-sql2;] [flags] Flags: -b, --binlog-pos string position used to match binlog event if matched the handler-error operation will be applied. The format like "mysql-bin|000001.000003:3270" -h, --help help for handle-error Global Flags: -s, --source strings MySQL Source ID

Flags descriptions

  • task-name:

    • Non-flag parameter, string, required
    • task-name specifies the name of the task in which the presetted operation is going to be executed.
  • source:

    • Flag parameter, string, --source
    • source specifies the MySQL instance in which the preset operation is to be executed.
  • skip: Skip the error

  • replace: Replace the failed DDL statement

  • revert: Reset the previous skip/replace operation before the error occurs (only reset it when the previous skip/replace operation has not finally taken effect)

  • binlog-pos:

    • Flag parameter, string, --binlog-pos
    • If it is not specified, DM automatically handles the currently failed DDL statement.
    • If it is specified, the skip operation is executed when binlog-pos matches with the position of the binlog event. The format is binlog-filename:binlog-pos, for example, mysql-bin|000001.000003:3270.
    • After the migration returns an error, the binlog position can be obtained from position in startLocation returned by query-status. Before the migration returns an error, the binlog position can be obtained by using SHOW BINLOG EVENTS in the upstream MySQL instance.

Usage examples

Skip DDL if the migration gets interrupted

Non-shard-merge scenario

Assume that you need to migrate the upstream table db1.tbl1 to the downstream TiDB. The initial table schema is:

SHOW CREATE TABLE db1.tbl1;
+-------+--------------------------------------------------+ | Table | Create Table | +-------+--------------------------------------------------+ | tbl1 | CREATE TABLE `tbl1` ( `c1` int(11) NOT NULL, `c2` decimal(11,3) DEFAULT NULL, PRIMARY KEY (`c1`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+--------------------------------------------------+

Now, the following DDL statement is executed in the upstream to alter the table schema (namely, alter DECIMAL(11, 3) of c2 into DECIMAL(10, 3)):

ALTER TABLE db1.tbl1 CHANGE c2 c2 DECIMAL (10, 3);

Because this DDL statement is not supported by TiDB, the migration task of DM gets interrupted. Execute the query-status <task-name> command, and you can see the following error:

ERROR 8200 (HY000): Unsupported modify column: can't change decimal column precision

Assume that it is acceptable in the actual production environment that this DDL statement is not executed in the downstream TiDB (namely, the original table schema is retained). Then you can use handle-error <task-name> skip to skip this DDL statement to resume the migration. The procedures are as follows:

  1. Execute handle-error <task-name> skip to skip the currently failed DDL statement:

    » handle-error test skip
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-01", "worker": "worker1" } ] }
  2. Execute query-status <task-name> to view the task status:

    » query-status test
    See the execution result.
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "sourceStatus": { "source": "mysql-replica-01", "worker": "worker1", "result": null, "relayStatus": null }, "subTaskStatus": [ { "name": "test", "stage": "Running", "unit": "Sync", "result": null, "unresolvedDDLLockID": "", "sync": { "totalEvents": "4", "totalTps": "0", "recentTps": "0", "masterBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "masterBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-10", "syncerBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "syncerBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-4", "blockingDDLs": [ ], "unresolvedGroups": [ ], "synced": true, "binlogType": "remote" } } ] } ] }

    You can see that the task runs normally and the wrong DDL is skipped.

Shard merge scenario

Assume that you need to merge and migrate the following four tables in the upstream to one same table `shard_db`.`shard_table` in the downstream. The task mode is "pessimistic".

  • MySQL instance 1 contains the shard_db_1 schema, which includes the shard_table_1 and shard_table_2 tables.
  • MySQL instance 2 contains the shard_db_2 schema, which includes the shard_table_1 and shard_table_2 tables.

The initial table schema is:

SHOW CREATE TABLE shard_db.shard_table;
+-------+-----------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------+ | tb | CREATE TABLE `shard_table` ( `id` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin | +-------+-----------------------------------------------------------------------------------------------------------+

Now, execute the following DDL statement to all upstream sharded tables to alter their character set:

ALTER TABLE `shard_db_*`.`shard_table_*` CHARACTER SET LATIN1 COLLATE LATIN1_DANISH_CI;

Because this DDL statement is not supported by TiDB, the migration task of DM gets interrupted. Execute the query-status command, and you can see the following errors reported by the shard_db_1.shard_table_1 table in MySQL instance 1 and the shard_db_2.shard_table_1 table in MySQL instance 2:

{ "Message": "cannot track DDL: ALTER TABLE `shard_db_1`.`shard_table_1` CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI", "RawCause": "[ddl:8200]Unsupported modify charset from latin1 to utf8" }
{ "Message": "cannot track DDL: ALTER TABLE `shard_db_2`.`shard_table_1` CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI", "RawCause": "[ddl:8200]Unsupported modify charset from latin1 to utf8" }

Assume that it is acceptable in the actual production environment that this DDL statement is not executed in the downstream TiDB (namely, the original table schema is retained). Then you can use handle-error <task-name> skip to skip this DDL statement to resume the migration. The procedures are as follows:

  1. Execute handle-error <task-name> skip to skip the currently failed DDL statements in MySQL instance 1 and 2:

    » handle-error test skip
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-01", "worker": "worker1" }, { "result": true, "msg": "", "source": "mysql-replica-02", "worker": "worker2" } ] }
  2. Execute the query-status command, and you can see the errors reported by the shard_db_1.shard_table_2 table in MySQL instance 1 and the shard_db_2.shard_table_2 table in MySQL instance 2:

    { "Message": "cannot track DDL: ALTER TABLE `shard_db_1`.`shard_table_2` CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI", "RawCause": "[ddl:8200]Unsupported modify charset from latin1 to utf8" }
    { "Message": "cannot track DDL: ALTER TABLE `shard_db_2`.`shard_table_2` CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI", "RawCause": "[ddl:8200]Unsupported modify charset from latin1 to utf8" }
  3. Execute handle-error <task-name> skip again to skip the currently failed DDL statements in MySQL instance 1 and 2:

    » handle-error test skip
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-01", "worker": "worker1" }, { "result": true, "msg": "", "source": "mysql-replica-02", "worker": "worker2" } ] }
  4. Use query-status <task-name> to view the task status:

    » query-status test
    See the execution result.
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "sourceStatus": { "source": "mysql-replica-01", "worker": "worker1", "result": null, "relayStatus": null }, "subTaskStatus": [ { "name": "test", "stage": "Running", "unit": "Sync", "result": null, "unresolvedDDLLockID": "", "sync": { "totalEvents": "4", "totalTps": "0", "recentTps": "0", "masterBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "masterBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-10", "syncerBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "syncerBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-4", "blockingDDLs": [ ], "unresolvedGroups": [ ], "synced": true, "binlogType": "remote" } } ] }, { "result": true, "msg": "", "sourceStatus": { "source": "mysql-replica-02", "worker": "worker2", "result": null, "relayStatus": null }, "subTaskStatus": [ { "name": "test", "stage": "Running", "unit": "Sync", "result": null, "unresolvedDDLLockID": "", "sync": { "totalEvents": "4", "totalTps": "0", "recentTps": "0", "masterBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "masterBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-10", "syncerBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "syncerBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-4", "blockingDDLs": [ ], "unresolvedGroups": [ ], "synced": true, "binlogType": "remote" } } ] } ] }

    You can see that the task runs normally with no error and all four wrong DDL statements are skipped.

Replace DDL if the migration gets interrupted

Non-shard-merge scenario

Assume that you need to migrate the upstream table db1.tbl1 to the downstream TiDB. The initial table schema is:

SHOW CREATE TABLE db1.tbl1;
+-------+-----------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------+ | tb | CREATE TABLE `tbl1` ( `id` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin | +-------+-----------------------------------------------------------------------------------------------------------+

Now, perform the following DDL operation in the upstream to add a new column with the UNIQUE constraint:

ALTER TABLE `db1`.`tbl1` ADD COLUMN new_col INT UNIQUE;

Because this DDL statement is not supported by TiDB, the migration task gets interrupted. Execute the query-status command, and you can see the following error:

{ "Message": "cannot track DDL: ALTER TABLE `db1`.`tbl1` ADD COLUMN `new_col` INT UNIQUE KEY", "RawCause": "[ddl:8200]unsupported add column 'new_col' constraint UNIQUE KEY when altering 'db1.tbl1'", }

You can replace this DDL statement with two equivalent DDL statements. The steps are as follows:

  1. Replace the wrong DDL statement by the following command:

    » handle-error test replace "ALTER TABLE `db1`.`tbl1` ADD COLUMN `new_col` INT;ALTER TABLE `db1`.`tbl1` ADD UNIQUE(`new_col`)";
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-01", "worker": "worker1" } ] }
  2. Use query-status <task-name> to view the task status:

    » query-status test
    See the execution result.
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "sourceStatus": { "source": "mysql-replica-01", "worker": "worker1", "result": null, "relayStatus": null }, "subTaskStatus": [ { "name": "test", "stage": "Running", "unit": "Sync", "result": null, "unresolvedDDLLockID": "", "sync": { "totalEvents": "4", "totalTps": "0", "recentTps": "0", "masterBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "masterBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-10", "syncerBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "syncerBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-4", "blockingDDLs": [ ], "unresolvedGroups": [ ], "synced": true, "binlogType": "remote" } } ] } ] }

    You can see that the task runs normally and the wrong DDL statement is replaced by new DDL statements that execute successfully.

Shard merge scenario

Assume that you need to merge and migrate the following four tables in the upstream to one same table `shard_db`.`shard_table` in the downstream. The task mode is "pessimistic".

  • In the MySQL instance 1, there is a schema shard_db_1, which has two tables shard_table_1 and shard_table_2.
  • In the MySQL instance 2, there is a schema shard_db_2, which has two tables shard_table_1 and shard_table_2.

The initial table schema is:

SHOW CREATE TABLE shard_db.shard_table;
+-------+-----------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------+ | tb | CREATE TABLE `shard_table` ( `id` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin | +-------+-----------------------------------------------------------------------------------------------------------+

Now, perform the following DDL operation to all upstream sharded tables to add a new column with the UNIQUE constraint:

ALTER TABLE `shard_db_*`.`shard_table_*` ADD COLUMN new_col INT UNIQUE;

Because this DDL statement is not supported by TiDB, the migration task gets interrupted. Execute the query-status command, and you can see the following errors reported by the shard_db_1.shard_table_1 table in MySQL instance 1 and the shard_db_2.shard_table_1 table in MySQL instance 2:

{ "Message": "cannot track DDL: ALTER TABLE `shard_db_1`.`shard_table_1` ADD COLUMN `new_col` INT UNIQUE KEY", "RawCause": "[ddl:8200]unsupported add column 'new_col' constraint UNIQUE KEY when altering 'shard_db_1.shard_table_1'", }
{ "Message": "cannot track DDL: ALTER TABLE `shard_db_2`.`shard_table_1` ADD COLUMN `new_col` INT UNIQUE KEY", "RawCause": "[ddl:8200]unsupported add column 'new_col' constraint UNIQUE KEY when altering 'shard_db_2.shard_table_1'", }

You can replace this DDL statement with two equivalent DDL statements. The steps are as follows:

  1. Replace the wrong DDL statements respectively in MySQL instance 1 and MySQL instance 2 by the following commands:

    » handle-error test -s mysql-replica-01 replace "ALTER TABLE `shard_db_1`.`shard_table_1` ADD COLUMN `new_col` INT;ALTER TABLE `shard_db_1`.`shard_table_1` ADD UNIQUE(`new_col`)";
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-01", "worker": "worker1" } ] }
    » handle-error test -s mysql-replica-02 replace "ALTER TABLE `shard_db_2`.`shard_table_1` ADD COLUMN `new_col` INT;ALTER TABLE `shard_db_2`.`shard_table_1` ADD UNIQUE(`new_col`)";
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-02", "worker": "worker2" } ] }
  2. Use query-status <task-name> to view the task status, and you can see the following errors reported by the shard_db_1.shard_table_2 table in MySQL instance 1 and the shard_db_2.shard_table_2 table in MySQL instance 2:

    { "Message": "detect inconsistent DDL sequence from source ... ddls: [ALTER TABLE `shard_db`.`tb` ADD COLUMN `new_col` INT UNIQUE KEY] source: `shard_db_1`.`shard_table_2`], right DDL sequence should be ..." }
    { "Message": "detect inconsistent DDL sequence from source ... ddls: [ALTER TABLE `shard_db`.`tb` ADD COLUMN `new_col` INT UNIQUE KEY] source: `shard_db_2`.`shard_table_2`], right DDL sequence should be ..." }
  3. Execute handle-error <task-name> replace again to replace the wrong DDL statements in MySQL instance 1 and 2:

    » handle-error test -s mysql-replica-01 replace "ALTER TABLE `shard_db_1`.`shard_table_2` ADD COLUMN `new_col` INT;ALTER TABLE `shard_db_1`.`shard_table_2` ADD UNIQUE(`new_col`)";
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-01", "worker": "worker1" } ] }
    » handle-error test -s mysql-replica-02 replace "ALTER TABLE `shard_db_2`.`shard_table_2` ADD COLUMN `new_col` INT;ALTER TABLE `shard_db_2`.`shard_table_2` ADD UNIQUE(`new_col`)";
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "source": "mysql-replica-02", "worker": "worker2" } ] }
  4. Use query-status <task-name> to view the task status:

    » query-status test
    See the execution result.
    { "result": true, "msg": "", "sources": [ { "result": true, "msg": "", "sourceStatus": { "source": "mysql-replica-01", "worker": "worker1", "result": null, "relayStatus": null }, "subTaskStatus": [ { "name": "test", "stage": "Running", "unit": "Sync", "result": null, "unresolvedDDLLockID": "", "sync": { "totalEvents": "4", "totalTps": "0", "recentTps": "0", "masterBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "masterBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-10", "syncerBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "syncerBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-4", "blockingDDLs": [ ], "unresolvedGroups": [ ], "unresolvedGroups": [ ], "synced": true, "binlogType": "remote" } } ] }, { "result": true, "msg": "", "sourceStatus": { "source": "mysql-replica-02", "worker": "worker2", "result": null, "relayStatus": null }, "subTaskStatus": [ { "name": "test", "stage": "Running", "unit": "Sync", "result": null, "unresolvedDDLLockID": "", "sync": { "totalEvents": "4", "totalTps": "0", "recentTps": "0", "masterBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "masterBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-10", "syncerBinlog": "(DESKTOP-T561TSO-bin.000001, 2388)", "syncerBinlogGtid": "143bdef3-dd4a-11ea-8b00-00155de45f57:1-4", "blockingDDLs": [ ], "unresolvedGroups": [ ], "unresolvedGroups": [ ], "synced": try, "binlogType": "remote" } } ] } ] }

    You can see that the task runs normally with no error and all four wrong DDL statements are replaced.