Migration Task Precheck
Before using DM to migrate data from upstream to downstream, a precheck helps detect errors in the upstream database configurations and ensures that the migration goes smoothly. This document introduces the DM precheck feature, including its usage scenario, check items, and arguments.
Usage scenario
To run a data migration task smoothly, DM triggers a precheck automatically at the start of the task and returns the check results. DM starts the migration only after the precheck is passed.
To trigger a precheck manually, run the check-task command.
For example:
tiup dmctl check-task ./task.yaml
Descriptions of check items
After a precheck is triggered for a task, DM checks the corresponding items according to your migration mode configuration.
This section lists all the precheck items.
If a mandatory check item does not pass, DM returns an error after the check and does not proceed with the migration task. In this case, modify the configurations according to the error message and retry the task after meeting the precheck requirements.
If a non-mandatory check item does not pass, DM returns a warning after the check. DM automatically starts a migration task if the check result contains only warnings but no errors.
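The error and warning rules above can be sketched as a small decision function. This is an illustrative model only, not DM's implementation: the CheckResult type and the summarize function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical record of one precheck item (not a DM type)."""
    name: str
    passed: bool
    mandatory: bool


def summarize(results):
    """Return (can_start, errors, warnings) for a list of CheckResult.

    A failed mandatory item is an error and blocks the task.
    A failed non-mandatory item is only a warning.
    """
    errors = [r.name for r in results if not r.passed and r.mandatory]
    warnings = [r.name for r in results if not r.passed and not r.mandatory]
    return (not errors, errors, warnings)


if __name__ == "__main__":
    results = [
        CheckResult("binlog_enable", passed=True, mandatory=True),
        CheckResult("auto_increment_ID", passed=False, mandatory=False),
    ]
    print(summarize(results))
```

In this model, a result set that contains only warnings still allows the task to start, which matches the behavior described above.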
Common check items
Regardless of the migration mode you choose, the precheck always includes the following common check items:
Database version
MySQL version > 5.5
MariaDB version >= 10.1.2
Compatibility of the upstream MySQL table schema
Check whether the upstream tables have foreign keys, which are not supported by TiDB. A warning is returned if a foreign key is found in the precheck.
Check whether the upstream tables use character sets that are incompatible with TiDB. For more information, see TiDB Supported Character Sets.
Check whether the upstream tables have primary key constraints or unique key constraints (introduced from v1.0.7).
Check items for full data migration
For the full data migration mode (task-mode: full), in addition to the common check items, the precheck also includes the following check items:
(Mandatory) dump permission of the upstream database
- SELECT permission on INFORMATION_SCHEMA and dump tables
- RELOAD permission if consistency=flush
- LOCK TABLES permission on the dump tables if consistency=flush/lock
(Mandatory) Consistency of upstream MySQL multi-instance sharding tables
In the pessimistic mode, check whether the table schemas of all sharded tables are consistent in the following items:
- Number of columns
- Column name
- Column order
- Column type
- Primary key
- Unique index
In the optimistic mode, check whether the schemas of all sharded tables meet the optimistic compatibility.
If a migration task was started successfully by the start-task command, the precheck of this task skips the consistency check.
Auto-increment primary key in sharded tables
- If sharded tables have auto-increment primary keys, the precheck returns a warning. If there are conflicts in auto-increment primary keys, see Handle conflicts of auto-increment primary key for solutions.
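The dump permission rules listed above depend on the consistency setting. A minimal sketch of that mapping follows. The function name required_dump_privileges is hypothetical, and the sketch only models the three rules stated in this section.

```python
def required_dump_privileges(consistency):
    """Return the set of upstream privileges the dump needs.

    Models only the rules listed in this document:
    - SELECT is always required.
    - RELOAD is required when consistency is "flush".
    - LOCK TABLES is required when consistency is "flush" or "lock".
    """
    privileges = {"SELECT"}
    if consistency == "flush":
        privileges.add("RELOAD")
    if consistency in ("flush", "lock"):
        privileges.add("LOCK TABLES")
    return privileges


if __name__ == "__main__":
    for mode in ("none", "lock", "flush"):
        print(mode, sorted(required_dump_privileges(mode)))
```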
Check items for physical import
If you set import-mode: "physical" in the task configuration, the following check items are added to ensure that Physical Import runs normally. If you follow the prompts and still find it difficult to meet the requirements of these check items, you can use the logical import mode to import data instead.
Empty Regions in the downstream database
- If the number of empty Regions is greater than max(1000, 3 * the number of tables) (the larger of "1000" and "3 times the number of tables"), the precheck returns a warning. You can adjust related PD parameters to speed up the merging of empty Regions and wait for the number of empty Regions to decrease. See PD Scheduling Best Practices - Slow Region Merge.
Region distribution in the downstream database
- Checks the number of Regions on different TiKV nodes. Assuming that the TiKV node with the lowest Region count has a Regions and the TiKV node with the highest Region count has b Regions, if a / b is less than 0.75, the precheck returns a warning. You can adjust related PD parameters to speed up the scheduling of Regions and wait for the number of Regions to change. See PD Scheduling Best Practices - Leader/Region distribution is not balanced.
The versions of TiDB, PD, and TiKV in the downstream database
- Physical import must call the interfaces of TiDB, PD, and TiKV. If the versions are not compatible, the precheck returns an error.
The free space of the downstream database
- Estimates the total size of all tables in the allow list in the upstream database (source_size). If the free space of the downstream database is less than source_size, the precheck returns an error. If the free space of the downstream database is less than the number of TiKV replicas * source_size * 2, the precheck returns a warning.
Whether the downstream database is running tasks that are incompatible with physical import
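The numeric thresholds in the physical import check items above can be written out as follows. This is a sketch of the arithmetic only, under the assumption that the inputs (Region counts, sizes, replica count) have already been collected. All function names are hypothetical and are not part of DM.

```python
def empty_region_warning(empty_regions, table_count):
    """Warn when empty Regions exceed max(1000, 3 * number of tables)."""
    return empty_regions > max(1000, 3 * table_count)


def region_distribution_warning(region_counts):
    """Warn when lowest / highest Region count across TiKV nodes is below 0.75.

    region_counts is a list with one Region count per TiKV node.
    """
    a, b = min(region_counts), max(region_counts)
    if b == 0:
        # No Regions on any node: nothing to compare (assumption of this sketch).
        return False
    return a / b < 0.75


def free_space_status(free_space, source_size, replicas):
    """Return "error", "warning", or "ok" for the free space check.

    All sizes must use the same unit. replicas is the number of TiKV replicas.
    """
    if free_space < source_size:
        return "error"
    if free_space < replicas * source_size * 2:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(empty_region_warning(1200, 100))
    print(region_distribution_warning([70, 100]))
    print(free_space_status(500, 100, 3))
```

For example, with 3 replicas and a source_size of 100 GiB, free space below 100 GiB is an error and free space below 600 GiB is a warning.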
Check items for incremental data migration
For the incremental data migration mode (task-mode: incremental), in addition to the common check items, the precheck also includes the following check items:
(Mandatory) Upstream database REPLICATION permission
- REPLICATION CLIENT permission
- REPLICATION SLAVE permission
Database primary-secondary configuration
- To avoid primary-secondary replication failures, it is recommended that you specify the database ID server_id for the upstream database (GTID is recommended for non-AWS Aurora environments).
(Mandatory) MySQL binlog configuration
- Check whether binlog is enabled (required by DM).
- Check whether binlog_format=ROW is configured (DM only supports the migration of binlog in the ROW format).
- Check whether binlog_row_image=FULL is configured (DM only supports binlog_row_image=FULL).
- If binlog_do_db or binlog_ignore_db is configured, check whether the database tables to be migrated meet the conditions of binlog_do_db and binlog_ignore_db.
(Mandatory) Check if the upstream database is in an Online-DDL process (in which the ghost table is created but the rename phase is not executed yet). If the upstream is in the online-DDL process, the precheck returns an error. In this case, wait until the DDL completes and retry.
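The binlog configuration requirements above can be checked against the upstream system variables. The sketch below assumes the values of log_bin, binlog_format, and binlog_row_image have already been read (for example with SHOW VARIABLES) into a plain dictionary. The function binlog_config_errors is hypothetical, and the sketch does not cover the binlog_do_db and binlog_ignore_db conditions.

```python
def binlog_config_errors(variables):
    """Return a list of problems found in upstream binlog settings.

    variables maps MySQL system variable names to their string values.
    An empty list means the three settings meet the stated requirements.
    """
    # (variable name, required value) pairs taken from the check items above.
    required = [
        ("log_bin", "ON"),
        ("binlog_format", "ROW"),
        ("binlog_row_image", "FULL"),
    ]
    errors = []
    for name, expected in required:
        actual = str(variables.get(name, "")).upper()
        if actual != expected:
            errors.append(f"{name} is {actual or 'unset'}, expected {expected}")
    return errors


if __name__ == "__main__":
    upstream = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
    for problem in binlog_config_errors(upstream):
        print(problem)
```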
Check items for full and incremental data migration
For the full and incremental data migration mode (task-mode: all), in addition to the common check items, the precheck also includes the full data migration check items and the incremental data migration check items.
Ignorable check items
Prechecks can find potential risks in your environments. It is not recommended to ignore check items. If your data migration task has special needs, you can use the ignore-checking-items configuration item to skip some check items.
Check item | Description |
---|---|
dump_privilege | Checks the dump privilege of the user in the upstream MySQL instance. |
replication_privilege | Checks the replication privilege of the user in the upstream MySQL instance. |
version | Checks the version of the upstream database. |
server_id | Checks whether server_id is configured in the upstream database. |
binlog_enable | Checks whether binlog is enabled in the upstream database. |
table_schema | Checks the compatibility of the table schemas in the upstream MySQL tables. |
schema_of_shard_tables | Checks the consistency of the table schemas in the upstream MySQL multi-instance shards. |
auto_increment_ID | Checks whether the auto-increment primary key conflicts in the upstream MySQL multi-instance shards. |
online_ddl | Checks whether the upstream is in the process of online-DDL. |
empty_region | Checks the number of empty Regions in the downstream database for physical import. |
region_distribution | Checks the distribution of Regions in the downstream database for physical import. |
downstream_version | Checks the versions of TiDB, PD, and TiKV in the downstream database. |
free_space | Checks the free space of the downstream database. |
downstream_mutex_features | Checks whether the downstream database is running tasks that are incompatible with physical import. |
Configure precheck arguments
The migration task precheck supports parallel processing. Even when sharded tables contain millions of rows, the precheck can be completed in minutes.
To specify the number of threads for the precheck, you can configure the threads argument of the mydumpers field in the migration task configuration file.
mydumpers:                           # Configuration arguments of the dump processing unit
  global:                            # Configuration name
    threads: 4                       # The number of threads that access the upstream when the dump processing unit performs the precheck and exports data from the upstream database (4 by default)
    chunk-filesize: 64               # The size of the files generated by the dump processing unit (64 MB by default)
    extra-args: "--consistency none" # Other arguments of the dump processing unit. You do not need to manually configure table-list in `extra-args`, because it is automatically generated by DM.