Export or Backup Data Using Dumpling

This document introduces how to use the Dumpling tool to export or backup data in TiDB. Dumpling exports data stored in TiDB as SQL or CSV data files and can be used to make a logical full backup or export.

For backups of SST files (KV pairs) or backups of incremental data that are not sensitive to latency, refer to BR.

For detailed usage of Dumpling, use the --help command or refer to Dumpling User Guide.

When using Dumpling, you need to execute the export command on a running cluster. This document assumes that there is a TiDB instance on the 127.0.0.1:4000 host and that this TiDB instance has a root user without a password.

Download Dumpling

To download the latest version of Dumpling, click the download link.

Export data from TiDB

Export to SQL files

Dumpling exports data to SQL files by default. You can also export data to SQL files by adding the --filetype sql flag:

dumpling \ -u root \ -P 4000 \ -h 127.0.0.1 \ --filetype sql \ --threads 32 \ -o /tmp/test \ -F 256

In the above command, -h, -P and -u mean address, port and user, respectively. If password authentication is required, you can pass it to Dumpling with -p $YOUR_SECRET_PASSWORD.

Export to CSV files

If Dumpling exports data to CSV files (use --filetype csv to export to CSV files), you can also use --sql <SQL> to export the records selected by the specified SQL statement.

For example, you can export all records that match id < 100 in test.sbtest1 using the following command:

./dumpling \ -u root \ -P 4000 \ -h 127.0.0.1 \ -o /tmp/test \ --filetype csv \ --sql 'select * from `test`.`sbtest1` where id < 100'

Filter the exported data

Use the --where command to filter data

By default, Dumpling exports the tables of the entire database except the tables in the system databases. You can use --where <SQL where expression> to select the records to be exported.

./dumpling \ -u root \ -P 4000 \ -h 127.0.0.1 \ -o /tmp/test \ --where "id < 100"

The above command exports the data that matches id < 100 from each table.

Use the --filter command to filter data

Dumpling can filter specific databases or tables by specifying the table filter with the --filter command. The syntax of table filters is similar to that of .gitignore. For details, see Table Filter.

./dumpling \ -u root \ -P 4000 \ -h 127.0.0.1 \ -o /tmp/test \ --filter "employees.*" \ --filter "*.WorkOrder"

The above command exports all the tables in the employees database and the WorkOrder tables in all databases.

Use the -B or -T command to filter data

Dumpling can also export specific databases with the -B command or specific tables with the -T command.

Examples:

--B employees exports the employees database --T employees.WorkOrder exports the employees.WorkOrder table

Improve export efficiency through concurrency

The exported file is stored in the ./export-<current local time> directory by default. Commonly used parameters are as follows:

  • -o is used to select the directory where the exported files are stored.
  • -F option is used to specify the maximum size of a single file (the unit here is MiB; inputs like 5GiB or 8KB are also acceptable).
  • -r option is used to specify the maximum number of records (or the number of rows in the database) for a single file. When it is enabled, Dumpling enables concurrency in the table to improve the speed of exporting large tables.

You can use the above parameters to provide Dumpling with a higher degree of concurrency.

Adjust Dumpling's data consistency options

Dumpling uses the --consistency <consistency level> option to control the way in which data is exported for "consistency assurance". For TiDB, data consistency is guaranteed by getting a snapshot of a certain timestamp by default (i.e. --consistency snapshot). When using snapshot for consistency, you can use the --snapshot parameter to specify the timestamp to be backed up. You can also use the following levels of consistency:

  • flush: Use FLUSH TABLES WITH READ LOCK to ensure consistency.
  • snapshot: Get a consistent snapshot of the specified timestamp and export it.
  • lock: Add read locks on all tables to be exported.
  • none: No guarantee for consistency.
  • auto: Use flush for MySQL and snapshot for TiDB.

After everything is done, you can see the exported file in /tmp/test:

ls -lh /tmp/test | awk '{print $5 "\t" $9}'
140B metadata 66B test-schema-create.sql 300B test.sbtest1-schema.sql 190K test.sbtest1.0.sql 300B test.sbtest2-schema.sql 190K test.sbtest2.0.sql 300B test.sbtest3-schema.sql 190K test.sbtest3.0.sql

In addition, if the data volume is very large, to avoid export failure due to GC during the export process, you can extend the GC time in advance:

update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';

After your operation is completed, set the GC time back (the default value is 10m):

update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';

Finally, all the exported data can be imported back to TiDB using Lightning.