TiDB Lightning supports two backends: Importer and TiDB. The backend determines how tidb-lightning delivers data into the target cluster.
The Importer-backend (default) requires tidb-lightning to first encode the SQL or CSV data into KV pairs, and relies on the external tikv-importer program to sort these KV pairs and ingest them directly into the TiKV nodes.
The TiDB-backend requires tidb-lightning to encode the data into SQL INSERT statements, which are then executed directly on the TiDB node.
| Backend | Importer | TiDB |
|:---|:---|:---|
| Speed | Fast (~300 GB/hr) | Slow (~50 GB/hr) |
| ACID respected while importing | No | Yes |
| Target tables | Must be empty | Can be populated |
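
To get a feel for what the speed difference means in practice, the approximate figures above can be turned into a rough duration estimate. The following is a minimal sketch only: the function name `estimate_import_hours` and the 600 GB data size are hypothetical, and the throughput values are the approximate numbers quoted above, not measurements.

```python
# Approximate throughputs quoted in the comparison above, in GB per hour.
# These are rough figures, not guarantees for any particular cluster.
APPROX_THROUGHPUT_GB_PER_HR = {"importer": 300, "tidb": 50}


def estimate_import_hours(data_size_gb: float, backend: str) -> float:
    """Return a rough import duration in hours for the given backend."""
    if data_size_gb < 0:
        raise ValueError("data_size_gb must be non-negative")
    try:
        throughput = APPROX_THROUGHPUT_GB_PER_HR[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return data_size_gb / throughput


if __name__ == "__main__":
    # Hypothetical 600 GB data source.
    for backend in ("importer", "tidb"):
        hours = estimate_import_hours(600, backend)
        print(f"{backend}: about {hours:.1f} hours for 600 GB")
```

Actual speed depends on the hardware, the schema, and the load on the cluster, so treat the result as an order-of-magnitude guide.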
When using the TiDB-backend, you no longer need tikv-importer. Compared with the standard deployment procedure, the TiDB-backend deployment differs in two ways:
- All steps involving tikv-importer can be skipped.
- The configuration must be changed to indicate that the TiDB-backend is used.
The speed of TiDB Lightning using TiDB-backend is limited by the SQL processing speed of TiDB. Therefore, even a lower-end machine may max out the possible performance. The recommended hardware configuration is:
- A CPU with 16 logical cores
- An SSD large enough to store the entire data source, preferably with a high read speed
- A 1 Gigabit network card
1. The [importer_server] section in inventory.ini can be left blank.

       ...
       [importer_server]
       # keep empty

       [lightning_server]
       192.168.20.10
       ...

2. The tikv-importer settings in group_vars/all.yml are ignored, and the file group_vars/importer_server.yml does not need to be changed. However, you need to edit conf/tidb-lightning.yml and change the backend setting to tidb.

       ...
       tikv_importer:
           backend: "tidb"   # <-- change this
       ...

3. Bootstrap and deploy the cluster as usual.
4. Mount the data source for TiDB Lightning as usual.
You do not need to download and configure tikv-importer. Only TiDB Lightning itself needs to be downloaded.
Before running tidb-lightning, add the following lines to the configuration file:

    [tikv-importer]
    backend = "tidb"

Alternatively, supply the --backend tidb argument when executing tidb-lightning.
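
The command-line alternative can be sketched as a small launcher that assembles the argument list. This is an illustrative sketch, not an official wrapper: the helper name `lightning_command` and the file name `tidb-lightning.toml` are hypothetical, and the script only prints the command rather than running it.

```python
import shlex


def lightning_command(config_path: str, use_tidb_backend: bool = True) -> list:
    """Build the tidb-lightning argument list (hypothetical helper)."""
    args = ["tidb-lightning", "--config", config_path]
    if use_tidb_backend:
        # Selects the TiDB-backend from the command line instead of
        # setting backend = "tidb" in the configuration file.
        args += ["--backend", "tidb"]
    return args


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run
    # if you actually want to execute it.
    print(shlex.join(lightning_command("tidb-lightning.toml")))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the configuration path contains spaces.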
The TiDB-backend supports importing to an already-populated table. However, the new data might cause a unique key conflict with the old data. You can control how the conflict is resolved with the following task configuration:

    [tikv-importer]
    backend = "tidb"
    on-duplicate = "replace" # or "error" or "ignore"
| Setting | Behavior on conflict | Equivalent SQL statement |
|:---|:---|:---|
| replace | New entries replace old ones | REPLACE INTO ... |
| ignore | Keep old entries and ignore new ones | INSERT IGNORE INTO ... |
| error | Abort the import | INSERT INTO ... |
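
The correspondence between the on-duplicate setting and the SQL statement can be sketched as a small lookup. This only illustrates the mapping above and is not how TiDB Lightning is implemented internally. The names `build_statement` and `quote_ident` are hypothetical, and the `%s` placeholder style is an assumption borrowed from common Python MySQL drivers.

```python
from typing import Sequence

# Mapping from the on-duplicate setting to the equivalent SQL verb.
STATEMENT_PREFIX = {
    "replace": "REPLACE INTO",
    "ignore": "INSERT IGNORE INTO",
    "error": "INSERT INTO",
}


def quote_ident(name: str) -> str:
    """Quote an identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_statement(table: str, columns: Sequence[str], on_duplicate: str) -> str:
    """Return a parameterized statement for the given on-duplicate setting."""
    if on_duplicate not in STATEMENT_PREFIX:
        raise ValueError(f"unsupported on-duplicate value: {on_duplicate!r}")
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(quote_ident(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    prefix = STATEMENT_PREFIX[on_duplicate]
    return f"{prefix} {quote_ident(table)} ({column_list}) VALUES ({placeholders})"


if __name__ == "__main__":
    for setting in ("replace", "ignore", "error"):
        print(build_statement("t", ["id", "name"], setting))
```

With "error", a plain INSERT fails on the first row that violates a unique key, which is why that setting stops the import instead of silently resolving the conflict.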