TiDB Lightning Backends

The backend determines how TiDB Lightning imports data into the target cluster.

TiDB Lightning supports the following backends:

The Importer-backend (default): tidb-lightning first encodes the SQL or CSV data into KV pairs, and relies on the external tikv-importer program to sort these KV pairs and ingest them directly into the TiKV nodes.

The Local-backend: tidb-lightning first encodes data into key-value pairs, sorts and stores them in a local temporary directory, and uploads these key-value pairs to each TiKV node as SST files. Then, TiKV ingests these SST files into the cluster. The implementation of Local-backend is the same as that of Importer-backend, but it does not rely on the external tikv-importer component.

The TiDB-backend: tidb-lightning first encodes the data into SQL INSERT statements, and executes these statements directly on the TiDB node.

Backend                          Local-backend       Importer-backend    TiDB-backend
Speed                            Fast (~500 GB/hr)   Fast (~300 GB/hr)   Slow (~50 GB/hr)
Resource usage                   High                High                Low
Network bandwidth usage          High                Medium              Low
ACID respected while importing   No                  No                  Yes
Target tables                    Must be empty       Must be empty       Can be populated
Additional component required    No                  tikv-importer       No
TiDB versions supported          >= v4.0.0           All                 All
TiDB services impacted           Yes                 Yes                 No

How to choose the backend modes

  • If the target cluster of data import is v4.0 or later versions, consider using the Local-backend mode first, which is easier to use and offers higher performance than the other two modes.
  • If the target cluster of data import is v3.x or earlier versions, it is recommended to use the Importer-backend mode.
  • If the target cluster of data import is in the online production environment, or if the target table of data import already has data on it, it is recommended to use the TiDB-backend mode.
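Whichever mode you choose, the backend is selected in the tidb-lightning task configuration (or via the --backend command-line argument described later). The following is a minimal sketch, assuming the default configuration file name tidb-lightning.toml:

[tikv-importer]
# Set to "local", "importer", or "tidb" to select the backend mode.
backend = "local"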

TiDB Lightning Local-backend

The Local-backend mode has been available in TiDB Lightning since TiDB v4.0.3. You can use this mode to import data into TiDB clusters of v4.0.0 or later versions.

Deployment for Local-backend

To deploy TiDB Lightning in the Local-backend mode, see TiDB Lightning Deployment.
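The mode itself is declared in the tidb-lightning configuration. The following is a minimal sketch rather than a full deployment guide; the sorted-kv-dir path is only an example of the local temporary directory used for sorting:

[tikv-importer]
# Use the Local-backend.
backend = "local"
# Local directory for temporarily storing the sorted key-value pairs.
# This example path should point to an SSD with enough free space.
sorted-kv-dir = "/mnt/ssd/sorted-kv-dir"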

TiDB Lightning TiDB-backend

Deployment for TiDB-backend

When using the TiDB-backend, deploying tikv-importer is not necessary. Compared with the standard deployment procedure, the TiDB-backend deployment has the following two differences:

  • All steps involving tikv-importer can be skipped.
  • The configuration must be changed to declare that the TiDB-backend is used.

Hardware requirements

The speed of TiDB Lightning using TiDB-backend is limited by the SQL processing speed of TiDB. Therefore, even a lower-end machine may max out the possible performance. The recommended hardware configuration is:

  • 16 logical cores CPU
  • An SSD large enough to store the entire data source, preferring higher read speed
  • 1 Gigabit network card

Deploy TiDB Lightning using TiDB Ansible

  1. The [importer_server] section in inventory.ini can be left blank.

    ...
    [importer_server]
    # keep empty

    [lightning_server]
    192.168.20.10
    ...
  2. The tikv_importer_port setting in group_vars/all.yml is ignored, and the file group_vars/importer_server.yml does not need to be changed. But you need to edit conf/tidb-lightning.yml and change the backend setting to tidb.

    ...
    tikv_importer:
      backend: "tidb" # <-- change this
    ...
  3. Bootstrap and deploy the cluster as usual.

  4. Mount the data source for TiDB Lightning as usual.

  5. Start tidb-lightning as usual.

Manual deployment

You do not need to download and configure tikv-importer. You can download TiDB Lightning from the TiDB enterprise tools download page.

Before running tidb-lightning, add the following lines into the configuration file:

[tikv-importer]
backend = "tidb"

Alternatively, supply the --backend tidb argument when executing tidb-lightning.
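For example, assuming the configuration file is named tidb-lightning.toml, the invocation might look like this:

nohup ./tidb-lightning --backend tidb -config tidb-lightning.toml > nohup.out &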

Conflict resolution

The TiDB-backend supports importing into an already-populated table. However, the new data might cause a unique key conflict with the old data. You can control how the conflict is resolved by using the following task configuration.

[tikv-importer]
backend = "tidb"
on-duplicate = "replace" # or "error" or "ignore"

Setting   Behavior on conflict                    Equivalent SQL statement
replace   New entries replace old ones            REPLACE INTO ...
ignore    Keep old entries and ignore new ones    INSERT IGNORE INTO ...
error     Abort import                            INSERT INTO ...
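To illustrate the three behaviors, suppose the target table already contains a row (1, 'old') keyed by a unique id, and the data source contains a conflicting row (1, 'new'). The table name t and its columns below are hypothetical, used only to show the equivalent SQL semantics:

-- on-duplicate = "replace": the imported row overwrites the old one.
REPLACE INTO t VALUES (1, 'new');        -- t now contains (1, 'new')

-- on-duplicate = "ignore": the old row is kept and the imported row is discarded.
INSERT IGNORE INTO t VALUES (1, 'new');  -- t still contains (1, 'old')

-- on-duplicate = "error": the import aborts.
INSERT INTO t VALUES (1, 'new');         -- fails with a duplicate key error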

Migrating from Loader to TiDB Lightning TiDB-backend

If you need to import data into a TiDB cluster, TiDB Lightning using the TiDB-backend can completely replace the functionalities of Loader. The following list shows how to translate Loader configurations into TiDB Lightning configurations.

Loader:

    # log level
    log-level = "info"

    # The directory to which the log is output
    log-file = "loader.log"

    # Prometheus
    status-addr = ":8272"

    # concurrency
    pool-size = 16

TiDB Lightning:

    [lightning]
    # log level
    level = "info"

    # The directory to which the log is output. If this directory is not specified,
    # it defaults to the directory where the command is executed.
    file = "tidb-lightning.log"

    # Prometheus
    pprof-port = 8289

    # concurrency (better left as default)
    #region-concurrency = 16

Loader:

    # checkpoint database
    checkpoint-schema = "tidb_loader"

TiDB Lightning:

    [checkpoint]
    # checkpoint storage
    enable = true
    schema = "tidb_lightning_checkpoint"
    # By default the checkpoint is stored in a local file, which is more efficient.
    # But you could still choose to store the checkpoints in the target database
    # with this setting:
    #driver = "mysql"

TiDB Lightning (no Loader equivalent):

    [tikv-importer]
    # use the TiDB-backend
    backend = "tidb"

Loader:

    # data source directory
    dir = "/data/export/"

TiDB Lightning:

    [mydumper]
    # data source directory
    data-source-dir = "/data/export"

Loader:

    [db]
    # TiDB connection parameters
    host = "127.0.0.1"
    port = 4000
    user = "root"
    password = ""
    #sql-mode = ""

TiDB Lightning:

    [tidb]
    # TiDB connection parameters
    host = "127.0.0.1"
    port = 4000
    # In the TiDB-backend mode, this parameter is optional.
    # status-port = 10080
    user = "root"
    password = ""
    #sql-mode = ""

Loader:

    # [[route-rules]]
    # Table routes
    # schema-pattern = "shard_db_*"
    # table-pattern = "shard_table_*"
    # target-schema = "shard_db"
    # target-table = "shard_table"

TiDB Lightning:

    # [[routes]]
    # schema-pattern = "shard_db_*"
    # table-pattern = "shard_table_*"
    # target-schema = "shard_db"
    # target-table = "shard_table"

TiDB Lightning Importer-backend

Deployment for Importer-backend mode

This section describes the two deployment methods of TiDB Lightning in the Importer-backend mode: deployment using TiDB Ansible and manual deployment.

Hardware requirements

tidb-lightning and tikv-importer are both resource-intensive programs. It is recommended to deploy them on two separate machines.

To achieve the best performance, it is recommended to use the following hardware configuration:

  • tidb-lightning:

    • 32+ logical cores CPU
    • An SSD large enough to store the entire data source, preferring higher read speed
    • 10 Gigabit network card (capable of transferring at ≥300 MB/s)
    • tidb-lightning fully consumes all CPU cores when running, and deploying on a dedicated machine is highly recommended. If not possible, tidb-lightning could be deployed together with other components like tidb-server, and the CPU usage could be limited via the region-concurrency setting.
  • tikv-importer:

    • 32+ logical cores CPU
    • 40 GB+ memory
    • 1 TB+ SSD, preferring higher IOPS (≥ 8000 is recommended)
      • The disk should be larger than the total size of the top N tables, where N = max(index-concurrency, table-concurrency).
    • 10 Gigabit network card (capable of transferring at ≥300 MB/s)
    • tikv-importer fully consumes all CPU, disk I/O and network bandwidth when running, and deploying on a dedicated machine is strongly recommended.

If you have sufficient machines, you can deploy multiple tidb-lightning + tikv-importer servers, with each working on a distinct set of tables, to import the data in parallel.
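A hedged sketch of splitting the work between two instances: each tidb-lightning reads the same dump but is given a different set of tables. The [mydumper] filter setting and the table patterns below are assumptions used only for illustration; the exact filtering mechanism depends on your TiDB Lightning version:

# Instance 1: tidb-lightning.toml
[mydumper]
data-source-dir = "/data/export"
filter = ['my_db.table_0*']   # placeholder pattern for the first set of tables

# Instance 2: tidb-lightning.toml
[mydumper]
data-source-dir = "/data/export"
filter = ['my_db.table_1*']   # placeholder pattern for the second set of tables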

Deploy TiDB Lightning using TiDB Ansible

You can deploy TiDB Lightning using TiDB Ansible, together with the deployment of the TiDB cluster itself.

  1. Edit inventory.ini to add the addresses of the tidb-lightning and tikv-importer servers.

    ...
    [importer_server]
    192.168.20.9

    [lightning_server]
    192.168.20.10
    ...
  2. Configure these tools by editing the settings under group_vars/*.yml.

    • group_vars/all.yml

      ...
      # The listening port of tikv-importer. Should be open to the tidb-lightning server.
      tikv_importer_port: 8287
      ...
    • group_vars/lightning_server.yml

      ---
      dummy:

      # The listening port for metrics gathering. Should be open to the monitoring servers.
      tidb_lightning_pprof_port: 8289

      # The file path that tidb-lightning reads the data source (Mydumper SQL dump or CSV) from.
      data_source_dir: "{{ deploy_dir }}/mydumper"
    • group_vars/importer_server.yml

      ---
      dummy:

      # The file path to store engine files. Should reside on a partition with a large capacity.
      import_dir: "{{ deploy_dir }}/data.import"
  3. Deploy the cluster.

    ansible-playbook bootstrap.yml && ansible-playbook deploy.yml
  4. Mount the data source to the path specified in the data_source_dir setting.

  5. Log in to the tikv-importer server, and manually run the following command to start Importer.

    scripts/start_importer.sh
  6. Log in to the tidb-lightning server, and manually run the following command to start Lightning and import the data into the TiDB cluster.

    scripts/start_lightning.sh
  7. After completion, run scripts/stop_importer.sh on the tikv-importer server to stop Importer.

Deploy TiDB Lightning manually

Step 1: Deploy a TiDB cluster

Before importing data, you need to have a deployed TiDB cluster, with the cluster version 2.0.9 or above. It is highly recommended to use the latest version.

You can find deployment instructions in TiDB Quick Start Guide.

Step 2: Download the TiDB Lightning installation package

Refer to the TiDB enterprise tools download page to download the TiDB Lightning package (choose the same version as that of the TiDB cluster).

Step 3: Start tikv-importer

  1. Upload bin/tikv-importer from the installation package.

  2. Configure tikv-importer.toml.

    # TiKV Importer configuration file template

    # Log file
    log-file = "tikv-importer.log"

    # Log level: trace, debug, info, warn, error, off.
    log-level = "info"

    # Listening address of the status server.
    status-server-address = "0.0.0.0:8286"

    [server]
    # The listening address of tikv-importer. tidb-lightning needs to connect to
    # this address to write data.
    addr = "0.0.0.0:8287"

    [import]
    # The directory to store engine files.
    import-dir = "/mnt/ssd/data.import/"

    The above only shows the essential settings. See the Configuration section for the full list of settings.

  3. Run tikv-importer.

    nohup ./tikv-importer -C tikv-importer.toml > nohup.out &

Step 4: Start tidb-lightning

  1. Upload bin/tidb-lightning and bin/tidb-lightning-ctl from the tool set.

  2. Mount the data source onto the same machine.

  3. Configure tidb-lightning.toml. For configurations that do not appear in the template below, TiDB Lightning writes a configuration error to the log file and exits.

    [lightning]
    # The concurrency number of data. It is set to the number of logical CPU
    # cores by default. When deploying together with other components, you can
    # set it to 75% of the size of logical CPU cores to limit the CPU usage.
    # region-concurrency =

    # Logging
    level = "info"
    file = "tidb-lightning.log"

    [tikv-importer]
    # The listening address of tikv-importer. Change it to the actual address.
    addr = "172.16.31.10:8287"

    [mydumper]
    # mydumper local source data directory
    data-source-dir = "/data/my_database"

    [tidb]
    # Configuration of any TiDB server from the cluster
    host = "172.16.31.1"
    port = 4000
    user = "root"
    password = ""
    # Table schema information is fetched from TiDB via this status-port.
    status-port = 10080

    The above only shows the essential settings. See the Configuration section for the full list of settings.

  4. Run tidb-lightning. If you run the command directly in the command line, the process might exit because it receives a SIGHUP signal. Instead, it is preferable to run a bash script that contains the nohup command (a sketch of such a script follows the command below):

    nohup ./tidb-lightning -config tidb-lightning.toml > nohup.out &
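
    Below is a minimal sketch of such a wrapper script; the file name run_tidb_lightning.sh and the relative paths are assumptions:

    #!/bin/bash
    # run_tidb_lightning.sh: start tidb-lightning in the background so that it
    # keeps running after the login shell exits and sends SIGHUP.
    nohup ./tidb-lightning -config tidb-lightning.toml > nohup.out 2>&1 &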