Manage Changefeeds

This document describes how to create and manage TiCDC changefeeds by using the TiCDC command-line tool cdc cli. You can also manage changefeeds via the HTTP interface of TiCDC. For details, see TiCDC OpenAPI.

Create a replication task

Run the following command to create a replication task:

cdc cli changefeed create --server=http://10.0.10.25:8300 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task"
Create changefeed successfully! ID: simple-replication-task Info: {"upstream_id":7178706266519722477,"namespace":"default","id":"simple-replication-task","sink_uri":"mysql://root:xxxxx@127.0.0.1:4000/?time-zone=","create_time":"2022-12-19T15:05:46.679218+08:00","start_ts":438156275634929669,"engine":"unified","config":{"case_sensitive":true,"enable_old_value":true,"force_replicate":false,"ignore_ineligible_table":false,"check_gc_safe_point":true,"enable_sync_point":true,"bdr_mode":false,"sync_point_interval":30000000000,"sync_point_retention":3600000000000,"filter":{"rules":["test.*"],"event_filters":null},"mounter":{"worker_num":16},"sink":{"protocol":"","schema_registry":"","csv":{"delimiter":",","quote":"\"","null":"\\N","include_commit_ts":false},"column_selectors":null,"transaction_atomicity":"none","encoder_concurrency":16,"terminator":"\r\n","date_separator":"none","enable_partition_separator":false},"consistent":{"level":"none","max_log_size":64,"flush_interval":2000,"storage":""}},"state":"normal","creator_version":"v6.5.0"}

Query the replication task list

Run the following command to query the replication task list:

cdc cli changefeed list --server=http://10.0.10.25:8300
[{ "id": "simple-replication-task", "summary": { "state": "normal", "tso": 417886179132964865, "checkpoint": "2020-07-07 16:07:44.881", "error": null } }]
  • checkpoint indicates that TiCDC has already replicated data before this time point to the downstream.
  • state indicates the state of the replication task.
    • normal: The replication task runs normally.
    • stopped: The replication task is stopped (manually paused).
    • error: The replication task is stopped (by an error).
    • removed: The replication task is removed. Tasks of this state are displayed only when you have specified the --all option. To see these tasks when this option is not specified, run the changefeed query command.
    • finished: The replication task is finished (data is replicated to the target-ts). Tasks of this state are displayed only when you have specified the --all option. To see these tasks when this option is not specified, run the changefeed query command.
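
For scripted monitoring, the states listed above can be pulled out of the list output. The following is a minimal sketch, not a feature of cdc cli: the helper names list_states and count_not_normal are hypothetical, and the sketch assumes the output keeps the JSON layout shown in the sample. It runs against that sample rather than a live cluster.

```shell
# Sketch only: assumes each state appears as "state": "<value>" in the output.
# A JSON-aware tool would be more robust than grep and sed.

# Print one state per changefeed from `cdc cli changefeed list` output on stdin.
list_states() {
  grep -o '"state": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Count the changefeeds whose state is anything other than "normal".
# `|| true` keeps the exit status at 0 when grep counts zero lines.
count_not_normal() {
  list_states | grep -v -c '^normal$' || true
}

# Sample copied from the list output shown in this section.
sample='[{ "id": "simple-replication-task", "summary": { "state": "normal", "tso": 417886179132964865, "checkpoint": "2020-07-07 16:07:44.881", "error": null } }]'

printf '%s\n' "$sample" | list_states        # prints: normal
printf '%s\n' "$sample" | count_not_normal   # prints: 0
```

Against a running cluster, the same helpers could read the output of the changefeed list command through a pipe instead of the sample variable.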

Query a specific replication task

To query a specific replication task, run the changefeed query command. The query result includes the task information and the task state. You can specify the --simple or -s argument to simplify the query result so that it includes only the basic replication state and the checkpoint information. If you do not specify this argument, the output includes the detailed task configuration, replication states, and replication table information.

cdc cli changefeed query -s --server=http://10.0.10.25:8300 --changefeed-id=simple-replication-task
{ "state": "normal", "tso": 419035700154597378, "checkpoint": "2020-08-27 10:12:19.579", "error": null }

In the preceding command and result:

  • state is the replication state of the current changefeed. Its value is consistent with the state shown by changefeed list.
  • tso represents the largest transaction TSO in the current changefeed that has been successfully replicated to the downstream.
  • checkpoint represents the corresponding time of the largest transaction TSO in the current changefeed that has been successfully replicated to the downstream.
  • error records whether an error has occurred in the current changefeed.
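
The relationship between tso and checkpoint can be checked by hand. The sketch below assumes the usual TiDB TSO layout, in which the low 18 bits hold a logical counter and the remaining high bits hold the physical time in milliseconds since the Unix epoch. The helper names are hypothetical.

```shell
# Sketch only: assumes the low 18 bits of a TSO are the logical counter.

# Physical part of a TSO: milliseconds since the Unix epoch.
tso_physical_ms() {
  echo $(( $1 >> 18 ))
}

# Logical part of a TSO: the counter stored in the low 18 bits.
tso_logical() {
  echo $(( $1 & 0x3FFFF ))
}

# TSO taken from the query result shown in this section.
tso_physical_ms 419035700154597378   # prints: 1598494339579
tso_logical 419035700154597378       # prints: 2
```

For the sample tso, 1598494339579 ms corresponds to 2020-08-27 02:12:19.579 UTC, which agrees with the checkpoint value in the result if that value is displayed in UTC+8.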

To get the detailed information, run the command without the -s argument:

cdc cli changefeed query --server=http://10.0.10.25:8300 --changefeed-id=simple-replication-task
{
  "info": {
    "sink-uri": "mysql://127.0.0.1:3306/?max-txn-row=20\u0026worker-number=4",
    "opts": {},
    "create-time": "2020-08-27T10:33:41.687983832+08:00",
    "start-ts": 419036036249681921,
    "target-ts": 0,
    "admin-job-type": 0,
    "sort-engine": "unified",
    "sort-dir": ".",
    "config": {
      "case-sensitive": true,
      "enable-old-value": false,
      "filter": {
        "rules": [
          "*.*"
        ],
        "ignore-txn-start-ts": null,
        "ddl-allow-list": null
      },
      "mounter": {
        "worker-num": 16
      },
      "sink": {
        "dispatchers": null
      },
      "scheduler": {
        "type": "table-number",
        "polling-time": -1
      }
    },
    "state": "normal",
    "history": null,
    "error": null
  },
  "status": {
    "resolved-ts": 419036036249681921,
    "checkpoint-ts": 419036036249681921,
    "admin-job-type": 0
  },
  "count": 0,
  "task-status": [
    {
      "capture-id": "97173367-75dc-490c-ae2d-4e990f90da0f",
      "status": {
        "tables": {
          "47": {
            "start-ts": 419036036249681921
          }
        },
        "operation": null,
        "admin-job-type": 0
      }
    }
  ]
}

In the preceding command and result:

  • info is the replication configuration of the queried changefeed.
  • status is the replication state of the queried changefeed.
    • resolved-ts: The largest transaction TS in the current changefeed that has been successfully sent from TiKV to TiCDC.
    • checkpoint-ts: The largest transaction TS in the current changefeed that has been successfully written to the downstream.
    • admin-job-type: The status of a changefeed:
      • 0: The state is normal.
      • 1: The task is paused. When the task is paused, all replication processors exit. The configuration and the replication status of the task are retained, so you can resume the task from checkpoint-ts.
      • 2: The task is resumed. The replication task resumes from checkpoint-ts.
      • 3: The task is removed. When the task is removed, all replication processors are stopped, and the configuration information of the replication task is cleared. Only the replication status is retained for later queries.
  • task-status indicates the state of each replication sub-task in the queried changefeed.
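
When reading query results in a script, the numeric admin-job-type values can be turned back into the labels listed above. This is a minimal sketch with a hypothetical helper name; the mapping covers only the four values described in this document and reports anything else as unknown.

```shell
# Sketch only: maps the admin-job-type codes described above to short labels.
describe_admin_job_type() {
  case "$1" in
    0) echo "normal" ;;
    1) echo "paused" ;;
    2) echo "resumed" ;;
    3) echo "removed" ;;
    *) echo "unknown" ;;   # any code not described in this document
  esac
}

describe_admin_job_type 0   # prints: normal
describe_admin_job_type 1   # prints: paused
```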

Pause a replication task

Run the following command to pause a replication task:

cdc cli changefeed pause --server=http://10.0.10.25:8300 --changefeed-id simple-replication-task

In the preceding command:

  • --changefeed-id=uuid represents the ID of the changefeed that corresponds to the replication task you want to pause.

Resume a replication task

Run the following command to resume a paused replication task:

cdc cli changefeed resume --server=http://10.0.10.25:8300 --changefeed-id simple-replication-task

In the preceding command:

  • --changefeed-id=uuid represents the ID of the changefeed that corresponds to the replication task you want to resume.
  • --overwrite-checkpoint-ts: starting from v6.2.0, you can specify the TSO from which the replication task resumes. TiCDC starts pulling data from the specified TSO. The argument accepts now or a specific TSO (such as 434873584621453313). The specified TSO must be in the range of (GC safe point, CurrentTSO]. If this argument is not specified, TiCDC resumes replication from the current checkpoint-ts by default.
  • --no-confirm: resumes the replication task without asking you to confirm the related information. Defaults to false.
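
The range rule for --overwrite-checkpoint-ts, (GC safe point, CurrentTSO], is open on the left and closed on the right. The following sketch shows that check in isolation. The helper name is hypothetical, and the GC safe point and current TSO values are made up for illustration; in practice they would have to come from your cluster.

```shell
# Sketch only: succeeds when the candidate TSO lies in (gc_safe_point, current_tso].
# Arguments: candidate gc_safe_point current_tso
tso_in_resume_range() {
  [ "$1" -gt "$2" ] && [ "$1" -le "$3" ]
}

candidate=434873584621453313       # example TSO from the text above
gc_safe_point=434873000000000000   # hypothetical value
current_tso=434874000000000000     # hypothetical value

if tso_in_resume_range "$candidate" "$gc_safe_point" "$current_tso"; then
  echo "in range"
else
  echo "out of range"
fi
# prints: in range
```

A TSO equal to the GC safe point is rejected, while a TSO equal to the current TSO is accepted.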

Remove a replication task

Run the following command to remove a replication task:

cdc cli changefeed remove --server=http://10.0.10.25:8300 --changefeed-id simple-replication-task

In the preceding command:

  • --changefeed-id=uuid represents the ID of the changefeed that corresponds to the replication task you want to remove.

Update task configuration

TiCDC supports modifying the configuration of a replication task, but not dynamically. To modify the changefeed configuration, pause the task, modify the configuration, and then resume the task:

cdc cli changefeed pause -c test-cf --server=http://10.0.10.25:8300
cdc cli changefeed update -c test-cf --server=http://10.0.10.25:8300 --sink-uri="mysql://127.0.0.1:3306/?max-txn-row=20&worker-number=8" --config=changefeed.toml
cdc cli changefeed resume -c test-cf --server=http://10.0.10.25:8300

Currently, you can modify the following configuration items:

  • sink-uri of the changefeed.
  • The changefeed configuration file and all configuration items in the file.
  • The target-ts of the changefeed.
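
Because the update only works on a paused task, the three commands must run in order and stop at the first failure. The following sketch wraps the sequence in a hypothetical function, update_changefeed. It uses only the flags shown in this section and, as an assumption for illustration, passes only --config to the update step. By default it prints the commands instead of running them.

```shell
# Sketch only. RUN is a command prefix: the default "echo" prints each command;
# setting RUN to an empty string before running this script executes them.
RUN=${RUN-echo}

# Pause, update, then resume a changefeed; stop at the first failing step.
# Arguments: changefeed-id server-address config-file
update_changefeed() {
  $RUN cdc cli changefeed pause -c "$1" --server="$2" &&
  $RUN cdc cli changefeed update -c "$1" --server="$2" --config="$3" &&
  $RUN cdc cli changefeed resume -c "$1" --server="$2"
}

# Dry run with the values used in this section.
update_changefeed test-cf http://10.0.10.25:8300 changefeed.toml
```

In the default dry-run mode the function prints the three commands in the order pause, update, resume, which makes it possible to review them before running anything against a cluster.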

Manage processing units of replication sub-tasks (processor)

  • Query the processor list:

    cdc cli processor list --server=http://10.0.10.25:8300
    [ { "id": "9f84ff74-abf9-407f-a6e2-56aa35b33888", "capture-id": "b293999a-4168-4988-a4f4-35d9589b226b", "changefeed-id": "simple-replication-task" } ]
  • Query a specific processor, which corresponds to the status of a specific replication sub-task:

    cdc cli processor query --server=http://10.0.10.25:8300 --changefeed-id=simple-replication-task --capture-id=b293999a-4168-4988-a4f4-35d9589b226b
    {
      "status": {
        "tables": {
          "56": { # 56 is the ID of the replication table, corresponding to tidb_table_id of a table in TiDB
            "start-ts": 417474117955485702
          }
        },
        "operation": null,
        "admin-job-type": 0
      },
      "position": {
        "checkpoint-ts": 417474143881789441,
        "resolved-ts": 417474143881789441,
        "count": 0
      }
    }

    In the preceding command and result:

    • status.tables: Each key number represents the ID of the replication table, corresponding to tidb_table_id of a table in TiDB.
    • resolved-ts: The largest TSO among the sorted data in the current processor.
    • checkpoint-ts: The largest TSO that has been successfully written to the downstream in the current processor.
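
The gap between resolved-ts and checkpoint-ts gives a rough idea of how far the downstream write lags behind the sorted data in a processor. The sketch below computes that gap in milliseconds. It assumes the usual TiDB TSO layout (physical milliseconds in the bits above the low 18), and the helper name is hypothetical.

```shell
# Sketch only: assumes the physical time of a TSO is the value shifted right by 18 bits.
# Prints the difference in milliseconds between two TSOs.
# Arguments: resolved-ts checkpoint-ts
tso_lag_ms() {
  echo $(( ($1 >> 18) - ($2 >> 18) ))
}

# Values taken from the "position" object in the processor query result above.
tso_lag_ms 417474143881789441 417474143881789441   # prints: 0
```

In the sample result both values are identical, so the gap is 0.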

Output the historical value of a Row Changed Event

In the default configuration, the Row Changed Event of TiCDC Open Protocol output in a replication task contains only the changed value, not the value before the change. Therefore, consumers of TiCDC Open Protocol cannot use the output as the historical value of a Row Changed Event.

Starting from v4.0.5, TiCDC supports outputting the historical value of a Row Changed Event. To enable this feature, specify the following configuration in the changefeed configuration file at the root level:

enable-old-value = true

This feature is enabled by default since v5.0. To learn the output format of the TiCDC Open Protocol after this feature is enabled, see TiCDC Open Protocol - Row Changed Event.

Replicate tables with the new framework for collations enabled

Starting from v4.0.15, v5.0.4, v5.1.1, and v5.2.0, TiCDC supports replicating tables for which the new framework for collations is enabled.

Replicate tables without a valid index

Since v4.0.8, TiCDC supports replicating tables that have no valid index by modifying the task configuration. To enable this feature, add the following to the changefeed configuration file:

enable-old-value = true
force-replicate = true

Unified Sorter

Unified sorter is the sorting engine in TiCDC. It can mitigate OOM problems caused by the following scenarios:

  • The data replication task in TiCDC is paused for a long time, during which a large amount of incremental data is accumulated and needs to be replicated.
  • The data replication task is started from an early timestamp so it becomes necessary to replicate a large amount of incremental data.

For the changefeeds created using cdc cli after v4.0.13, Unified Sorter is enabled by default; for the changefeeds that have existed before v4.0.13, the previous configuration is used.

To check whether the Unified Sorter feature is enabled on a changefeed, you can run the following example command (assuming the address of the TiCDC server is http://10.0.10.25:8300):

cdc cli --server="http://10.0.10.25:8300" changefeed query --changefeed-id=simple-replication-task | grep 'sort-engine'

In the output of the above command, if the value of sort-engine is "unified", it means that Unified Sorter is enabled on the changefeed.
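
In a script, the same check can be expressed as an exit status instead of being read by eye. This is a minimal sketch with a hypothetical helper name. It assumes the query output contains the field in the form "sort-engine": "unified", as in the detailed query result earlier in this document, and it runs against a sample fragment rather than a live cluster.

```shell
# Sketch only: succeeds when the text on stdin reports sort-engine as "unified".
uses_unified_sorter() {
  grep -q '"sort-engine": *"unified"'
}

# Fragment copied from the detailed changefeed query result in this document.
sample='"sort-engine": "unified", "sort-dir": ".",'

if printf '%s\n' "$sample" | uses_unified_sorter; then
  echo "Unified Sorter is enabled"
else
  echo "Unified Sorter is not enabled"
fi
# prints: Unified Sorter is enabled
```

Against a running cluster, the helper could read the output of the changefeed query command through a pipe in place of the sample fragment.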