Quick Start Guide on Integrating TiDB with Confluent Platform

This document describes how to integrate TiDB with Confluent Platform using TiCDC.

Warning

This is still an experimental feature. Do NOT use it in a production environment.

Confluent Platform is a data streaming platform with Apache Kafka at its core. With many official and third-party sink connectors, Confluent Platform enables you to easily connect stream sources to relational or non-relational databases.

To integrate TiDB with Confluent Platform, use the TiCDC component with the Avro protocol. TiCDC streams data changes to Kafka in a format that Confluent Platform recognizes. The following sections describe the integration in detail.

Prerequisites

Note

In this tutorial, the JDBC sink connector is used to replicate TiDB data to a downstream relational database. For simplicity, SQLite is used as the example downstream.

Make sure that ZooKeeper, Kafka, and Schema Registry are properly installed. It is recommended that you follow the Confluent Platform Quick Start Guide to deploy a local test environment.
Make sure that the JDBC sink connector is installed by running the following command. The result should contain jdbc-sink.
```
confluent local services connect connector list
```

Integration procedures

Save the following configuration into jdbc-sink-connector.json:

```
{
  "name": "jdbc-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "testdb_test",
    "connection.url": "jdbc:sqlite:/tmp/test.db",
    "connection.ds.pool.size": 5,
    "table.name.format": "test",
    "auto.create": true,
    "auto.evolve": true
  }
}
```
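Before submitting the file, it can help to confirm that it parses and contains the keys this guide relies on. The following is a minimal sketch, not part of Confluent Platform: the function name check_connector_config and the REQUIRED_KEYS tuple are hypothetical and cover only the keys used in this tutorial.

```python
import json

# Hypothetical minimum set of keys for this tutorial, not an official list.
REQUIRED_KEYS = ("connector.class", "topics", "connection.url")


def check_connector_config(text):
    """Parse connector JSON text and return a list of problems (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    if "name" not in doc:
        problems.append("missing top-level 'name'")
    config = doc.get("config")
    if not isinstance(config, dict):
        problems.append("missing 'config' object")
        return problems
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing config key '{key}'")
    return problems
```

For example, calling check_connector_config(open("jdbc-sink-connector.json").read()) returns an empty list when the file passes these checks.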

Create an instance of the JDBC sink connector by running the following command (assuming the Kafka Connect REST API is listening on 127.0.0.1:8083):
```
curl -X POST -H "Content-Type: application/json" -d @jdbc-sink-connector.json http://127.0.0.1:8083/connectors
```
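The same registration can be scripted, and the connector state can be read back from the Kafka Connect REST API at /connectors/<name>/status. The following is a minimal sketch using only the Python standard library. The helper names (build_create_request, fetch_status, all_running) are hypothetical, and the address is the one assumed in this guide.

```python
import json
import urllib.request

CONNECT_URL = "http://127.0.0.1:8083"  # Kafka Connect REST address assumed in this guide


def build_create_request(config_path, base_url=CONNECT_URL):
    """Build the POST request that registers the connector described in config_path."""
    with open(config_path, "rb") as f:
        body = f.read()
    json.loads(body)  # fail early if the file is not valid JSON
    return urllib.request.Request(
        f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_status(name, base_url=CONNECT_URL):
    """Return the parsed JSON from GET /connectors/<name>/status."""
    with urllib.request.urlopen(f"{base_url}/connectors/{name}/status") as resp:
        return json.load(resp)


def all_running(status):
    """Return True if the connector and every listed task report state RUNNING."""
    states = [status["connector"]["state"]]
    states += [task["state"] for task in status.get("tasks", [])]
    return all(state == "RUNNING" for state in states)
```

To use it, pass the result of build_create_request("jdbc-sink-connector.json") to urllib.request.urlopen, then call all_running(fetch_status("jdbc-sink-connector")). A task in the FAILED state usually means the connector configuration or the downstream database needs attention.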
Deploy TiCDC if it is not already running. If TiCDC is already deployed, you can skip this step.
Make sure that your TiDB and TiCDC clusters are healthy before proceeding.

Create a changefeed by running the cdc cli command:

```
./cdc cli changefeed create --pd="http://127.0.0.1:2379" --sink-uri="kafka://127.0.0.1:9092/testdb_test?protocol=avro" --opts "registry=http://127.0.0.1:8081"
```
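The parts of this command that depend on your environment are the PD address, the Kafka address, the topic, and the Schema Registry address. The following sketch assembles the same argument list from those parts. The function changefeed_command is hypothetical, and the <database>_<table> topic name is simply the convention used in this tutorial (testdb_test), not a TiCDC requirement.

```python
import shlex


def changefeed_command(
    database,
    table,
    pd="http://127.0.0.1:2379",
    kafka="127.0.0.1:9092",
    registry="http://127.0.0.1:8081",
):
    """Return the cdc cli argument list used in this guide for one table."""
    topic = f"{database}_{table}"  # tutorial convention, must match "topics" in the connector
    sink_uri = f"kafka://{kafka}/{topic}?protocol=avro"
    return [
        "./cdc", "cli", "changefeed", "create",
        f"--pd={pd}",
        f"--sink-uri={sink_uri}",
        "--opts", f"registry={registry}",
    ]


# Print a shell-quoted version of the command for copy and paste.
print(shlex.join(changefeed_command("testdb", "test")))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting issues with the ? character in the sink URI.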

Note

Make sure that PD, Kafka, and Schema Registry are running on their respective default ports (2379, 9092, and 8081), which the preceding command assumes.
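A quick way to confirm that the three services are reachable is to attempt a TCP connection to each port. The following is a minimal sketch. It assumes that all services run on 127.0.0.1, and the names is_port_open and check_services are hypothetical. A successful connection shows only that something is listening on the port, not that the service is healthy.

```python
import socket

# Default ports used by the changefeed command in this guide.
DEFAULT_PORTS = {"PD": 2379, "Kafka": 9092, "Schema Registry": 8081}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="127.0.0.1", ports=DEFAULT_PORTS):
    """Return a dict that maps each service name to its reachability."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling check_services() returns a dictionary with one True or False entry per service, which makes it easy to see which component to start before creating the changefeed.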

Test data replication

After TiDB is integrated with Confluent Platform, you can follow the example procedures below to test the data replication.

Create the testdb database in your TiDB cluster:
```
CREATE DATABASE IF NOT EXISTS testdb;
```
Create the test table in testdb:
```
USE testdb;
CREATE TABLE test (
    id INT PRIMARY KEY,
    v TEXT
);
```
Note

If you need to change the database name or the table name, update topics in jdbc-sink-connector.json accordingly, and keep it consistent with the topic in the changefeed sink URI.

Insert data into TiDB:

```
INSERT INTO test (id, v) VALUES (1, 'a');
INSERT INTO test (id, v) VALUES (2, 'b');
INSERT INTO test (id, v) VALUES (3, 'c');
INSERT INTO test (id, v) VALUES (4, 'd');
```

Wait a moment for data to be replicated to the downstream. Then check the downstream for data:
```
sqlite3 /tmp/test.db
sqlite> SELECT * FROM test;
```
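Because replication is asynchronous, a scripted check needs to poll rather than query once. The following is a minimal sketch using the Python standard library sqlite3 module. It assumes the downstream file is /tmp/test.db, as configured in connection.url, and that the sink creates columns named id and v. The names read_rows and wait_for_rows are hypothetical, and the column types the sink actually creates may differ from what this comparison expects.

```python
import sqlite3
import time

# Rows inserted into TiDB in the previous step.
EXPECTED = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]


def read_rows(db_path):
    """Return all (id, v) rows from the test table, or [] if it cannot be read yet."""
    conn = sqlite3.connect(db_path)  # note: creates an empty file if db_path is missing
    try:
        return conn.execute("SELECT id, v FROM test ORDER BY id").fetchall()
    except sqlite3.OperationalError:
        return []  # for example, the table has not been created by the sink yet
    finally:
        conn.close()


def wait_for_rows(db_path, expected=EXPECTED, timeout=30.0, interval=1.0):
    """Poll the downstream until it matches expected; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if read_rows(db_path) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_rows("/tmp/test.db") returns True once all four rows are visible in the downstream. If it returns False, check the connector status and the changefeed before retrying.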