Binlog Consumer Client User Guide

Binlog Consumer Client is used to consume TiDB secondary binlog data from Kafka and output the data in a specific format. Currently, Drainer supports multiple kinds of downstream destinations, including MySQL, TiDB, file, and Kafka. However, users sometimes need to output the data in other formats, for example, to Elasticsearch or Hive, so this feature is introduced.

Configure Drainer

Modify the configuration file of Drainer and set it to output the data to Kafka:

[syncer] db-type = "kafka" [syncer.to] # the Kafka address kafka-addrs = "127.0.0.1:9092" # the Kafka version kafka-version = "0.8.2.0"

Customized development

Data format

First, you need to understand the format of the data that Drainer outputs to Kafka:

```protobuf
// `Column` stores the column data in the corresponding variable based on the data type.
message Column {
    // Indicates whether the data is null
    optional bool is_null = 1 [ default = false ];
    // Stores `int` data
    optional int64 int64_value = 2;
    // Stores `uint`, `enum`, and `set` data
    optional uint64 uint64_value = 3;
    // Stores `float` and `double` data
    optional double double_value = 4;
    // Stores `bit`, `blob`, `binary` and `json` data
    optional bytes bytes_value = 5;
    // Stores `date`, `time`, `decimal`, `text`, `char` data
    optional string string_value = 6;
}

// `ColumnInfo` stores the column information, including the column name, type,
// and whether it is the primary key.
message ColumnInfo {
    optional string name = 1 [ (gogoproto.nullable) = false ];
    // the lower-case column field type in MySQL
    // https://dev.mysql.com/doc/refman/8.0/en/data-types.html
    // for the `numeric` type: int bigint smallint tinyint float double decimal bit
    // for the `string` type: text longtext mediumtext char tinytext varchar
    // blob longblob mediumblob binary tinyblob varbinary
    // enum set
    // for the `json` type: json
    optional string mysql_type = 2 [ (gogoproto.nullable) = false ];
    optional bool is_primary_key = 3 [ (gogoproto.nullable) = false ];
}

// `Row` stores the actual data of a row.
message Row {
    repeated Column columns = 1;
}

// `MutationType` indicates the DML type.
enum MutationType {
    Insert = 0;
    Update = 1;
    Delete = 2;
}

// `Table` contains mutations in a table.
message Table {
    optional string schema_name = 1;
    optional string table_name = 2;
    repeated ColumnInfo column_info = 3;
    repeated TableMutation mutations = 4;
}

// `TableMutation` stores mutations of a row.
message TableMutation {
    required MutationType type = 1;
    // data after modification
    required Row row = 2;
    // data before modification. It only takes effect for the `Update` MutationType.
    optional Row change_row = 3;
}

// `DMLData` stores all the mutations caused by DML in a transaction.
message DMLData {
    // `tables` contains all the table changes in the transaction.
    repeated Table tables = 1;
}

// `DDLData` stores the DDL information.
message DDLData {
    // the database currently in use
    optional string schema_name = 1;
    // the related table
    optional string table_name = 2;
    // `ddl_query` is the original DDL statement query.
    optional bytes ddl_query = 3;
}

// `BinlogType` indicates the binlog type, including DML and DDL.
enum BinlogType {
    DML = 0; // Has `dml_data`
    DDL = 1; // Has `ddl_query`
}

// `Binlog` stores all the changes in a transaction. Kafka stores the serialized result of the structure data.
message Binlog {
    optional BinlogType type = 1 [ (gogoproto.nullable) = false ];
    optional int64 commit_ts = 2 [ (gogoproto.nullable) = false ];
    optional DMLData dml_data = 3;
    optional DDLData ddl_data = 4;
}
```

For the definition of the data format, see binlog.proto.
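Each Kafka message value is one serialized `Binlog` message. As a minimal sketch (not the official consumer code), the following Go snippet unmarshals a message value using Go bindings generated from binlog.proto; the import path of the generated package is an assumption and depends on where you compile the proto file:

```go
package consumer

import (
	"fmt"
	"log"

	// Assumed import path; point this at the Go code you generate from binlog.proto.
	pb "github.com/pingcap/tidb-tools/tidb-binlog/proto/go-binlog"
)

// decodeBinlog unmarshals one raw Kafka message value into a Binlog
// message and prints a summary of the changes it carries.
func decodeBinlog(value []byte) {
	binlog := new(pb.Binlog)
	if err := binlog.Unmarshal(value); err != nil {
		log.Fatal(err)
	}

	switch binlog.Type {
	case pb.BinlogType_DDL:
		// A DDL binlog carries the original statement in ddl_data.
		fmt.Printf("DDL at commit ts %d: %s\n",
			binlog.CommitTs, binlog.DdlData.GetDdlQuery())
	case pb.BinlogType_DML:
		// A DML binlog groups all mutations of the transaction by table.
		for _, table := range binlog.DmlData.Tables {
			fmt.Printf("table %s.%s: %d mutation(s) at commit ts %d\n",
				table.GetSchemaName(), table.GetTableName(),
				len(table.Mutations), binlog.CommitTs)
		}
	}
}
```

Note that for `Update` mutations, `change_row` carries the values of the row before modification, while `row` carries the values after it.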

Driver

The TiDB-Tools project provides Driver, which is used to read the binlog data in Kafka. It has the following features:

  • Read the Kafka data.
  • Locate the binlog stored in Kafka based on commit ts.

You need to configure the following information when using Driver:

  • KafkaAddr: the address of the Kafka cluster
  • CommitTS: from which commit ts to start reading the binlog
  • Offset: the Kafka offset from which to start reading data. If CommitTS is set, you do not need to configure this parameter.
  • ClusterID: the cluster ID of the TiDB cluster
  • Topic: the topic name of Kafka. If Topic is empty, the default Drainer topic name <ClusterID>_obinlog is used.

You can use Driver by importing the Driver package into your own project. Refer to the example code provided with Driver to learn how to use Driver and how to parse the binlog data. A minimal usage sketch follows.
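The sketch below assumes the reader package shipped with Driver in tidb-tools; the import path, the Config field names, and the NewReader/Messages API follow the Driver source at the time of writing, so treat them as assumptions and check them against your version:

```go
package main

import (
	"fmt"

	// Assumed import path of the Driver reader package in tidb-tools.
	"github.com/pingcap/tidb-tools/tidb-binlog/driver/reader"
)

func main() {
	// Configure the reader with the items described in the list above.
	cfg := &reader.Config{
		KafkaAddr: []string{"127.0.0.1:9092"},
		CommitTS:  0,                     // start from the earliest binlog kept in Kafka
		ClusterID: "6717826900501472462", // hypothetical TiDB cluster ID
	}

	breader, err := reader.NewReader(cfg)
	if err != nil {
		panic(err)
	}
	defer breader.Close()

	// Messages delivers the decoded binlogs read from Kafka.
	for msg := range breader.Messages() {
		fmt.Printf("recv binlog at commit ts: %d\n", msg.Binlog.CommitTs)
	}
}
```

Since Driver can locate binlogs by commit ts, a consumer can persist the last commit ts it handled and pass it back as CommitTS after a restart to resume from that point.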

Currently, two examples are provided:

  • Using Driver to replicate data to MySQL. This example shows how to convert a binlog to SQL (a simplified sketch of the conversion follows this list).
  • Using Driver to print data
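To make the conversion concrete, here is a simplified, hypothetical sketch of turning an Insert mutation into a placeholder-style INSERT statement, using the same assumed pb bindings as above. It is not the replication example's actual code and ignores details such as the distinct handling of Update and Delete mutations:

```go
package consumer

import (
	"fmt"
	"strings"

	// Assumed import path for the Go bindings generated from binlog.proto.
	pb "github.com/pingcap/tidb-tools/tidb-binlog/proto/go-binlog"
)

// columnValue picks the Go value held by a Column based on which optional
// field is set, mirroring the Column message defined above.
func columnValue(col *pb.Column) interface{} {
	switch {
	case col.GetIsNull():
		return nil
	case col.Int64Value != nil:
		return col.GetInt64Value()
	case col.Uint64Value != nil:
		return col.GetUint64Value()
	case col.DoubleValue != nil:
		return col.GetDoubleValue()
	case col.BytesValue != nil:
		return col.GetBytesValue()
	default:
		return col.GetStringValue()
	}
}

// insertSQL builds a placeholder-style INSERT statement for one Insert
// mutation, returning the query text and its arguments.
func insertSQL(table *pb.Table, mut *pb.TableMutation) (string, []interface{}) {
	names := make([]string, 0, len(table.ColumnInfo))
	holders := make([]string, 0, len(table.ColumnInfo))
	args := make([]interface{}, 0, len(table.ColumnInfo))
	for i, info := range table.ColumnInfo {
		names = append(names, fmt.Sprintf("`%s`", info.Name))
		holders = append(holders, "?")
		args = append(args, columnValue(mut.Row.Columns[i]))
	}
	query := fmt.Sprintf("INSERT INTO `%s`.`%s` (%s) VALUES (%s)",
		table.GetSchemaName(), table.GetTableName(),
		strings.Join(names, ","), strings.Join(holders, ","))
	return query, args
}
```

An Update mutation additionally carries change_row (the values before modification), which a converter would typically use to build the WHERE clause of the corresponding UPDATE statement.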