Integrate TiDB with Amazon AppFlow

Amazon AppFlow is a fully managed API integration service that you use to connect your software as a service (SaaS) applications to AWS services, and securely transfer data. With Amazon AppFlow, you can import and export data from and to TiDB into many types of data providers, such as Salesforce, Amazon S3, LinkedIn, and GitHub. For more information, see Supported source and destination applications in AWS documentation.

This document describes how to integrate TiDB with Amazon AppFlow and takes integrating a TiDB Serverless cluster as an example.

If you do not have a TiDB cluster, you can create a TiDB Serverless cluster, which is free and can be created in approximately 30 seconds.

Prerequisites

Step 1. Register a TiDB connector

Clone the code

Clone the integration example code repository for TiDB and Amazon AppFlow:

git clone https://github.com/pingcap-inc/tidb-appflow-integration

Build and upload a Lambda

  1. Build the package:

    cd tidb-appflow-integration mvn clean package
  2. (Optional) Configure your AWS access key ID and secret access key if you have not.

    aws configure
  3. Upload your JAR package as a Lambda:

    sam deploy --guided

    If you see a similar output as follows, this Lambda is successfully deployed.

    Successfully created/updated stack - <stack_name> in <region>
  4. Go to the AWS Lambda console, and you can see the Lambda that you just uploaded. Note that you need to select the correct region in the upper-right corner of the window.

    lambda dashboard

Use Lambda to register a connector

  1. In the AWS Management Console, navigate to Amazon AppFlow > Connectors and click Register new connector.

    register connector

  2. In the Register a new connector dialog, choose the Lambda function you uploaded and specify the connector label using the connector name.

    register connector dialog

  3. Click Register. Then, a TiDB connector is registered successfully.

Step 2. Create a flow

Navigate to Amazon AppFlow > Flows and click Create flow.

create flow

Set the flow name

Enter the flow name, and then click Next.

name flow

Set the source and destination tables

Choose the Source details and Destination details. TiDB connector can be used in both of them.

  1. Choose the source name. This document uses Salesforce as an example source.

    salesforce source

    After you register to Salesforce, Salesforce will add some example data to your platform. The following steps will use the Account object as an example source object.

    salesforce data

  2. Click Connect.

    1. In the Connect to Salesforce dialog, specify the name of this connection, and then click Continue.

      connect to salesforce

    2. Click Allow to confirm that AWS can read your Salesforce data.

      allow salesforce

  3. In the Destination details area, choose TiDB-Connector as the destination. The Connect button is displayed.

    tidb dest

  4. Before clicking Connect, you need to create a sf_account table in TiDB for the Salesforce Account object. Note that this table schema is different from the sample data in Tutorial of Amazon AppFlow.

    CREATE TABLE `sf_account` ( `id` varchar(255) NOT NULL, `name` varchar(150) NOT NULL DEFAULT '', `type` varchar(150) NOT NULL DEFAULT '', `billing_state` varchar(255) NOT NULL DEFAULT '', `rating` varchar(255) NOT NULL DEFAULT '', `industry` varchar(255) NOT NULL DEFAULT '', PRIMARY KEY (`id`) );
  5. After the sf_account table is created, click Connect. A connection dialog is displayed.

  6. In the Connect to TiDB-Connector dialog, enter the connection properties of the TiDB cluster. If you use a TiDB Serverless cluster, you need to set the TLS option to Yes, which lets the TiDB connector use the TLS connection. Then, click Connect.

    tidb connection message

  7. Now you can get all tables in the database that you specified for connection. Choose the sf_account table from the drop-down list.

    database

    The following screenshot shows the configurations to transfer data from the Salesforce Account object to the sf_account table in TiDB:

    complete flow

  8. In the Error handling area, choose Stop the current flow run. In the Flow trigger area, choose the Run on demand trigger type, which means you need to run the flow manually. Then, click Next.

    complete step1

Set mapping rules

Map the fields of the Account object in Salesforce to the sf_account table in TiDB, and then click Next.

  • The sf_account table is newly created in TiDB and it is empty.

    test> SELECT * FROM sf_account; +----+------+------+---------------+--------+----------+ | id | name | type | billing_state | rating | industry | +----+------+------+---------------+--------+----------+ +----+------+------+---------------+--------+----------+
  • To set a mapping rule, you can select a source field name on the left, and select a destination field name on the right. Then, click Map fields, and a rule is set.

    add mapping rule

  • The following mapping rules (Source field name -> Destination field name) are needed in this document:

    • Account ID -> id

    • Account Name -> name

    • Account Type -> type

    • Billing State/Province -> billing_state

    • Account Rating -> rating

    • Industry -> industry

      mapping a rule

      show all mapping rules

(Optional) Set filters

If you want to add some filters to your data fields, you can set them here. Otherwise, skip this step and click Next.

filters

Confirm and create the flow

Confirm the information of the flow to be created. If everything looks fine, click Create flow.

review

Step 3. Run the flow

On the page of the newly created flow, click Run flow in the upper-right corner.

run flow

The following screenshot shows an example that the flow runs successfully:

run success

Query the sf_account table, and you can see that the records from the Salesforce Account object have been written to it:

test> SELECT * FROM sf_account; +--------------------+-------------------------------------+--------------------+---------------+--------+----------------+ | id | name | type | billing_state | rating | industry | +--------------------+-------------------------------------+--------------------+---------------+--------+----------------+ | 001Do000003EDTlIAO | Sample Account for Entitlements | null | null | null | null | | 001Do000003EDTZIA4 | Edge Communications | Customer - Direct | TX | Hot | Electronics | | 001Do000003EDTaIAO | Burlington Textiles Corp of America | Customer - Direct | NC | Warm | Apparel | | 001Do000003EDTbIAO | Pyramid Construction Inc. | Customer - Channel | null | null | Construction | | 001Do000003EDTcIAO | Dickenson plc | Customer - Channel | KS | null | Consulting | | 001Do000003EDTdIAO | Grand Hotels & Resorts Ltd | Customer - Direct | IL | Warm | Hospitality | | 001Do000003EDTeIAO | United Oil & Gas Corp. | Customer - Direct | NY | Hot | Energy | | 001Do000003EDTfIAO | Express Logistics and Transport | Customer - Channel | OR | Cold | Transportation | | 001Do000003EDTgIAO | University of Arizona | Customer - Direct | AZ | Warm | Education | | 001Do000003EDThIAO | United Oil & Gas, UK | Customer - Direct | UK | null | Energy | | 001Do000003EDTiIAO | United Oil & Gas, Singapore | Customer - Direct | Singapore | null | Energy | | 001Do000003EDTjIAO | GenePoint | Customer - Channel | CA | Cold | Biotechnology | | 001Do000003EDTkIAO | sForce | null | CA | null | null | +--------------------+-------------------------------------+--------------------+---------------+--------+----------------+

Noteworthy things