Migrate Data from Amazon Aurora to TiDB

This document describes how to migrate data from Amazon Aurora to TiDB. The migration uses a DB snapshot, which saves significant space and time.

The whole migration has two processes:

- Import full data to TiDB using TiDB Lightning
- Replicate incremental data to TiDB using DM (optional)

Prerequisites

Import full data to TiDB

Step 1. Export an Aurora snapshot to Amazon S3

In Aurora, run the following command to query the current binlog position:

mysql> SHOW MASTER STATUS;

The output is similar to the following. Record the binlog name and position for later use.

+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |    52806 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.012 sec)

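Export the Aurora snapshot. For detailed steps, see Exporting DB snapshot data to Amazon S3.

After you obtain the binlog position, export the snapshot within 5 minutes. Otherwise, the recorded binlog position might be outdated and cause data conflicts during incremental replication.

After the two steps above, make sure you have the following information ready:

- The Aurora binlog name and position at the time the snapshot was created.
- The S3 path where the snapshot is stored, and the SecretKey and AccessKey that have access to the S3 path.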
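
The binlog name and position can also be captured without copying them from an interactive session. The following is a minimal sketch, assuming the `mysql` client is invoked with `--batch --skip-column-names` so that the result is a single tab-separated row. The function name `parse_binlog_position` and its output format are illustrative and not part of any tool.

```shell
# Hypothetical helper: reads the tab-separated row printed by
#   mysql --batch --skip-column-names -e "SHOW MASTER STATUS"
# on stdin and prints the first two columns (binlog file and position).
parse_binlog_position() {
  awk -F'\t' 'NR == 1 { printf "binlog-name=%s binlog-pos=%s\n", $1, $2 }'
}

# Example usage (connection values are placeholders):
# mysql -h "${host}" -P 3306 -u root -p --batch --skip-column-names \
#   -e "SHOW MASTER STATUS" | parse_binlog_position
```

Saving the printed line next to the snapshot export ID keeps the two values together until the incremental replication task is configured.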

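Step 2. Export the schema

Because the Aurora snapshot files do not contain DDL statements, you need to export the schema using Dumpling and then create the schema in the target database using TiDB Lightning. If you want to create the schema manually, you can skip this step.

Run the following command to export the schema using Dumpling. The command includes the `--filter` parameter so that only the desired table schemas are exported: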

tiup dumpling --host ${host} --port 3306 --user root --password ${password} --filter 'my_db1.table[12]' --no-data --output 's3://my-bucket/schema-backup' --filter "mydb.*"

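The parameters used in the command above are as follows. For more parameters, see the Dumpling overview.

| Parameter | Description |
| --- | --- |
| `-u` or `--user` | Aurora MySQL user |
| `-p` or `--password` | MySQL user password |
| `-P` or `--port` | MySQL port |
| `-h` or `--host` | MySQL IP address |
| `-t` or `--threads` | Number of threads used for the export |
| `-o` or `--output` | Directory that stores the exported files. Supports a local path or an external storage URL. |
| `-r` or `--rows` | Maximum number of rows in a single file |
| `-F` | Maximum size of a single file, in MiB. Recommended value: 256 MiB. |
| `-B` or `--database` | Specifies the database to be exported |
| `-T` or `--tables-list` | Exports the specified tables |
| `-d` or `--no-data` | Does not export data. Only exports the schema. |
| `-f` or `--filter` | Exports tables that match the pattern. Do not use `-f` and `-T` at the same time. For the syntax, see Table Filter. |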

Step 3. Create the TiDB Lightning configuration file

Create the `tidb-lightning.toml` configuration file as follows:

vim tidb-lightning.toml

[tidb]

# The target TiDB cluster information.
host = "${host}"              # e.g.: "172.16.32.1"
port = ${port}                # e.g.: 4000
user = "${user_name}"         # e.g.: "root"
password = "${password}"      # e.g.: "rootroot"
status-port = ${status-port}  # Obtains the table schema information from TiDB status port, e.g.: 10080
pd-addr = "${ip}:${port}"     # The cluster PD address, e.g.: 172.16.31.3:2379. TiDB Lightning obtains some information from PD. When backend = "local", you must specify status-port and pd-addr correctly. Otherwise, the import will be abnormal.

[tikv-importer]
# "local": Default backend. The local backend is recommended to import large volumes of data (1 TiB or more). During the import, the target TiDB cluster cannot provide any service.
# "tidb": The "tidb" backend is recommended to import data less than 1 TiB. During the import, the target TiDB cluster can provide service normally.
backend = "local"

# Set the temporary storage directory for the sorted Key-Value files. The directory must be empty, and the storage space must be greater than the size of the dataset to be imported. For better import performance, it is recommended to use a directory different from `data-source-dir` and use flash storage, which can use I/O exclusively.
sorted-kv-dir = "/mnt/ssd/sorted-kv-dir"

[mydumper]
# The path that stores the snapshot file.
data-source-dir = "${s3_path}"  # e.g.: s3://my-bucket/sql-backup

[[mydumper.files]]
# The expression that parses the parquet file.
pattern = '(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$'
schema = '$1'
table = '$2'
type = '$3'
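
To see what the `pattern` above extracts, the sketch below applies an approximately equivalent POSIX extended regular expression to a sample object key with `sed`. The sample path is invented, and the expression is a hand translation: ERE has no non-capturing groups or `(?i)`, so case-insensitivity is emulated with character classes. TiDB Lightning itself evaluates the original pattern, so treat this only as a way to reason about the directory layout.

```shell
# Hypothetical helper: prints the schema and table that an ERE translation
# of the [[mydumper.files]] pattern would capture from a file path.
# Prints nothing if the path does not match.
match_parquet_path() {
  printf '%s\n' "$1" | sed -nE \
    's#^([^/]*/)*([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)/([^/]*/)*[A-Za-z0-9_.-]+\.[Pp][Aa][Rr][Qq][Uu][Ee][Tt]$#schema=\2 table=\3#p'
}

# The path below is an invented example of a "<db>.<table>/" directory layout.
match_parquet_path 'export-1/my_db1/my_db1.table1/part-00000.gz.parquet'
# prints: schema=my_db1 table=table1
```

The directory named `<db>.<table>` supplies the schema and table names, and any file ending in `.parquet` beneath it is treated as data for that table.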

If you need to enable TLS in the TiDB cluster, see TiDB Lightning Configuration.

Step 4. Import full data to TiDB

Create the tables in the target database using TiDB Lightning:
```
tiup tidb-lightning -config tidb-lightning.toml -d 's3://my-bucket/schema-backup'
```
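Run `tidb-lightning` to start the import. If you launch the program directly from the command line, the process might exit unexpectedly after receiving a SIGHUP signal. In that case, it is recommended to run the program with a tool such as `nohup` or `screen`.
Before starting, pass the SecretKey and AccessKey that have access to the S3 storage path to the TiDB Lightning node as environment variables. The credentials can also be read from `~/.aws/credentials`. For example: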
```
export AWS_ACCESS_KEY_ID=${access_key}
export AWS_SECRET_ACCESS_KEY=${secret_key}
nohup tiup tidb-lightning -config tidb-lightning.toml > nohup.out 2>&1 &
```
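After the import starts, you can check its progress in any of the following ways:

- Search the log for the keyword `progress` with `grep`. By default, the progress is updated every 5 minutes.
- Check the progress in the monitoring dashboard.
- Check the progress in the TiDB Lightning web interface.

After TiDB Lightning completes the import, it exits automatically. Check whether the last lines of `tidb-lightning.log` contain `the whole procedure completed`. If they do, the import succeeded. If not, the import encountered an error; handle it as instructed by the error message.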

Note:
Whether the import succeeds or not, the last line of the log shows `tidb lightning exit`. This means that TiDB Lightning exited normally, not necessarily that the import succeeded.
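
The two log checks described above can be scripted. This is a minimal sketch with hypothetical helper names. It assumes the completion message sits near the end of the log, just before the final `tidb lightning exit` line, and the 10-line window is an arbitrary choice.

```shell
# Hypothetical helpers for inspecting the TiDB Lightning log file.
# The log path is passed as the first argument.

# Print the most recent log line that mentions "progress" (if any).
latest_progress() {
  grep 'progress' "$1" | tail -n 1
}

# Succeed only if the completion message appears in the last 10 lines.
# The window of 10 lines is an arbitrary assumption.
import_succeeded() {
  tail -n 10 "$1" | grep -q 'the whole procedure completed'
}

# Example usage:
# latest_progress tidb-lightning.log
# import_succeeded tidb-lightning.log && echo "import completed"
```

Checking for the completion message rather than for `tidb lightning exit` avoids mistaking a clean exit after a failed import for success.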

If you encounter any problem during the import, see the TiDB Lightning FAQ for troubleshooting.

Replicate incremental data to TiDB (optional)

Prerequisites

Step 1. Create the data source

Create the `source1.yaml` file as follows:

# Must be unique.
source-id: "mysql-01"
# Configures whether DM-worker uses the global transaction identifier (GTID) to pull binlogs. To enable this mode, the upstream MySQL must also enable GTID. If the upstream MySQL service is configured to switch master between different nodes automatically, GTID mode is required.
enable-gtid: false

from:
  host: "${host}"         # e.g.: 172.16.10.81
  user: "root"
  password: "${password}" # Supported but not recommended to use plaintext password. It is recommended to use `dmctl encrypt` to encrypt the plaintext password before using it.
  port: 3306

Run the following command to load the data source configuration into the DM cluster using `tiup dmctl`:

tiup dmctl --master-addr ${advertise-addr} operate-source create source1.yaml

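The parameters used in the command above are as follows:

| Parameter | Description |
| --- | --- |
| `--master-addr` | The `${advertise-addr}` of any DM-master node in the cluster that `dmctl` connects to, e.g.: 172.16.10.71:8261 |
| `operate-source create` | Loads the data source into the DM cluster. |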

Step 2. Create the migration task

Create the `task1.yaml` file as follows:

# Task name. Multiple tasks that are running at the same time must each have a unique name.
name: "test"
# Task mode. Options are:
# - full: only performs full data migration.
# - incremental: only performs binlog real-time replication.
# - all: full data migration + binlog real-time replication.
task-mode: "incremental"
# The configuration of the target TiDB database.
target-database:
  host: "${host}"                   # e.g.: 172.16.10.83
  port: 4000
  user: "root"
  password: "${password}"           # Supported but not recommended to use a plaintext password. It is recommended to use `dmctl encrypt` to encrypt the plaintext password before using it.

# Global configuration for block and allow lists. Each instance can reference the configuration by name.
block-allow-list:                     # If the DM version is earlier than v2.0.0-beta.2, use black-white-list.
  listA:                              # Name.
    do-tables:                        # Allow list for the upstream tables to be migrated.
    - db-name: "test_db"              # Name of databases to be migrated.
      tbl-name: "test_table"          # Name of tables to be migrated.

# Configures the data source.
mysql-instances:
  - source-id: "mysql-01"               # Data source ID, i.e., source-id in source1.yaml
    block-allow-list: "listA"           # References the block-allow-list configuration above.
#       syncer-config-name: "global"    # References the syncers incremental data configuration.
    meta:                               # When task-mode is "incremental" and the downstream database does not have a checkpoint, DM uses the binlog position as the starting point. If the downstream database has a checkpoint, DM uses the checkpoint as the starting point.
      binlog-name: "mysql-bin.000004"   # The binlog position recorded in "Step 1. Export an Aurora snapshot to Amazon S3". When the upstream database has source-replica switching, GTID mode is required.
      binlog-pos: 109227
      # binlog-gtid: "09bec856-ba95-11ea-850a-58f2b4af5188:1-9"

   # (Optional) If you need to incrementally replicate data that has already been migrated in the full data migration, you need to enable the safe mode to avoid the incremental data replication error.
   # This scenario is common in the following case: the full migration data does not belong to the data source's consistency snapshot, and after that, DM starts to replicate incremental data from a position earlier than the full migration.
   # syncers:            # The running configurations of the sync processing unit.
   #   global:            # Configuration name.
   #     safe-mode: true  # If this field is set to true, DM changes INSERT of the data source to REPLACE for the target database, and changes UPDATE of the data source to DELETE and REPLACE for the target database. This is to ensure that when the table schema contains a primary key or unique index, DML statements can be imported repeatedly. In the first minute of starting or resuming an incremental replication task, DM automatically enables the safe mode.

The YAML file above is the minimum configuration required for the migration task. For more configuration items, see DM Advanced Task Configuration File.
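
The binlog name and position recorded in Step 1 of the full import go into the `meta` section of the task file. As a small sketch, the hypothetical `render_meta` function below prints that fragment from two arguments so the values do not have to be retyped by hand. The indentation is chosen to match the `mysql-instances` entry in the example task file.

```shell
# Hypothetical helper: prints the `meta` fragment of a DM task file.
render_meta() {
  # $1: binlog file name, $2: binlog position
  printf '    meta:\n      binlog-name: "%s"\n      binlog-pos: %s\n' "$1" "$2"
}

# Example with the values shown in the sample SHOW MASTER STATUS output.
render_meta 'mysql-bin.000002' 52806
```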

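Step 3. Run the migration task

Before you start the migration task, it is recommended to run the `check-task` command to verify that the configuration meets the requirements of DM, which reduces the chance of errors:

tiup dmctl --master-addr ${advertise-addr} check-task task1.yaml

Then start the migration task using `tiup dmctl`:

tiup dmctl --master-addr ${advertise-addr} start-task task1.yaml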

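The parameters used in the command above are as follows:

| Parameter | Description |
| --- | --- |
| `--master-addr` | The `${advertise-addr}` of any DM-master node in the cluster that `dmctl` connects to, e.g.: 172.16.10.71:8261 |
| `start-task` | Starts the migration task. |

If the task fails to start, check the prompt message and fix the configuration. Then rerun the command above to start the task.

If you encounter any problem, see DM error handling and the DM FAQ.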

Step 4. Check the migration task status

To check whether the DM cluster has an ongoing migration task and to view the task status, run the `query-status` command using `tiup dmctl`:

tiup dmctl --master-addr ${advertise-addr} query-status ${task-name}

For a detailed interpretation of the results, see Query Status.
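
For a quick look at the task state, the output of `query-status` can be filtered on the command line. The sketch below is an assumption-laden example: it expects the output to contain lines such as `"stage": "Running"`, and the helper name `task_stages` is hypothetical. Verify the field name against the Query Status reference for your DM version.

```shell
# Hypothetical helper: reads `query-status` output on stdin and prints
# each distinct value of the "stage" field, one per line.
# Assumes the output contains lines such as:  "stage": "Running",
task_stages() {
  grep -o '"stage": "[A-Za-z]*"' | sed -E 's/.*: "([A-Za-z]*)"/\1/' | sort -u
}

# Example usage:
# tiup dmctl --master-addr ${advertise-addr} query-status ${task-name} | task_stages
```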

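Step 5. Monitor the task and view logs

To view the history status and other internal metrics of the migration task, take the following steps.

If you deployed Prometheus, Alertmanager, and Grafana when deploying DM using TiUP, you can access Grafana using the IP address and port specified during the deployment. Then select the DM dashboard to view DM-related monitoring metrics.

While DM is running, DM-worker, DM-master, and dmctl output related information to logs. The log directories of these components are as follows:

- DM-master: specified by the DM-master process parameter `--log-file`. If you deployed DM using TiUP, the log directory is `/dm-deploy/dm-master-8261/log/` by default.
- DM-worker: specified by the DM-worker process parameter `--log-file`. If you deployed DM using TiUP, the log directory is `/dm-deploy/dm-worker-8262/log/` by default.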
