LOAD DATA
The LOAD DATA statement batch loads data into a TiDB table.
In TiDB v7.0.0, the LOAD DATA SQL statement supports the following features:
- Importing data from S3 and GCS
- A new parameter, FIELDS DEFINED NULL BY
Synopsis
LoadDataStmt ::=
'LOAD' 'DATA' LocalOpt 'INFILE' stringLit DuplicateOpt 'INTO' 'TABLE' TableName CharsetOpt Fields Lines IgnoreLines ColumnNameOrUserVarListOptWithBrackets LoadDataSetSpecOpt
LocalOpt ::= ('LOCAL')?
Fields ::=
('TERMINATED' 'BY' stringLit
| ('OPTIONALLY')? 'ENCLOSED' 'BY' stringLit
| 'ESCAPED' 'BY' stringLit
| 'DEFINED' 'NULL' 'BY' stringLit ('OPTIONALLY' 'ENCLOSED')?)?
Parameters
LOCAL
Use LOCAL to import data files that reside on the client. In this case, the file parameter must be a file system path on the client.
S3 and GCS storage
If you do not specify LOCAL, the file parameter must be a valid S3 or GCS path, as detailed in external storage.
When the data files are stored on S3 or GCS, you can import individual files or use the wildcard character * to match multiple files to be imported. Note that wildcards do not recursively process files in subdirectories. The following are some examples:
- Import a single file: s3://<bucket-name>/path/to/data/foo.csv
- Import all files in the specified path: s3://<bucket-name>/path/to/data/*
- Import all files ending with .csv under the specified path: s3://<bucket-name>/path/to/data/*.csv
- Import all files prefixed with foo under the specified path: s3://<bucket-name>/path/to/data/foo*
- Import all files prefixed with foo and ending with .csv under the specified path: s3://<bucket-name>/path/to/data/foo*.csv
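As a sketch of how a wildcard path fits into a full statement, the following imports every file ending with .csv under one prefix. The table name trips_s3 is hypothetical, <bucket-name> is the same placeholder used in the list above, and the statement assumes comma-separated files that start with a header line. Access settings, if any are needed, are covered in the external storage documentation and are not shown here.

```sql
-- No LOCAL keyword: the path is read as an S3 location, not a client file.
-- The wildcard matches files directly under path/to/data/ only, not subdirectories.
LOAD DATA INFILE 's3://<bucket-name>/path/to/data/*.csv'
INTO TABLE trips_s3   -- hypothetical table name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```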
Fields, Lines, and Ignore Lines
You can use the Fields and Lines parameters to specify how to handle the data format.
- FIELDS TERMINATED BY: specifies the data delimiter.
- FIELDS ENCLOSED BY: specifies the enclosing character of the data.
- LINES TERMINATED BY: specifies the line terminator, if you want to end a line with a certain character.
You can use DEFINED NULL BY to specify how NULL values are represented in the data file.
- Consistent with MySQL behavior, if ESCAPED BY is not empty (for example, when the default value \ is used), \N is considered a NULL value.
- If you use DEFINED NULL BY, such as DEFINED NULL BY 'my-null', my-null is considered a NULL value.
- If you use DEFINED NULL BY ... OPTIONALLY ENCLOSED, such as DEFINED NULL BY 'my-null' OPTIONALLY ENCLOSED, both my-null and "my-null" (assuming ENCLOSED BY '"') are considered NULL values.
- If you do not use DEFINED NULL BY or DEFINED NULL BY ... OPTIONALLY ENCLOSED, but use ENCLOSED BY, such as ENCLOSED BY '"', then the literal word NULL is considered a NULL value. This behavior is consistent with MySQL.
- In all other cases, the value is not considered a NULL value.
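A minimal sketch of the OPTIONALLY ENCLOSED rule, assuming a hypothetical table named people and a hypothetical client-side file /tmp/people.csv in which missing values are written either as my-null or as "my-null":

```sql
-- With ENCLOSED BY '"' and OPTIONALLY ENCLOSED on the NULL marker,
-- both my-null and "my-null" in the file are treated as NULL.
LOAD DATA LOCAL INFILE '/tmp/people.csv'   -- hypothetical path
INTO TABLE people                          -- hypothetical table
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
       DEFINED NULL BY 'my-null' OPTIONALLY ENCLOSED
LINES TERMINATED BY '\n';
```

Without OPTIONALLY ENCLOSED, the plain DEFINED NULL BY rule applies instead, and the list above names only the unquoted my-null as a NULL value.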
Take the following data format as an example:
"bob","20","street 1"\r\n
"alice","33","street 1"\r\n
If you want to extract bob, 20, and street 1, specify the field delimiter as ',' and the enclosing character as '\"':
FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n'
If you do not specify the preceding parameters, the imported data is processed in the following way by default:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
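To make the defaults concrete, the two statements below are alternatives that should be processed the same way according to the defaults listed above; the second only spells them out. The table name t and the file path are hypothetical.

```sql
-- Alternative 1: rely on the default field and line handling.
LOAD DATA LOCAL INFILE '/tmp/data.tsv' INTO TABLE t;

-- Alternative 2: write the same defaults out explicitly.
LOAD DATA LOCAL INFILE '/tmp/data.tsv' INTO TABLE t
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES STARTING BY '' TERMINATED BY '\n';
```

Run only one of the two against a given table, because each statement loads the whole file.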
To skip the first <number> lines of a file, use the IGNORE <number> LINES parameter. For example, IGNORE 1 LINES skips the first line of the file.
Examples
The following example imports data using LOAD DATA. A comma is specified as the field delimiter. The double quotation marks that enclose the data are ignored. The first line of the file is ignored.
If you see ERROR 1148 (42000): the used command is not allowed with this TiDB version, refer to the troubleshooting section for this error.
LOAD DATA LOCAL INFILE '/mnt/evo970/data-sets/bikeshare-data/2017Q4-capitalbikeshare-tripdata.csv' INTO TABLE trips FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);
Query OK, 815264 rows affected (39.63 sec)
Records: 815264 Deleted: 0 Skipped: 0 Warnings: 0
LOAD DATA also supports using hexadecimal ASCII character expressions or binary ASCII character expressions as the parameters for FIELDS ENCLOSED BY and FIELDS TERMINATED BY. See the following example:
LOAD DATA LOCAL INFILE '/mnt/evo970/data-sets/bikeshare-data/2017Q4-capitalbikeshare-tripdata.csv' INTO TABLE trips FIELDS TERMINATED BY x'2c' ENCLOSED BY b'100010' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);
In the preceding example, x'2c' is the hexadecimal representation of the , character, and b'100010' is the binary representation of the " character.
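The two literals can be checked with ordinary string functions. This sketch assumes only the standard HEX() and CONV() functions and does not touch any table.

```sql
-- Hex code of the comma: should match the digits in x'2c'.
SELECT HEX(',');

-- Hex code of the double quotation mark, next to the binary digits
-- 100010 converted from base 2 to base 16. The two columns should agree.
SELECT HEX('"'), CONV('100010', 2, 16);
```

The first query returns 2C, and the second returns 22 in both columns, because binary 100010 is decimal 34, the ASCII code of the double quotation mark.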
MySQL compatibility
The syntax of the LOAD DATA statement is compatible with that of MySQL, except for character set options, which are parsed but ignored. If you find any syntax compatibility difference, you can report it via an issue on GitHub.
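As an illustration of where the character set option sits in the statement, the following sketch follows the CharsetOpt position in the synopsis. The table name t and the file path are hypothetical.

```sql
-- CHARACTER SET is accepted by the parser but, per the note above, has no effect.
LOAD DATA LOCAL INFILE '/tmp/data.csv'   -- hypothetical path
INTO TABLE t                             -- hypothetical table
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ',';
```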