String Functions
TiDB supports most of the string functions available in MySQL 5.7, some of the string functions available in MySQL 8.0, and some of the functions available in Oracle 21.
For comparisons between functions and syntax of Oracle and TiDB, see Comparisons between Functions and Syntax of Oracle and TiDB.
Supported functions
ASCII()
The ASCII(str)
function is used to get the ASCII value of the leftmost character in the given argument. The argument can be either a string or a number.
- If the argument is not empty, the function returns the ASCII value of the leftmost character.
- If the argument is an empty string, the function returns
0
. - If the argument is
NULL
, the function returnsNULL
.
Example:
SELECT ASCII('A'), ASCII('TiDB'), ASCII(23);
Output:
+------------+---------------+-----------+
| ASCII('A') | ASCII('TiDB') | ASCII(23) |
+------------+---------------+-----------+
| 65 | 84 | 50 |
+------------+---------------+-----------+
BIN()
The BIN()
function is used to convert the given argument into a string representation of its binary value. The argument can be either a string or a number.
- If the argument is a positive number, the function returns a string representation of its binary value.
- If the argument is a negative number, the function converts the absolute value of the argument to its binary representation, inverts each bit of the binary value (changing
0
to1
and1
to0
), and then adds1
to the inverted value. - If the argument is a string containing only digits, the function returns the result according to those digits. For example, the results for
"123"
and123
are the same. - If the argument is a string and its first character is not a digit (such as
"q123"
), the function returns0
. - If the argument is a string that consists of digits and non-digits, the function returns the result according to the consecutive digits at the beginning of the argument. For example, the results for
"123q123"
and123
are the same, butBIN('123q123')
generates a warning likeTruncated incorrect INTEGER value: '123q123'
. - If the argument is
NULL
, the function returnsNULL
.
Example 1:
SELECT BIN(123), BIN('123q123');
Output 1:
+----------+----------------+
| BIN(123) | BIN('123q123') |
+----------+----------------+
| 1111011 | 1111011 |
+----------+----------------+
Example 2:
SELECT BIN(-7);
Output 2:
+------------------------------------------------------------------+
| BIN(-7) |
+------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111001 |
+------------------------------------------------------------------+
BIT_LENGTH()
The BIT_LENGTH()
function is used to return the length of a given argument in bits.
Examples:
SELECT BIT_LENGTH("TiDB");
+--------------------+
| BIT_LENGTH("TiDB") |
+--------------------+
| 32 |
+--------------------+
8 bits per character x 4 characters = 32 bits
SELECT BIT_LENGTH("PingCAP 123");
+---------------------------+
| BIT_LENGTH("PingCAP 123") |
+---------------------------+
| 88 |
+---------------------------+
8 bits per character (space is counted because it is a non-alphanumeric character) x 11 characters = 88 bits
SELECT CustomerName, BIT_LENGTH(CustomerName) AS BitLengthOfName FROM Customers;
+--------------------+-----------------+
| CustomerName | BitLengthOfName |
+--------------------+-----------------+
| Albert Einstein | 120 |
| Robert Oppenheimer | 144 |
+--------------------+-----------------+
CHAR()
The CHAR()
function is used to get the corresponding character of a specific ASCII value. It performs the opposite operation of ASCII()
, which returns the ASCII value of a specific character.
Examples:
SELECT CHAR(65);
+------------+
| CHAR(65) |
+------------+
| A |
+------------+
SELECT CHAR(84);
+------------+
| CHAR(84) |
+------------+
| T |
+------------+
The CHAR()
function can also be used to get the corresponding character of ASCII values that extend beyond the standard ASCII range (0
- 127
).
/*For extended ASCII: */
SELECT CHAR(128);
+------------+
| CHAR(128) |
+------------+
| 0x80 |
+------------+
The CHAR()
function can also get the corresponding character value of a unicode value.
/* For Unicode: */
--skip-binary-as-hex
SELECT CHAR(50089);
+--------------+
| CHAR(50089) |
+--------------+
| é |
+--------------+
CHAR_LENGTH()
The CHAR_LENGTH()
function is used to get the total number of characters in a given argument as an integer.
Examples:
SELECT CHAR_LENGTH("TiDB") AS LengthOfString;
+----------------+
| LengthOfString |
+----------------+
| 4 |
+----------------+
SELECT CustomerName, CHAR_LENGTH(CustomerName) AS LengthOfName FROM Customers;
+--------------------+--------------+
| CustomerName | LengthOfName |
+--------------------+--------------+
| Albert Einstein | 15 |
| Robert Oppenheimer | 18 |
+--------------------+--------------+
CHARACTER_LENGTH()
The CHARACTER_LENGTH()
function is the same as the CHAR_LENGTH()
function. Both functions can be used synonymously because they generate the same output.
CONCAT()
The CONCAT()
function concatenates one or more arguments into a single string.
Syntax:
CONCAT(str1,str2,...)
str1, str2, ...
is a list of arguments to be concatenated. Each argument can be a string or a number.
Example:
SELECT CONCAT('TiDB', ' ', 'Server', '-', 1, TRUE);
Output:
+---------------------------------------------+
| CONCAT('TiDB', ' ', 'Server', '-', 1, TRUE) |
+---------------------------------------------+
| TiDB Server-11 |
+---------------------------------------------+
If any of the arguments is NULL
, CONCAT()
returns NULL
.
Example:
SELECT CONCAT('TiDB', NULL, 'Server');
Output:
+--------------------------------+
| CONCAT('TiDB', NULL, 'Server') |
+--------------------------------+
| NULL |
+--------------------------------+
In addition to the CONCAT()
function, you can concatenate strings by placing them adjacent to each other as in the following example. Note that this method does not support numeric types.
SELECT 'Ti' 'DB' ' ' 'Server';
Output:
+-------------+
| Ti |
+-------------+
| TiDB Server |
+-------------+
CONCAT_WS()
The CONCAT_WS()
function is a form of CONCAT()
with a separator, which returns a string concatenated by the specified separator.
Syntax:
CONCAT_WS(separator,str1,str2,...)
separator
: the first argument is the separator, which concatenates the remaining arguments that are notNULL
.str1, str2, ...
: a list of arguments to be concatenated. Each argument can be a string or a number.
Example:
SELECT CONCAT_WS(',', 'TiDB Server', 'TiKV', 'PD');
Output:
+---------------------------------------------+
| CONCAT_WS(',', 'TiDB Server', 'TiKV', 'PD') |
+---------------------------------------------+
| TiDB Server,TiKV,PD |
+---------------------------------------------+
If the separator is an empty string,
CONCAT_WS()
is equivalent toCONCAT()
and returns the concatenated string of the remaining arguments.Example:
SELECT CONCAT_WS('', 'TiDB Server', 'TiKV', 'PD');Output:
+--------------------------------------------+ | CONCAT_WS('', 'TiDB Server', 'TiKV', 'PD') | +--------------------------------------------+ | TiDB ServerTiKVPD | +--------------------------------------------+If the separator is
NULL
,CONCAT_WS()
returnsNULL
.Example:
SELECT CONCAT_WS(NULL, 'TiDB Server', 'TiKV', 'PD');Output:
+----------------------------------------------+ | CONCAT_WS(NULL, 'TiDB Server', 'TiKV', 'PD') | +----------------------------------------------+ | NULL | +----------------------------------------------+If only one of the arguments to be concatenated is not
NULL
,CONCAT_WS()
returns that argument.Example:
SELECT CONCAT_WS(',', 'TiDB Server', NULL);Output:
+-------------------------------------+ | CONCAT_WS(',', 'TiDB Server', NULL) | +-------------------------------------+ | TiDB Server | +-------------------------------------+If there are
NULL
arguments to be concatenated,CONCAT_WS()
skips theseNULL
arguments.Example:
SELECT CONCAT_WS(',', 'TiDB Server', NULL, 'PD');Output:
+-------------------------------------------+ | CONCAT_WS(',', 'TiDB Server', NULL, 'PD') | +-------------------------------------------+ | TiDB Server,PD | +-------------------------------------------+If there are empty strings to be concatenated,
CONCAT_WS()
does not skip empty strings.Example:
SELECT CONCAT_WS(',', 'TiDB Server', '', 'PD');Output:
+-----------------------------------------+ | CONCAT_WS(',', 'TiDB Server', '', 'PD') | +-----------------------------------------+ | TiDB Server,,PD | +-----------------------------------------+
ELT()
Return string at index number.
EXPORT_SET()
Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string.
FIELD()
Return the index (position)of the first argument in the subsequent arguments.
FIND_IN_SET()
Return the index position of the first argument within the second argument.
FORMAT()
The FORMAT(X,D[,locale])
function is used to format the number X
to a format similar to "#,###,###. ##"
, rounded to D
decimal places, and return the result as a string.
Arguments:
X
: the number to be formatted. It can be a direct numeric value, a numeric string, or a number in scientific notation.D
: the number of decimal places for the returned value. The function rounds the numberX
toD
decimal places. IfD
is greater than the actual number of decimal places inX
, the result is padded with zeros to the corresponding length.[locale]
: specifies a locale setting to be used for grouping between decimal points, thousands separators, and separators for resultant numbers. A valid locale value is the same as the valid value of thelc_time_names
system variable. If not specified or the region setting isNULL
, the'en_US'
region setting is used by default. This argument is optional.
Behaviors:
- If the first argument is a string and contains only numbers, the function returns a result based on that numeric value. For example,
FORMAT('12.34', 1)
andFORMAT(12.34, 1)
return the same result. - If the first argument is a number represented in scientific notation (using
E/e
), the function returns the result based on that number. For example,FORMAT('1E2', 3)
returns100.000
. - If the first argument is a string starting with non-numeric characters, the function returns zero and a warning
(Code 1292)
. For example,FORMAT('q12.36', 5)
returns0.00000
, but also includes a warningWarning (Code 1292): Truncated incorrect DOUBLE value: 'q12.36'
. - If the first argument is a string mixing numbers and non-numbers, the function returns a result based on the consecutive numeric part at the beginning of the argument, and also includes a warning
(Code 1292)
. For example,FORMAT('12.36q56.78', 1)
returns the same numeric result asFORMAT('12.36', 1)
, but includes a warningWarning (Code 1292): Truncated incorrect DOUBLE value: '12.36q56.78'
. - If the second argument is zero or a negative number, the function truncates the decimal part and returns an integer.
- If any of the arguments is
NULL
, the function returnsNULL
.
Examples:
The following examples show how to format the number 12.36 to different decimal places:
mysql> SELECT FORMAT(12.36, 1);
+------------------+
| FORMAT(12.36, 1) |
+------------------+
| 12.4 |
+------------------+
mysql> SELECT FORMAT(12.36, 5);
+------------------+
| FORMAT(12.36, 5) |
+------------------+
| 12.36000 |
+------------------+
mysql> SELECT FORMAT(12.36, 2);
+------------------+
| FORMAT(12.36, 2) |
+------------------+
| 12.36 |
+------------------+
FROM_BASE64()
The FROM_BASE64()
function is used to decode a Base64 encoded string and return the decoded result in its hexadecimal form.
- This function accepts a single argument, that is, the Base64 encoded string to be decoded.
- If the argument is
NULL
or not a valid Base64 encoded string, theFROM_BASE64()
function returnsNULL
.
Examples:
The following example shows how to decode the Base64 encoded string 'SGVsbG8gVGlEQg=='
. This string is the result of encoding 'Hello TiDB'
, using the TO_BASE64()
function.
mysql> SELECT TO_BASE64('Hello TiDB');
+-------------------------+
| TO_BASE64('Hello TiDB') |
+-------------------------+
| SGVsbG8gVGlEQg== |
+-------------------------+
mysql> SELECT FROM_BASE64('SGVsbG8gVGlEQg==');
+------------------------------------------------------------------+
| FROM_BASE64('SGVsbG8gVGlEQg==') |
+------------------------------------------------------------------+
| 0x48656C6C6F2054694442 |
+------------------------------------------------------------------+
mysql> SELECT CONVERT(FROM_BASE64('SGVsbG8gVGlEQg==') USING utf8mb4);
+--------------------------------------------------------+
| CONVERT(FROM_BASE64('SGVsbG8gVGlEQg==') USING utf8mb4) |
+--------------------------------------------------------+
| Hello TiDB |
+--------------------------------------------------------+
The following example shows how to decode the Base64 encoded number MTIzNDU2
. This string is the result of encoding 123456
, which can be done using the TO_BASE64()
function.
mysql> SELECT FROM_BASE64('MTIzNDU2');
+--------------------------------------------------+
| FROM_BASE64('MTIzNDU2') |
+--------------------------------------------------+
| 0x313233343536 |
+--------------------------------------------------+
HEX()
The HEX()
function is used to convert the given argument into a string representation of its hexadecimal value. The argument can be either a string or a number.
- If the argument is a string,
HEX(str)
returns a hexadecimal string representation ofstr
. The function converts each byte of each character instr
into two hexadecimal digits. For example, the charactera
in a UTF-8 or ASCII character set is represented as a binary value of00111101
, or61
in hexadecimal notation. - If the argument is a number,
HEX(n)
returns a hexadecimal string representation ofn
. The function treats the argumentn
as aBIGINT
number, equivalent to usingCONV(n, 10, 16)
. - If the argument is
NULL
, the function returnsNULL
.
Examples:
SELECT X'616263', HEX('abc'), UNHEX(HEX('abc')), 0x616263;
+-----------+------------+-------------------+----------+
| X'616263' | HEX('abc') | UNHEX(HEX('abc')) | 0x616263 |
+-----------+------------+-------------------+----------+
| abc | 616263 | abc | abc |
+-----------+------------+-------------------+----------+
SELECT X'F09F8DA3', HEX('🍣'), UNHEX(HEX('🍣')), 0xF09F8DA3;
+-------------+-------------+--------------------+------------+
| X'F09F8DA3' | HEX('🍣') | UNHEX(HEX('🍣')) | 0xF09F8DA3 |
+-------------+-------------+--------------------+------------+
| 🍣 | F09F8DA3 | 🍣 | 🍣 |
+-------------+-------------+--------------------+------------+
SELECT HEX(255), CONV(HEX(255), 16, 10);
+----------+------------------------+
| HEX(255) | CONV(HEX(255), 16, 10) |
+----------+------------------------+
| FF | 255 |
+----------+------------------------+
SELECT HEX(NULL);
+-----------+
| HEX(NULL) |
+-----------+
| NULL |
+-----------+
INSERT()
The INSERT(str, pos, len, newstr)
function is used to replace a substring in str
(that starts at position pos
and is len
characters long) with the string newstr
. This function is multibyte safe.
- If
pos
exceeds the length ofstr
, the function returns the original stringstr
without modification. - If
len
exceeds the remaining length ofstr
from positionpos
, the function replaces the rest of the string from positionpos
. - If any argument is
NULL
, the function returnsNULL
.
Examples:
SELECT INSERT('He likes tennis', 4, 5, 'plays');
+------------------------------------------+
| INSERT('He likes tennis', 4, 5, 'plays') |
+------------------------------------------+
| He plays tennis |
+------------------------------------------+
SELECT INSERT('He likes tennis', -1, 5, 'plays');
+-------------------------------------------+
| INSERT('He likes tennis', -1, 5, 'plays') |
+-------------------------------------------+
| He likes tennis |
+-------------------------------------------+
SELECT INSERT('He likes tennis', 4, 100, 'plays');
+--------------------------------------------+
| INSERT('He likes tennis', 4, 100, 'plays') |
+--------------------------------------------+
| He plays |
+--------------------------------------------+
SELECT INSERT('He likes tenis', 10, 100, '🍣');
+-------------------------------------------+
| INSERT('He likes tenis', 10, 100, '🍣') |
+-------------------------------------------+
| He likes 🍣 |
+-------------------------------------------+
SELECT INSERT('あああああああ', 2, 3, 'いいい');
+----------------------------------------------------+
| INSERT('あああああああ', 2, 3, 'いいい') |
+----------------------------------------------------+
| あいいいあああ |
+----------------------------------------------------+
SELECT INSERT('あああああああ', 2, 3, 'xx');
+---------------------------------------------+
| INSERT('あああああああ', 2, 3, 'xx') |
+---------------------------------------------+
| あxxあああ |
+---------------------------------------------+
INSTR()
The INSTR(str, substr)
function is used to get the position of the first occurrence of substr
in str
. Each argument can be either a string or a number. This function is the same as the two-argument version of LOCATE(substr, str)
, but with the order of the arguments reversed.
- If either argument is a number, the function treats the number as a string.
- If
substr
is not instr
, the function returns0
. Otherwise, it returns the position of the first occurrence ofsubstr
instr
. - If either argument is
NULL
, the function returnsNULL
.
Examples:
SELECT INSTR("pingcap.com", "tidb");
+------------------------------+
| INSTR("pingcap.com", "tidb") |
+------------------------------+
| 0 |
+------------------------------+
SELECT INSTR("pingcap.com/tidb", "tidb");
+-----------------------------------+
| INSTR("pingcap.com/tidb", "tidb") |
+-----------------------------------+
| 13 |
+-----------------------------------+
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_bin, "TiDB");
+-------------------------------------------------------+
| INSTR("pingcap.com/tidb" COLLATE utf8mb4_bin, "TiDB") |
+-------------------------------------------------------+
| 0 |
+-------------------------------------------------------+
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB");
+--------------------------------------------------------------+
| INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB") |
+--------------------------------------------------------------+
| 13 |
+--------------------------------------------------------------+
SELECT INSTR(0123, "12");
+-------------------+
| INSTR(0123, "12") |
+-------------------+
| 1 |
+-------------------+
LCASE()
The LCASE(str)
function is a synonym for LOWER(str)
, which returns the lowercase of the given argument.
LEFT()
The LEFT()
function returns a specified number of characters from the left side of a string.
Syntax:
LEFT(`str`, `len`)
str
: the original string to extract characters. Ifstr
contains a multibyte character, the function counts it as a single code point.len
: the length of characters to be returned.- If
len
is equal to or less than 0, the function returns an empty string. - If
len
is equal to or greater than the length ofstr
, the function returns the originalstr
.
- If
- If any argument is
NULL
, the function returnsNULL
.
Examples:
SELECT LEFT('ABCED', 3);
+------------------+
| LEFT('ABCED', 3) |
+------------------+
| ABC |
+------------------+
SELECT LEFT('ABCED', 6);
+------------------+
| LEFT('ABCED', 6) |
+------------------+
| ABCED |
+------------------+
SELECT LEFT('ABCED', 0);
+------------------+
| LEFT('ABCED', 0) |
+------------------+
| |
+------------------+
SELECT LEFT('ABCED', -1);
+-------------------+
| LEFT('ABCED', -1) |
+-------------------+
| |
+-------------------+
SELECT LEFT('🍣ABC', 3);
+--------------------+
| LEFT('🍣ABC', 3) |
+--------------------+
| 🍣AB |
+--------------------+
SELECT LEFT('ABC', NULL);
+-------------------+
| LEFT('ABC', NULL) |
+-------------------+
| NULL |
+-------------------+
SELECT LEFT(NULL, 3);
+------------------------------+
| LEFT(NULL, 3) |
+------------------------------+
| NULL |
+------------------------------+
LENGTH()
The LENGTH()
function returns the length of a string in bytes.
LENGTH()
counts a multibyte character as multiple bytes while CHAR_LENGTH()
counts a multibyte character as a single code point.
If the argument is NULL
, the function returns NULL
.
Examples:
SELECT LENGTH('ABC');
+---------------+
| LENGTH('ABC') |
+---------------+
| 3 |
+---------------+
SELECT LENGTH('🍣ABC');
+-------------------+
| LENGTH('🍣ABC') |
+-------------------+
| 7 |
+-------------------+
SELECT CHAR_LENGTH('🍣ABC');
+------------------------+
| CHAR_LENGTH('🍣ABC') |
+------------------------+
| 4 |
+------------------------+
SELECT LENGTH(NULL);
+--------------+
| LENGTH(NULL) |
+--------------+
| NULL |
+--------------+
LIKE
The LIKE
operator is used for simple string matching. The expression expr LIKE pat [ESCAPE 'escape_char']
returns 1
(TRUE
) or 0
(FALSE
). If either expr
or pat
is NULL
, the result is NULL
.
You can use the following two wildcard parameters with LIKE
:
%
matches any number of characters, including zero characters._
matches exactly one character.
The following examples use the utf8mb4_bin
collation:
SET collation_connection='utf8mb4_bin';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+-------------+
| Variable_name | Value |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
+----------------------+-------------+
SELECT NULL LIKE '%' as result;
+--------+
| result |
+--------+
| NULL |
+--------+
SELECT 'sushi!!!' LIKE 'sushi_' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%sushi%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%SUSHI%' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%🍣%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
The default escape character is \
:
SELECT 'sushi!!!' LIKE 'sushi\_' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
SELECT 'sushi_' LIKE 'sushi\_' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
To specify a different escape character, such as *
, you can use the ESCAPE
clause:
SELECT 'sushi_' LIKE 'sushi*_' ESCAPE '*' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
SELECT 'sushi!' LIKE 'sushi*_' ESCAPE '*' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
You can use the LIKE
operator to match a numeric value:
SELECT 10 LIKE '1%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
SELECT 10000 LIKE '12%' AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
To specify a collation explicitly, such as utf8mb4_unicode_ci
, you can use COLLATE
:
SELECT '🍣🍺Sushi🍣🍺' COLLATE utf8mb4_unicode_ci LIKE '%SUSHI%' AS result;
+--------+
| result |
+--------+
| 1 |
+--------+
LOCATE()
The LOCATE(substr, str[, pos])
function is used to get the position of the first occurrence of a specified substring substr
in a string str
. The pos
argument is optional and specifies the starting position for the search.
- If the substring
substr
is not present instr
, the function returns0
. - If any argument is
NULL
, the function returnsNULL
. - This function is multibyte safe and performs a case-sensitive search only if at least one argument is a binary string.
The following examples use the utf8mb4_bin
collation:
SET collation_connection='utf8mb4_bin';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+-------------+
| Variable_name | Value |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
+----------------------+-------------+
SELECT LOCATE('bar', 'foobarbar');
+----------------------------+
| LOCATE('bar', 'foobarbar') |
+----------------------------+
| 4 |
+----------------------------+
SELECT LOCATE('baz', 'foobarbar');
+----------------------------+
| LOCATE('baz', 'foobarbar') |
+----------------------------+
| 0 |
+----------------------------+
SELECT LOCATE('bar', 'fooBARBAR');
+----------------------------+
| LOCATE('bar', 'fooBARBAR') |
+----------------------------+
| 0 |
+----------------------------+
SELECT LOCATE('bar', 'foobarBAR', 100);
+---------------------------------+
| LOCATE('bar', 'foobarBAR', 100) |
+---------------------------------+
| 0 |
+---------------------------------+
SELECT LOCATE('bar', 'foobarbar', 5);
+-------------------------------+
| LOCATE('bar', 'foobarbar', 5) |
+-------------------------------+
| 7 |
+-------------------------------+
SELECT LOCATE('bar', NULL);
+---------------------+
| LOCATE('bar', NULL) |
+---------------------+
| NULL |
+---------------------+
SELECT LOCATE('い', 'たいでぃーびー');
+----------------------------------------+
| LOCATE('い', 'たいでぃーびー') |
+----------------------------------------+
| 2 |
+----------------------------------------+
SELECT LOCATE('い', 'たいでぃーびー', 3);
+-------------------------------------------+
| LOCATE('い', 'たいでぃーびー', 3) |
+-------------------------------------------+
| 0 |
+-------------------------------------------+
The following examples use the utf8mb4_unicode_ci
collation:
SET collation_connection='utf8mb4_unicode_ci';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
+----------------------+--------------------+
SELECT LOCATE('い', 'たいでぃーびー', 3);
+-------------------------------------------+
| LOCATE('い', 'たいでぃーびー', 3) |
+-------------------------------------------+
| 4 |
+-------------------------------------------+
SELECT LOCATE('🍺', '🍣🍣🍣🍺🍺');
+----------------------------------------+
| LOCATE('🍺', '🍣🍣🍣🍺🍺') |
+----------------------------------------+
| 1 |
+----------------------------------------+
The following multibyte and binary string examples use the utf8mb4_bin
collation:
SET collation_connection='utf8mb4_bin';
SHOW VARIABLES LIKE 'collation_connection';
+----------------------+-------------+
| Variable_name | Value |
+----------------------+-------------+
| collation_connection | utf8mb4_bin |
+----------------------+-------------+
SELECT LOCATE('🍺', '🍣🍣🍣🍺🍺');
+----------------------------------------+
| LOCATE('🍺', '🍣🍣🍣🍺🍺') |
+----------------------------------------+
| 4 |
+----------------------------------------+
SELECT LOCATE('b', _binary'aBcde');
+-----------------------------+
| LOCATE('b', _binary'aBcde') |
+-----------------------------+
| 0 |
+-----------------------------+
SELECT LOCATE('B', _binary'aBcde');
+-----------------------------+
| LOCATE('B', _binary'aBcde') |
+-----------------------------+
| 2 |
+-----------------------------+
SELECT LOCATE(_binary'b', 'aBcde');
+-----------------------------+
| LOCATE(_binary'b', 'aBcde') |
+-----------------------------+
| 0 |
+-----------------------------+
SELECT LOCATE(_binary'B', 'aBcde');
+-----------------------------+
| LOCATE(_binary'B', 'aBcde') |
+-----------------------------+
| 2 |
+-----------------------------+
LOWER()
The LOWER(str)
function is used to convert all characters in the given argument str
to lowercase. The argument can be either a string or a number.
- If the argument is a string, the function returns the string in lowercase.
- If the argument is a number, the function returns the number without leading zeros.
- If the argument is
NULL
, the function returnsNULL
.
Examples:
SELECT LOWER("TiDB");
+---------------+
| LOWER("TiDB") |
+---------------+
| tidb |
+---------------+
SELECT LOWER(-012);
+-------------+
| LOWER(-012) |
+-------------+
| -12 |
+-------------+
LPAD()
Return the string argument, left-padded with the specified string.
LTRIM()
Remove leading spaces.
MAKE_SET()
Return a set of comma-separated strings that have the corresponding bit in bits set.
MID()
Return a substring starting from the specified position.
NOT LIKE
Negation of simple pattern matching.
NOT REGEXP
Negation of REGEXP
.
OCT()
Return a string containing octal representation of a number.
OCTET_LENGTH()
Synonym for LENGTH()
.
ORD()
Return character code for leftmost character of the argument.
POSITION()
Synonym for LOCATE()
.
QUOTE()
Escape the argument for use in an SQL statement.
REGEXP
Pattern matching using regular expressions.
REGEXP_INSTR()
Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
REGEXP_LIKE()
Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
REGEXP_REPLACE()
Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
REGEXP_SUBSTR()
Return the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).
REPEAT()
Repeat a string the specified number of times.
REPLACE()
Replace occurrences of a specified string.
REVERSE()
Reverse the characters in a string.
RIGHT()
Return the specified rightmost number of characters.
RLIKE
Synonym for REGEXP
.
RPAD()
Append string the specified number of times.
RTRIM()
Remove trailing spaces.
SPACE()
Return a string of the specified number of spaces.
STRCMP()
Compare two strings.
SUBSTR()
Return the substring as specified.
SUBSTRING()
Return the substring as specified.
SUBSTRING_INDEX()
The SUBSTRING_INDEX()
function is used to extract a substring from a string based on a specified delimiter and count. This function is particularly useful when dealing with data separated by a specific delimiter, such as parsing CSV data or processing log files.
Syntax:
SUBSTRING_INDEX(str, delim, count)
str
: specifies the string to be processed.delim
: specifies the delimiter in the string, which is case-sensitive.count
: specifies the number of occurrences of the delimiter.- If
count
is a positive number, the function returns the substring before thecount
occurrences (counting from the left of the string) of the delimiter. - If
count
is a negative number, the function returns the substring after thecount
occurrences (counting from the right of the string) of the delimiter. - If
count
is0
, the function returns an empty string.
- If
Example 1:
SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', 2);
Output 1:
+-----------------------------------------+
| SUBSTRING_INDEX('www.tidbcloud.com', '.', 2) |
+-----------------------------------------+
| www.tidbcloud |
+-----------------------------------------+
Example 2:
SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', -1);
Output 2:
+------------------------------------------+
| SUBSTRING_INDEX('www.tidbcloud.com', '.', -1) |
+------------------------------------------+
| com |
+------------------------------------------+
TO_BASE64()
The TO_BASE64()
function is used to convert the given argument to a string in the base-64 encoded form and return the result according to the character set and collation of the current connection. A base-64 encoded string can be decoded using the FROM_BASE64()
function.
Syntax:
TO_BASE64(str)
- If the argument is not a string, the function converts it to a string before base-64 encoding.
- If the argument is
NULL
, the function returnsNULL
.
Example 1:
SELECT TO_BASE64('abc');
Output 1:
+------------------+
| TO_BASE64('abc') |
+------------------+
| YWJj |
+------------------+
Example 2:
SELECT TO_BASE64(6);
Output 2:
+--------------+
| TO_BASE64(6) |
+--------------+
| Ng== |
+--------------+
TRANSLATE()
Replace all occurrences of characters by other characters in a string. It does not treat empty strings as NULL
as Oracle does.
TRIM()
Remove leading and trailing spaces.
UCASE()
The UCASE()
function is used to convert a string to uppercase letters. This function is equivalent to the UPPER()
function.
Example:
SELECT UCASE('bigdata') AS result_upper, UCASE(null) AS result_null;
Output:
+--------------+-------------+
| result_upper | result_null |
+--------------+-------------+
| BIGDATA | NULL |
+--------------+-------------+
UNHEX()
The UNHEX()
function performs the reverse operation of the HEX()
function. It treats each pair of characters in the argument as a hexadecimal number and converts it to the character represented by that number, returning the result as a binary string.
Example:
SELECT UNHEX('54694442');
Output:
+--------------------------------------+
| UNHEX('54694442') |
+--------------------------------------+
| 0x54694442 |
+--------------------------------------+
UPPER()
The UPPER()
function is used to convert a string to uppercase letters. This function is equivalent to the UCASE()
function.
Example:
SELECT UPPER('bigdata') AS result_upper, UPPER(null) AS result_null;
Output:
+--------------+-------------+
| result_upper | result_null |
+--------------+-------------+
| BIGDATA | NULL |
+--------------+-------------+
WEIGHT_STRING()
The WEIGHT_STRING()
function returns the weight string (binary characters) for the input string, primarily used for sorting and comparison operations in multi-character set scenarios. If the argument is NULL
, it returns NULL
. The syntax is as follows:
WEIGHT_STRING(str [AS {CHAR|BINARY}(N)])
str
: the input string expression. If it is a non-binary string, such as aCHAR
,VARCHAR
, orTEXT
value, the return value contains the collation weights for the string. If it is a binary string, such as aBINARY
,VARBINARY
, orBLOB
value, the return value is the same as the input.AS {CHAR|BINARY}(N)
: optional parameters used to specify the type and length of the output.CHAR
represents the character data type, andBINARY
represents the binary data type.N
specifies the output length, which is an integer greater than or equal to 1.
Example:
SET NAMES 'utf8mb4';
SELECT HEX(WEIGHT_STRING('ab' AS CHAR(3))) AS char_result, HEX(WEIGHT_STRING('ab' AS BINARY(3))) AS binary_result;
Output:
+-------------+---------------+
| char_result | binary_result |
+-------------+---------------+
| 6162 | 616200 |
+-------------+---------------+
Unsupported functions
LOAD_FILE()
MATCH()
SOUNDEX()
Regular expression compatibility with MySQL
The following sections describe the regular expression compatibility with MySQL, including REGEXP_INSTR()
, REGEXP_LIKE()
, REGEXP_REPLACE()
, and REGEXP_SUBSTR()
.
Syntax compatibility
MySQL implements regular expression using International Components for Unicode (ICU), and TiDB uses RE2. To learn the syntax differences between the two libraries, you can refer to the ICU documentation and RE2 Syntax.
match_type
compatibility
The value options of match_type
between TiDB and MySQL are:
Value options in TiDB are
"c"
,"i"
,"m"
, and"s"
, and value options in MySQL are"c"
,"i"
,"m"
,"n"
, and"u"
.The
"s"
in TiDB corresponds to"n"
in MySQL. When"s"
is set in TiDB, the.
character also matches line terminators (\n
).For example, the
SELECT REGEXP_LIKE(a, b, "n") FROM t1
in MySQL is the same as theSELECT REGEXP_LIKE(a, b, "s") FROM t1
in TiDB.TiDB does not support
"u"
, which means Unix-only line endings in MySQL.
Data type compatibility
The difference between TiDB and MySQL support for the binary string type:
- MySQL does not support binary strings in regular expression functions since 8.0.22. For more details, refer to MySQL documentation. But in practice, regular functions can work in MySQL when all parameters or return types are binary strings. Otherwise, an error will be reported.
- Currently, TiDB prohibits using binary strings and an error will be reported under any circumstances.
Other compatibility
The behavior of replacing empty strings in TiDB is different from MySQL. Taking
REGEXP_REPLACE("", "^$", "123")
as an example:- MySQL does not replace the empty string and returns
""
as the result. - TiDB replaces the empty string and returns
"123"
as the result.
- MySQL does not replace the empty string and returns
The keyword used for capturing groups in TiDB is different from MySQL. MySQL uses
$
as the keyword, while TiDB uses\\
as the keyword. In addition, TiDB only supports capturing groups numbered from0
to9
.For example, the following SQL statement returns
ab
in TiDB:SELECT REGEXP_REPLACE('abcd','(.*)(.{2})$','\\1') AS s;