String Functions

TiDB supports most of the string functions available in MySQL 5.7, some of the string functions available in MySQL 8.0, and some of the functions available in Oracle 21.

For comparisons between functions and syntax of Oracle and TiDB, see Comparisons between Functions and Syntax of Oracle and TiDB.

Supported functions

ASCII()

The ASCII(str) function is used to get the ASCII value of the leftmost character in the given argument. The argument can be either a string or a number.

  • If the argument is not empty, the function returns the ASCII value of the leftmost character.
  • If the argument is an empty string, the function returns 0.
  • If the argument is NULL, the function returns NULL.

Example:

SELECT ASCII('A'), ASCII('TiDB'), ASCII(23);

Output:

+------------+---------------+-----------+ | ASCII('A') | ASCII('TiDB') | ASCII(23) | +------------+---------------+-----------+ | 65 | 84 | 50 | +------------+---------------+-----------+

BIN()

The BIN() function is used to convert the given argument into a string representation of its binary value. The argument can be either a string or a number.

  • If the argument is a positive number, the function returns a string representation of its binary value.
  • If the argument is a negative number, the function converts the absolute value of the argument to its binary representation, inverts each bit of the binary value (changing 0 to 1 and 1 to 0), and then adds 1 to the inverted value.
  • If the argument is a string containing only digits, the function returns the result according to those digits. For example, the results for "123" and 123 are the same.
  • If the argument is a string and its first character is not a digit (such as "q123"), the function returns 0.
  • If the argument is a string that consists of digits and non-digits, the function returns the result according to the consecutive digits at the beginning of the argument. For example, the results for "123q123" and 123 are the same, but BIN('123q123') generates a warning like Truncated incorrect INTEGER value: '123q123'.
  • If the argument is NULL, the function returns NULL.

Example 1:

SELECT BIN(123), BIN('123q123');

Output 1:

+----------+----------------+ | BIN(123) | BIN('123q123') | +----------+----------------+ | 1111011 | 1111011 | +----------+----------------+

Example 2:

SELECT BIN(-7);

Output 2:

+------------------------------------------------------------------+ | BIN(-7) | +------------------------------------------------------------------+ | 1111111111111111111111111111111111111111111111111111111111111001 | +------------------------------------------------------------------+

BIT_LENGTH()

The BIT_LENGTH() function is used to return the length of a given argument in bits.

Examples:

SELECT BIT_LENGTH("TiDB"); +--------------------+ | BIT_LENGTH("TiDB") | +--------------------+ | 32 | +--------------------+

8 bits per character x 4 characters = 32 bits

SELECT BIT_LENGTH("PingCAP 123"); +---------------------------+ | BIT_LENGTH("PingCAP 123") | +---------------------------+ | 88 | +---------------------------+

8 bits per character (space is counted because it is a non-alphanumeric character) x 11 characters = 88 bits

SELECT CustomerName, BIT_LENGTH(CustomerName) AS BitLengthOfName FROM Customers; +--------------------+-----------------+ | CustomerName | BitLengthOfName | +--------------------+-----------------+ | Albert Einstein | 120 | | Robert Oppenheimer | 144 | +--------------------+-----------------+

CHAR()

The CHAR() function is used to get the corresponding character of a specific ASCII value. It performs the opposite operation of ASCII(), which returns the ASCII value of a specific character.

Examples:

SELECT CHAR(65); +------------+ | CHAR(65) | +------------+ | A | +------------+
SELECT CHAR(84); +------------+ | CHAR(84) | +------------+ | T | +------------+

The CHAR() function can also be used to get the corresponding character of ASCII values that extend beyond the standard ASCII range (0 - 127).

/*For extended ASCII: */ SELECT CHAR(128); +------------+ | CHAR(128) | +------------+ | 0x80 | +------------+

The CHAR() function can also get the corresponding character value of a unicode value.

/* For Unicode: */ --skip-binary-as-hex SELECT CHAR(50089); +--------------+ | CHAR(50089) | +--------------+ | é | +--------------+

CHAR_LENGTH()

The CHAR_LENGTH() function is used to get the total number of characters in a given argument as an integer.

Examples:

SELECT CHAR_LENGTH("TiDB") AS LengthOfString; +----------------+ | LengthOfString | +----------------+ | 4 | +----------------+
SELECT CustomerName, CHAR_LENGTH(CustomerName) AS LengthOfName FROM Customers; +--------------------+--------------+ | CustomerName | LengthOfName | +--------------------+--------------+ | Albert Einstein | 15 | | Robert Oppenheimer | 18 | +--------------------+--------------+

CHARACTER_LENGTH()

The CHARACTER_LENGTH() function is the same as the CHAR_LENGTH() function. Both functions can be used synonymously because they generate the same output.

CONCAT()

The CONCAT() function concatenates one or more arguments into a single string.

Syntax:

CONCAT(str1,str2,...)

str1, str2, ... is a list of arguments to be concatenated. Each argument can be a string or a number.

Example:

SELECT CONCAT('TiDB', ' ', 'Server', '-', 1, TRUE);

Output:

+---------------------------------------------+ | CONCAT('TiDB', ' ', 'Server', '-', 1, TRUE) | +---------------------------------------------+ | TiDB Server-11 | +---------------------------------------------+

If any of the arguments is NULL, CONCAT() returns NULL.

Example:

SELECT CONCAT('TiDB', NULL, 'Server');

Output:

+--------------------------------+ | CONCAT('TiDB', NULL, 'Server') | +--------------------------------+ | NULL | +--------------------------------+

In addition to the CONCAT() function, you can concatenate strings by placing them adjacent to each other as in the following example. Note that this method does not support numeric types.

SELECT 'Ti' 'DB' ' ' 'Server';

Output:

+-------------+ | Ti | +-------------+ | TiDB Server | +-------------+

CONCAT_WS()

The CONCAT_WS() function is a form of CONCAT() with a separator, which returns a string concatenated by the specified separator.

Syntax:

CONCAT_WS(separator,str1,str2,...)
  • separator: the first argument is the separator, which concatenates the remaining arguments that are not NULL.
  • str1, str2, ...: a list of arguments to be concatenated. Each argument can be a string or a number.

Example:

SELECT CONCAT_WS(',', 'TiDB Server', 'TiKV', 'PD');

Output:

+---------------------------------------------+ | CONCAT_WS(',', 'TiDB Server', 'TiKV', 'PD') | +---------------------------------------------+ | TiDB Server,TiKV,PD | +---------------------------------------------+
  • If the separator is an empty string, CONCAT_WS() is equivalent to CONCAT() and returns the concatenated string of the remaining arguments.

    Example:

    SELECT CONCAT_WS('', 'TiDB Server', 'TiKV', 'PD');

    Output:

    +--------------------------------------------+ | CONCAT_WS('', 'TiDB Server', 'TiKV', 'PD') | +--------------------------------------------+ | TiDB ServerTiKVPD | +--------------------------------------------+
  • If the separator is NULL, CONCAT_WS() returns NULL.

    Example:

    SELECT CONCAT_WS(NULL, 'TiDB Server', 'TiKV', 'PD');

    Output:

    +----------------------------------------------+ | CONCAT_WS(NULL, 'TiDB Server', 'TiKV', 'PD') | +----------------------------------------------+ | NULL | +----------------------------------------------+
  • If only one of the arguments to be concatenated is not NULL, CONCAT_WS() returns that argument.

    Example:

    SELECT CONCAT_WS(',', 'TiDB Server', NULL);

    Output:

    +-------------------------------------+ | CONCAT_WS(',', 'TiDB Server', NULL) | +-------------------------------------+ | TiDB Server | +-------------------------------------+
  • If there are NULL arguments to be concatenated, CONCAT_WS() skips these NULL arguments.

    Example:

    SELECT CONCAT_WS(',', 'TiDB Server', NULL, 'PD');

    Output:

    +-------------------------------------------+ | CONCAT_WS(',', 'TiDB Server', NULL, 'PD') | +-------------------------------------------+ | TiDB Server,PD | +-------------------------------------------+
  • If there are empty strings to be concatenated, CONCAT_WS() does not skip empty strings.

    Example:

    SELECT CONCAT_WS(',', 'TiDB Server', '', 'PD');

    Output:

    +-----------------------------------------+ | CONCAT_WS(',', 'TiDB Server', '', 'PD') | +-----------------------------------------+ | TiDB Server,,PD | +-----------------------------------------+

ELT()

Return string at index number.

EXPORT_SET()

Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string.

FIELD()

Return the index (position)of the first argument in the subsequent arguments.

FIND_IN_SET()

Return the index position of the first argument within the second argument.

FORMAT()

The FORMAT(X,D[,locale]) function is used to format the number X to a format similar to "#,###,###. ##", rounded to D decimal places, and return the result as a string.

Arguments:

  • X: the number to be formatted. It can be a direct numeric value, a numeric string, or a number in scientific notation.
  • D: the number of decimal places for the returned value. The function rounds the number X to D decimal places. If D is greater than the actual number of decimal places in X, the result is padded with zeros to the corresponding length.
  • [locale]: specifies a locale setting to be used for grouping between decimal points, thousands separators, and separators for resultant numbers. A valid locale value is the same as the valid value of the lc_time_names system variable. If not specified or the region setting is NULL, the 'en_US' region setting is used by default. This argument is optional.

Behaviors:

  • If the first argument is a string and contains only numbers, the function returns a result based on that numeric value. For example, FORMAT('12.34', 1) and FORMAT(12.34, 1) return the same result.
  • If the first argument is a number represented in scientific notation (using E/e), the function returns the result based on that number. For example, FORMAT('1E2', 3) returns 100.000.
  • If the first argument is a string starting with non-numeric characters, the function returns zero and a warning (Code 1292). For example, FORMAT('q12.36', 5) returns 0.00000, but also includes a warning Warning (Code 1292): Truncated incorrect DOUBLE value: 'q12.36'.
  • If the first argument is a string mixing numbers and non-numbers, the function returns a result based on the consecutive numeric part at the beginning of the argument, and also includes a warning (Code 1292). For example, FORMAT('12.36q56.78', 1) returns the same numeric result as FORMAT('12.36', 1), but includes a warning Warning (Code 1292): Truncated incorrect DOUBLE value: '12.36q56.78'.
  • If the second argument is zero or a negative number, the function truncates the decimal part and returns an integer.
  • If any of the arguments is NULL, the function returns NULL.

Examples:

The following examples show how to format the number 12.36 to different decimal places:

mysql> SELECT FORMAT(12.36, 1); +------------------+ | FORMAT(12.36, 1) | +------------------+ | 12.4 | +------------------+
mysql> SELECT FORMAT(12.36, 5); +------------------+ | FORMAT(12.36, 5) | +------------------+ | 12.36000 | +------------------+
mysql> SELECT FORMAT(12.36, 2); +------------------+ | FORMAT(12.36, 2) | +------------------+ | 12.36 | +------------------+

FROM_BASE64()

The FROM_BASE64() function is used to decode a Base64 encoded string and return the decoded result in its hexadecimal form.

  • This function accepts a single argument, that is, the Base64 encoded string to be decoded.
  • If the argument is NULL or not a valid Base64 encoded string, the FROM_BASE64() function returns NULL.

Examples:

The following example shows how to decode the Base64 encoded string 'SGVsbG8gVGlEQg=='. This string is the result of encoding 'Hello TiDB', using the TO_BASE64() function.

mysql> SELECT TO_BASE64('Hello TiDB'); +-------------------------+ | TO_BASE64('Hello TiDB') | +-------------------------+ | SGVsbG8gVGlEQg== | +-------------------------+ mysql> SELECT FROM_BASE64('SGVsbG8gVGlEQg=='); +------------------------------------------------------------------+ | FROM_BASE64('SGVsbG8gVGlEQg==') | +------------------------------------------------------------------+ | 0x48656C6C6F2054694442 | +------------------------------------------------------------------+
mysql> SELECT CONVERT(FROM_BASE64('SGVsbG8gVGlEQg==') USING utf8mb4); +--------------------------------------------------------+ | CONVERT(FROM_BASE64('SGVsbG8gVGlEQg==') USING utf8mb4) | +--------------------------------------------------------+ | Hello TiDB | +--------------------------------------------------------+

The following example shows how to decode the Base64 encoded number MTIzNDU2. This string is the result of encoding 123456, which can be done using the TO_BASE64() function.

mysql> SELECT FROM_BASE64('MTIzNDU2'); +--------------------------------------------------+ | FROM_BASE64('MTIzNDU2') | +--------------------------------------------------+ | 0x313233343536 | +--------------------------------------------------+

HEX()

The HEX() function is used to convert the given argument into a string representation of its hexadecimal value. The argument can be either a string or a number.

  • If the argument is a string, HEX(str) returns a hexadecimal string representation of str. The function converts each byte of each character in str into two hexadecimal digits. For example, the character a in a UTF-8 or ASCII character set is represented as a binary value of 00111101, or 61 in hexadecimal notation.
  • If the argument is a number, HEX(n) returns a hexadecimal string representation of n. The function treats the argument n as a BIGINT number, equivalent to using CONV(n, 10, 16).
  • If the argument is NULL, the function returns NULL.

Examples:

SELECT X'616263', HEX('abc'), UNHEX(HEX('abc')), 0x616263; +-----------+------------+-------------------+----------+ | X'616263' | HEX('abc') | UNHEX(HEX('abc')) | 0x616263 | +-----------+------------+-------------------+----------+ | abc | 616263 | abc | abc | +-----------+------------+-------------------+----------+
SELECT X'F09F8DA3', HEX('🍣'), UNHEX(HEX('🍣')), 0xF09F8DA3; +-------------+-------------+--------------------+------------+ | X'F09F8DA3' | HEX('🍣') | UNHEX(HEX('🍣')) | 0xF09F8DA3 | +-------------+-------------+--------------------+------------+ | 🍣 | F09F8DA3 | 🍣 | 🍣 | +-------------+-------------+--------------------+------------+
SELECT HEX(255), CONV(HEX(255), 16, 10); +----------+------------------------+ | HEX(255) | CONV(HEX(255), 16, 10) | +----------+------------------------+ | FF | 255 | +----------+------------------------+
SELECT HEX(NULL); +-----------+ | HEX(NULL) | +-----------+ | NULL | +-----------+

INSERT()

The INSERT(str, pos, len, newstr) function is used to replace a substring in str (that starts at position pos and is len characters long) with the string newstr. This function is multibyte safe.

  • If pos exceeds the length of str, the function returns the original string str without modification.
  • If len exceeds the remaining length of str from position pos, the function replaces the rest of the string from position pos.
  • If any argument is NULL, the function returns NULL.

Examples:

SELECT INSERT('He likes tennis', 4, 5, 'plays'); +------------------------------------------+ | INSERT('He likes tennis', 4, 5, 'plays') | +------------------------------------------+ | He plays tennis | +------------------------------------------+
SELECT INSERT('He likes tennis', -1, 5, 'plays'); +-------------------------------------------+ | INSERT('He likes tennis', -1, 5, 'plays') | +-------------------------------------------+ | He likes tennis | +-------------------------------------------+
SELECT INSERT('He likes tennis', 4, 100, 'plays'); +--------------------------------------------+ | INSERT('He likes tennis', 4, 100, 'plays') | +--------------------------------------------+ | He plays | +--------------------------------------------+
SELECT INSERT('He likes tenis', 10, 100, '🍣'); +-------------------------------------------+ | INSERT('He likes tenis', 10, 100, '🍣') | +-------------------------------------------+ | He likes 🍣 | +-------------------------------------------+
SELECT INSERT('あああああああ', 2, 3, 'いいい'); +----------------------------------------------------+ | INSERT('あああああああ', 2, 3, 'いいい') | +----------------------------------------------------+ | あいいいあああ | +----------------------------------------------------+
SELECT INSERT('あああああああ', 2, 3, 'xx'); +---------------------------------------------+ | INSERT('あああああああ', 2, 3, 'xx') | +---------------------------------------------+ | あxxあああ | +---------------------------------------------+

INSTR()

The INSTR(str, substr) function is used to get the position of the first occurrence of substr in str. Each argument can be either a string or a number. This function is the same as the two-argument version of LOCATE(substr, str), but with the order of the arguments reversed.

  • If either argument is a number, the function treats the number as a string.
  • If substr is not in str, the function returns 0. Otherwise, it returns the position of the first occurrence of substr in str.
  • If either argument is NULL, the function returns NULL.

Examples:

SELECT INSTR("pingcap.com", "tidb"); +------------------------------+ | INSTR("pingcap.com", "tidb") | +------------------------------+ | 0 | +------------------------------+
SELECT INSTR("pingcap.com/tidb", "tidb"); +-----------------------------------+ | INSTR("pingcap.com/tidb", "tidb") | +-----------------------------------+ | 13 | +-----------------------------------+
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_bin, "TiDB"); +-------------------------------------------------------+ | INSTR("pingcap.com/tidb" COLLATE utf8mb4_bin, "TiDB") | +-------------------------------------------------------+ | 0 | +-------------------------------------------------------+
SELECT INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB"); +--------------------------------------------------------------+ | INSTR("pingcap.com/tidb" COLLATE utf8mb4_general_ci, "TiDB") | +--------------------------------------------------------------+ | 13 | +--------------------------------------------------------------+
SELECT INSTR(0123, "12"); +-------------------+ | INSTR(0123, "12") | +-------------------+ | 1 | +-------------------+

LCASE()

The LCASE(str) function is a synonym for LOWER(str), which returns the lowercase of the given argument.

LEFT()

The LEFT() function returns a specified number of characters from the left side of a string.

Syntax:

LEFT(`str`, `len`)
  • str: the original string to extract characters. If str contains a multibyte character, the function counts it as a single code point.
  • len: the length of characters to be returned.
    • If len is equal to or less than 0, the function returns an empty string.
    • If len is equal to or greater than the length of str, the function returns the original str.
  • If any argument is NULL, the function returns NULL.

Examples:

SELECT LEFT('ABCED', 3); +------------------+ | LEFT('ABCED', 3) | +------------------+ | ABC | +------------------+ SELECT LEFT('ABCED', 6); +------------------+ | LEFT('ABCED', 6) | +------------------+ | ABCED | +------------------+
SELECT LEFT('ABCED', 0); +------------------+ | LEFT('ABCED', 0) | +------------------+ | | +------------------+ SELECT LEFT('ABCED', -1); +-------------------+ | LEFT('ABCED', -1) | +-------------------+ | | +-------------------+
SELECT LEFT('🍣ABC', 3); +--------------------+ | LEFT('🍣ABC', 3) | +--------------------+ | 🍣AB | +--------------------+
SELECT LEFT('ABC', NULL); +-------------------+ | LEFT('ABC', NULL) | +-------------------+ | NULL | +-------------------+ SELECT LEFT(NULL, 3); +------------------------------+ | LEFT(NULL, 3) | +------------------------------+ | NULL | +------------------------------+

LENGTH()

The LENGTH() function returns the length of a string in bytes.

LENGTH() counts a multibyte character as multiple bytes while CHAR_LENGTH() counts a multibyte character as a single code point.

If the argument is NULL, the function returns NULL.

Examples:

SELECT LENGTH('ABC'); +---------------+ | LENGTH('ABC') | +---------------+ | 3 | +---------------+ SELECT LENGTH('🍣ABC'); +-------------------+ | LENGTH('🍣ABC') | +-------------------+ | 7 | +-------------------+ SELECT CHAR_LENGTH('🍣ABC'); +------------------------+ | CHAR_LENGTH('🍣ABC') | +------------------------+ | 4 | +------------------------+
SELECT LENGTH(NULL); +--------------+ | LENGTH(NULL) | +--------------+ | NULL | +--------------+

LIKE

The LIKE operator is used for simple string matching. The expression expr LIKE pat [ESCAPE 'escape_char'] returns 1 (TRUE) or 0 (FALSE). If either expr or pat is NULL, the result is NULL.

You can use the following two wildcard parameters with LIKE:

  • % matches any number of characters, including zero characters.
  • _ matches exactly one character.

The following examples use the utf8mb4_bin collation:

SET collation_connection='utf8mb4_bin'; SHOW VARIABLES LIKE 'collation_connection'; +----------------------+-------------+ | Variable_name | Value | +----------------------+-------------+ | collation_connection | utf8mb4_bin | +----------------------+-------------+
SELECT NULL LIKE '%' as result; +--------+ | result | +--------+ | NULL | +--------+
SELECT 'sushi!!!' LIKE 'sushi_' AS result; +--------+ | result | +--------+ | 0 | +--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%sushi%' AS result; +--------+ | result | +--------+ | 1 | +--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%SUSHI%' AS result; +--------+ | result | +--------+ | 0 | +--------+
SELECT '🍣🍺sushi🍣🍺' LIKE '%🍣%' AS result; +--------+ | result | +--------+ | 1 | +--------+

The default escape character is \:

SELECT 'sushi!!!' LIKE 'sushi\_' AS result; +--------+ | result | +--------+ | 0 | +--------+
SELECT 'sushi_' LIKE 'sushi\_' AS result; +--------+ | result | +--------+ | 1 | +--------+

To specify a different escape character, such as *, you can use the ESCAPE clause:

SELECT 'sushi_' LIKE 'sushi*_' ESCAPE '*' AS result; +--------+ | result | +--------+ | 1 | +--------+
SELECT 'sushi!' LIKE 'sushi*_' ESCAPE '*' AS result; +--------+ | result | +--------+ | 0 | +--------+

You can use the LIKE operator to match a numeric value:

SELECT 10 LIKE '1%' AS result; +--------+ | result | +--------+ | 1 | +--------+
SELECT 10000 LIKE '12%' AS result; +--------+ | result | +--------+ | 0 | +--------+

To specify a collation explicitly, such as utf8mb4_unicode_ci, you can use COLLATE:

SELECT '🍣🍺Sushi🍣🍺' COLLATE utf8mb4_unicode_ci LIKE '%SUSHI%' AS result; +--------+ | result | +--------+ | 1 | +--------+

LOCATE()

The LOCATE(substr, str[, pos]) function is used to get the position of the first occurrence of a specified substring substr in a string str. The pos argument is optional and specifies the starting position for the search.

  • If the substring substr is not present in str, the function returns 0.
  • If any argument is NULL, the function returns NULL.
  • This function is multibyte safe and performs a case-sensitive search only if at least one argument is a binary string.

The following examples use the utf8mb4_bin collation:

SET collation_connection='utf8mb4_bin'; SHOW VARIABLES LIKE 'collation_connection'; +----------------------+-------------+ | Variable_name | Value | +----------------------+-------------+ | collation_connection | utf8mb4_bin | +----------------------+-------------+
SELECT LOCATE('bar', 'foobarbar'); +----------------------------+ | LOCATE('bar', 'foobarbar') | +----------------------------+ | 4 | +----------------------------+
SELECT LOCATE('baz', 'foobarbar'); +----------------------------+ | LOCATE('baz', 'foobarbar') | +----------------------------+ | 0 | +----------------------------+
SELECT LOCATE('bar', 'fooBARBAR'); +----------------------------+ | LOCATE('bar', 'fooBARBAR') | +----------------------------+ | 0 | +----------------------------+
SELECT LOCATE('bar', 'foobarBAR', 100); +---------------------------------+ | LOCATE('bar', 'foobarBAR', 100) | +---------------------------------+ | 0 | +---------------------------------+
SELECT LOCATE('bar', 'foobarbar', 5); +-------------------------------+ | LOCATE('bar', 'foobarbar', 5) | +-------------------------------+ | 7 | +-------------------------------+
SELECT LOCATE('bar', NULL); +---------------------+ | LOCATE('bar', NULL) | +---------------------+ | NULL | +---------------------+
SELECT LOCATE('い', 'たいでぃーびー'); +----------------------------------------+ | LOCATE('い', 'たいでぃーびー') | +----------------------------------------+ | 2 | +----------------------------------------+
SELECT LOCATE('い', 'たいでぃーびー', 3); +-------------------------------------------+ | LOCATE('い', 'たいでぃーびー', 3) | +-------------------------------------------+ | 0 | +-------------------------------------------+

The following examples use the utf8mb4_unicode_ci collation:

SET collation_connection='utf8mb4_unicode_ci'; SHOW VARIABLES LIKE 'collation_connection'; +----------------------+--------------------+ | Variable_name | Value | +----------------------+--------------------+ | collation_connection | utf8mb4_unicode_ci | +----------------------+--------------------+
SELECT LOCATE('い', 'たいでぃーびー', 3); +-------------------------------------------+ | LOCATE('い', 'たいでぃーびー', 3) | +-------------------------------------------+ | 4 | +-------------------------------------------+
SELECT LOCATE('🍺', '🍣🍣🍣🍺🍺'); +----------------------------------------+ | LOCATE('🍺', '🍣🍣🍣🍺🍺') | +----------------------------------------+ | 1 | +----------------------------------------+

The following multibyte and binary string examples use the utf8mb4_bin collation:

SET collation_connection='utf8mb4_bin'; SHOW VARIABLES LIKE 'collation_connection'; +----------------------+-------------+ | Variable_name | Value | +----------------------+-------------+ | collation_connection | utf8mb4_bin | +----------------------+-------------+
SELECT LOCATE('🍺', '🍣🍣🍣🍺🍺'); +----------------------------------------+ | LOCATE('🍺', '🍣🍣🍣🍺🍺') | +----------------------------------------+ | 4 | +----------------------------------------+
SELECT LOCATE('b', _binary'aBcde'); +-----------------------------+ | LOCATE('b', _binary'aBcde') | +-----------------------------+ | 0 | +-----------------------------+
SELECT LOCATE('B', _binary'aBcde'); +-----------------------------+ | LOCATE('B', _binary'aBcde') | +-----------------------------+ | 2 | +-----------------------------+
SELECT LOCATE(_binary'b', 'aBcde'); +-----------------------------+ | LOCATE(_binary'b', 'aBcde') | +-----------------------------+ | 0 | +-----------------------------+
SELECT LOCATE(_binary'B', 'aBcde'); +-----------------------------+ | LOCATE(_binary'B', 'aBcde') | +-----------------------------+ | 2 | +-----------------------------+

LOWER()

The LOWER(str) function is used to convert all characters in the given argument str to lowercase. The argument can be either a string or a number.

  • If the argument is a string, the function returns the string in lowercase.
  • If the argument is a number, the function returns the number without leading zeros.
  • If the argument is NULL, the function returns NULL.

Examples:

SELECT LOWER("TiDB"); +---------------+ | LOWER("TiDB") | +---------------+ | tidb | +---------------+
SELECT LOWER(-012); +-------------+ | LOWER(-012) | +-------------+ | -12 | +-------------+

LPAD()

Return the string argument, left-padded with the specified string.

LTRIM()

Remove leading spaces.

MAKE_SET()

Return a set of comma-separated strings that have the corresponding bit in bits set.

MID()

Return a substring starting from the specified position.

NOT LIKE

Negation of simple pattern matching.

NOT REGEXP

Negation of REGEXP.

OCT()

Return a string containing octal representation of a number.

OCTET_LENGTH()

Synonym for LENGTH().

ORD()

Return character code for leftmost character of the argument.

POSITION()

Synonym for LOCATE().

QUOTE()

Escape the argument for use in an SQL statement.

REGEXP

Pattern matching using regular expressions.

REGEXP_INSTR()

Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).

REGEXP_LIKE()

Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).

REGEXP_REPLACE()

Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).

REGEXP_SUBSTR()

Return the substring that matches the regular expression (Partly compatible with MySQL. For more details, see Regular expression compatibility with MySQL).

REPEAT()

Repeat a string the specified number of times.

REPLACE()

Replace occurrences of a specified string.

REVERSE()

Reverse the characters in a string.

Return the specified rightmost number of characters.

RLIKE

Synonym for REGEXP.

RPAD()

Append string the specified number of times.

RTRIM()

Remove trailing spaces.

SPACE()

Return a string of the specified number of spaces.

STRCMP()

Compare two strings.

SUBSTR()

Return the substring as specified.

SUBSTRING()

Return the substring as specified.

SUBSTRING_INDEX()

The SUBSTRING_INDEX() function is used to extract a substring from a string based on a specified delimiter and count. This function is particularly useful when dealing with data separated by a specific delimiter, such as parsing CSV data or processing log files.

Syntax:

SUBSTRING_INDEX(str, delim, count)
  • str: specifies the string to be processed.
  • delim: specifies the delimiter in the string, which is case-sensitive.
  • count: specifies the number of occurrences of the delimiter.
    • If count is a positive number, the function returns the substring before the count occurrences (counting from the left of the string) of the delimiter.
    • If count is a negative number, the function returns the substring after the count occurrences (counting from the right of the string) of the delimiter.
    • If count is 0, the function returns an empty string.

Example 1:

SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', 2);

Output 1:

+-----------------------------------------+ | SUBSTRING_INDEX('www.tidbcloud.com', '.', 2) | +-----------------------------------------+ | www.tidbcloud | +-----------------------------------------+

Example 2:

SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', -1);

Output 2:

+------------------------------------------+ | SUBSTRING_INDEX('www.tidbcloud.com', '.', -1) | +------------------------------------------+ | com | +------------------------------------------+

TO_BASE64()

The TO_BASE64() function is used to convert the given argument to a string in the base-64 encoded form and return the result according to the character set and collation of the current connection. A base-64 encoded string can be decoded using the FROM_BASE64() function.

Syntax:

TO_BASE64(str)
  • If the argument is not a string, the function converts it to a string before base-64 encoding.
  • If the argument is NULL, the function returns NULL.

Example 1:

SELECT TO_BASE64('abc');

Output 1:

+------------------+ | TO_BASE64('abc') | +------------------+ | YWJj | +------------------+

Example 2:

SELECT TO_BASE64(6);

Output 2:

+--------------+ | TO_BASE64(6) | +--------------+ | Ng== | +--------------+

TRANSLATE()

Replace all occurrences of characters by other characters in a string. It does not treat empty strings as NULL as Oracle does.

TRIM()

Remove leading and trailing spaces.

UCASE()

The UCASE() function is used to convert a string to uppercase letters. This function is equivalent to the UPPER() function.

Example:

SELECT UCASE('bigdata') AS result_upper, UCASE(null) AS result_null;

Output:

+--------------+-------------+ | result_upper | result_null | +--------------+-------------+ | BIGDATA | NULL | +--------------+-------------+

UNHEX()

The UNHEX() function performs the reverse operation of the HEX() function. It treats each pair of characters in the argument as a hexadecimal number and converts it to the character represented by that number, returning the result as a binary string.

Example:

SELECT UNHEX('54694442');

Output:

+--------------------------------------+ | UNHEX('54694442') | +--------------------------------------+ | 0x54694442 | +--------------------------------------+

UPPER()

The UPPER() function is used to convert a string to uppercase letters. This function is equivalent to the UCASE() function.

Example:

SELECT UPPER('bigdata') AS result_upper, UPPER(null) AS result_null;

Output:

+--------------+-------------+ | result_upper | result_null | +--------------+-------------+ | BIGDATA | NULL | +--------------+-------------+

WEIGHT_STRING()

The WEIGHT_STRING() function returns the weight string (binary characters) for the input string, primarily used for sorting and comparison operations in multi-character set scenarios. If the argument is NULL, it returns NULL. The syntax is as follows:

WEIGHT_STRING(str [AS {CHAR|BINARY}(N)])
  • str: the input string expression. If it is a non-binary string, such as a CHAR, VARCHAR, or TEXT value, the return value contains the collation weights for the string. If it is a binary string, such as a BINARY, VARBINARY, or BLOB value, the return value is the same as the input.

  • AS {CHAR|BINARY}(N): optional parameters used to specify the type and length of the output. CHAR represents the character data type, and BINARY represents the binary data type. N specifies the output length, which is an integer greater than or equal to 1.

Example:

SET NAMES 'utf8mb4'; SELECT HEX(WEIGHT_STRING('ab' AS CHAR(3))) AS char_result, HEX(WEIGHT_STRING('ab' AS BINARY(3))) AS binary_result;

Output:

+-------------+---------------+ | char_result | binary_result | +-------------+---------------+ | 6162 | 616200 | +-------------+---------------+

Unsupported functions

  • LOAD_FILE()
  • MATCH()
  • SOUNDEX()

Regular expression compatibility with MySQL

The following sections describe the regular expression compatibility with MySQL, including REGEXP_INSTR(), REGEXP_LIKE(), REGEXP_REPLACE(), and REGEXP_SUBSTR().

Syntax compatibility

MySQL implements regular expression using International Components for Unicode (ICU), and TiDB uses RE2. To learn the syntax differences between the two libraries, you can refer to the ICU documentation and RE2 Syntax.

match_type compatibility

The value options of match_type between TiDB and MySQL are:

  • Value options in TiDB are "c", "i", "m", and "s", and value options in MySQL are "c", "i", "m", "n", and "u".

  • The "s" in TiDB corresponds to "n" in MySQL. When "s" is set in TiDB, the . character also matches line terminators (\n).

    For example, the SELECT REGEXP_LIKE(a, b, "n") FROM t1 in MySQL is the same as the SELECT REGEXP_LIKE(a, b, "s") FROM t1 in TiDB.

  • TiDB does not support "u", which means Unix-only line endings in MySQL.

Data type compatibility

The difference between TiDB and MySQL support for the binary string type:

  • MySQL does not support binary strings in regular expression functions since 8.0.22. For more details, refer to MySQL documentation. But in practice, regular functions can work in MySQL when all parameters or return types are binary strings. Otherwise, an error will be reported.
  • Currently, TiDB prohibits using binary strings and an error will be reported under any circumstances.

Other compatibility

  • The behavior of replacing empty strings in TiDB is different from MySQL. Taking REGEXP_REPLACE("", "^$", "123") as an example:

    • MySQL does not replace the empty string and returns "" as the result.
    • TiDB replaces the empty string and returns "123" as the result.
  • The keyword used for capturing groups in TiDB is different from MySQL. MySQL uses $ as the keyword, while TiDB uses \\ as the keyword. In addition, TiDB only supports capturing groups numbered from 0 to 9.

    For example, the following SQL statement returns ab in TiDB:

    SELECT REGEXP_REPLACE('abcd','(.*)(.{2})$','\\1') AS s;