Comparing utf8mb4 Collations in MySQL 8

2021-10-30 (Last Modified: 2022-05-21)

1. MySQL version

SELECT VERSION();

+-----------+
| VERSION() |
+-----------+
| 8.0.20    |
+-----------+

2. Default collation

SHOW COLLATION WHERE COLLATION LIKE 'utf8mb4%' AND `default` = 'Yes';

+--------------------+---------+-----+---------+----------+---------+---------------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+--------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
+--------------------+---------+-----+---------+----------+---------+---------------+

3. What is compared

3.1. Collations compared

utf8mb4_unicode_ci
utf8mb4_general_ci
utf8mb4_bin
utf8mb4_0900_bin
utf8mb4_0900_ai_ci
utf8mb4_0900_as_ci
utf8mb4_0900_as_cs
utf8mb4_ja_0900_as_cs
utf8mb4_ja_0900_as_cs_ks

Collation               |Charset|Id |Default|Compiled|Sortlen|Pad_attribute|
------------------------+-------+---+-------+--------+-------+-------------+
utf8mb4_0900_ai_ci      |utf8mb4|255|Yes    |Yes     |      0|NO PAD       |
utf8mb4_0900_as_ci      |utf8mb4|305|       |Yes     |      0|NO PAD       |
utf8mb4_0900_as_cs      |utf8mb4|278|       |Yes     |      0|NO PAD       |
utf8mb4_0900_bin        |utf8mb4|309|       |Yes     |      1|NO PAD       |
utf8mb4_bin             |utf8mb4| 46|       |Yes     |      1|PAD SPACE    |
utf8mb4_general_ci      |utf8mb4| 45|       |Yes     |      1|PAD SPACE    |
utf8mb4_ja_0900_as_cs   |utf8mb4|303|       |Yes     |      0|NO PAD       |
utf8mb4_ja_0900_as_cs_ks|utf8mb4|304|       |Yes     |     24|NO PAD       |
utf8mb4_unicode_ci      |utf8mb4|224|       |Yes     |      8|PAD SPACE    |
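
A listing like the one above can be obtained by filtering SHOW COLLATION on the nine collation names. This is a minimal sketch, not necessarily the exact query used for this post; row order and column formatting depend on the client.

```sql
-- List the nine collations under comparison, including Sortlen and Pad_attribute.
SHOW COLLATION
WHERE `Collation` IN (
  'utf8mb4_unicode_ci',
  'utf8mb4_general_ci',
  'utf8mb4_bin',
  'utf8mb4_0900_bin',
  'utf8mb4_0900_ai_ci',
  'utf8mb4_0900_as_ci',
  'utf8mb4_0900_as_cs',
  'utf8mb4_ja_0900_as_cs',
  'utf8mb4_ja_0900_as_cs_ks'
);
```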

3.1.1. Collation name suffixes

_ai : accent-insensitive
_as : accent-sensitive
_ci : case-insensitive
_cs : case-sensitive
_ks : kana-sensitive
_bin : binary
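
To see what a suffix changes in practice, the same comparison can be run under two collations that differ only in that suffix. The sketch below assumes the session character set is utf8mb4 (hence the SET NAMES); the column aliases are made up for illustration. Each expression returns 1 when the strings compare as equal and 0 otherwise.

```sql
-- Assumption: the connection uses utf8mb4, so the literals can take a utf8mb4 collation.
SET NAMES utf8mb4;

SELECT
  -- _ci vs. _cs: uppercase against lowercase
  'A' = 'a' COLLATE utf8mb4_0900_as_ci             AS case_insensitive,
  'A' = 'a' COLLATE utf8mb4_0900_as_cs             AS case_sensitive,
  -- _ai vs. _as: plain kana against kana with a handakuten
  'はは' = 'ぱぱ' COLLATE utf8mb4_0900_ai_ci       AS accent_insensitive,
  'はは' = 'ぱぱ' COLLATE utf8mb4_0900_as_ci       AS accent_sensitive,
  -- without vs. with _ks: hiragana against katakana
  'はは' = 'ハハ' COLLATE utf8mb4_ja_0900_as_cs    AS kana_insensitive,
  'はは' = 'ハハ' COLLATE utf8mb4_ja_0900_as_cs_ks AS kana_sensitive;
```

COLLATE binds more tightly than =, so the explicit collation attaches to the right-hand literal and then governs the whole comparison.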

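3.2. Strings compared

'はは'='ハハ' : hiragana vs. katakana
'はは'='ぱぱ' : handakuten (semi-voiced sound mark)
'ひょう'='ひよう' : yōon (small vs. full-size kana)
'A'='a' : uppercase vs. lowercase
'C'='Ｃ' : half-width vs. full-width
'a '='a' : trailing space
'🍣'='🍺' : sushi vs. beer
'0'='〇' : kanji numeral zero
'1'='一' : kanji numeral one
'①'='１' : NEC special character (circled digit vs. full-width digit)
'⑩'='１０' : NEC special character (circled digit vs. full-width digits, part 2)
'①'='1' : NEC special character (circled digit vs. half-width digit)
'⑩'='10' : NEC special character (circled digit vs. half-width digits, part 2)
'㍻'='平成' : NEC special character (era name)
'㈱'='(株)' : NEC special character (㈱ vs. 株 in half-width parentheses)
'㈱'='（株）' : NEC special character (㈱ vs. 株 in full-width parentheses)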
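
One way to test a single pair against every collation is to emit one row per collation with UNION ALL. This is only a sketch of the approach, assuming a utf8mb4 session; the aliases collation_name and is_equal are illustrative, and the literals can be swapped for any other pair from the list above.

```sql
-- Assumption: the connection uses utf8mb4.
SET NAMES utf8mb4;

-- is_equal is 1 when the pair compares as equal under that collation, 0 otherwise.
SELECT 'utf8mb4_unicode_ci' AS collation_name,
       'はは' = 'ハハ' COLLATE utf8mb4_unicode_ci AS is_equal
UNION ALL
SELECT 'utf8mb4_general_ci',       'はは' = 'ハハ' COLLATE utf8mb4_general_ci
UNION ALL
SELECT 'utf8mb4_bin',              'はは' = 'ハハ' COLLATE utf8mb4_bin
UNION ALL
SELECT 'utf8mb4_0900_bin',         'はは' = 'ハハ' COLLATE utf8mb4_0900_bin
UNION ALL
SELECT 'utf8mb4_0900_ai_ci',       'はは' = 'ハハ' COLLATE utf8mb4_0900_ai_ci
UNION ALL
SELECT 'utf8mb4_0900_as_ci',       'はは' = 'ハハ' COLLATE utf8mb4_0900_as_ci
UNION ALL
SELECT 'utf8mb4_0900_as_cs',       'はは' = 'ハハ' COLLATE utf8mb4_0900_as_cs
UNION ALL
SELECT 'utf8mb4_ja_0900_as_cs',    'はは' = 'ハハ' COLLATE utf8mb4_ja_0900_as_cs
UNION ALL
SELECT 'utf8mb4_ja_0900_as_cs_ks', 'はは' = 'ハハ' COLLATE utf8mb4_ja_0900_as_cs_ks;
```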

4. Comparison results

In the tables below, = means the two strings compare as equal under that collation; a blank cell means they compare as different.

Collation               |'はは'='ハハ'|'はは'='ぱぱ'|'ひょう'='ひよう'|
------------------------+-------------+-------------+-----------------+
utf8mb4_unicode_ci      |=            |=            |=                |
utf8mb4_general_ci      |             |             |                 |
utf8mb4_bin             |             |             |                 |
utf8mb4_0900_bin        |             |             |                 |
utf8mb4_0900_ai_ci      |=            |=            |=                |
utf8mb4_0900_as_ci      |=            |             |=                |
utf8mb4_0900_as_cs      |             |             |                 |
utf8mb4_ja_0900_as_cs   |=            |             |                 |
utf8mb4_ja_0900_as_cs_ks|             |             |                 |

Collation               |'A'='a'|'C'='Ｃ'|'a '='a'|
------------------------+-------+--------+--------+
utf8mb4_unicode_ci      |=      |=       |=       |
utf8mb4_general_ci      |=      |        |=       |
utf8mb4_bin             |       |        |=       |
utf8mb4_0900_bin        |       |        |        |
utf8mb4_0900_ai_ci      |=      |=       |        |
utf8mb4_0900_as_ci      |=      |=       |        |
utf8mb4_0900_as_cs      |       |        |        |
utf8mb4_ja_0900_as_cs   |       |=       |        |
utf8mb4_ja_0900_as_cs_ks|       |=       |        |
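
The 'a '='a' results line up with the Pad_attribute column in section 3.1: only the three PAD SPACE collations treat the pair as equal. A minimal sketch of that check, assuming a utf8mb4 session and using illustrative aliases, contrasts the two binary collations:

```sql
-- Assumption: the connection uses utf8mb4.
SET NAMES utf8mb4;

SELECT
  'a ' = 'a' COLLATE utf8mb4_bin      AS pad_space_bin,    -- PAD SPACE: trailing spaces are ignored in comparisons
  'a ' = 'a' COLLATE utf8mb4_0900_bin AS no_pad_0900_bin;  -- NO PAD: trailing spaces are significant
```

This matters when moving from an older PAD SPACE collation to any of the 0900 collations, because values that differ only in trailing spaces stop matching.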

Collation '🍣'='🍺' '0'='〇' '1'='一'

utf8mb4_unicode_ci

utf8mb4_general_ci

utf8mb4_bin

utf8mb4_0900_bin

utf8mb4_0900_ai_ci

utf8mb4_0900_as_ci

utf8mb4_0900_as_cs

utf8mb4_ja_0900_as_cs

utf8mb4_ja_0900_as_cs_ks

Collation '①'='１' '⑩'='１０' '①'='1' '⑩'='10'

utf8mb4_unicode_ci

utf8mb4_general_ci

utf8mb4_bin

utf8mb4_0900_bin

utf8mb4_0900_ai_ci

utf8mb4_0900_as_ci

utf8mb4_0900_as_cs

utf8mb4_ja_0900_as_cs

utf8mb4_ja_0900_as_cs_ks

※ The utf8mb4_general_ci results were wrong and have been corrected (2022-05-21).

Collation               |'㍻'='平成'|'㈱'='(株)'|'㈱'='（株）'|
------------------------+-----------+-----------+-------------+
utf8mb4_unicode_ci      |=          |=          |=            |
utf8mb4_general_ci      |           |           |             |
utf8mb4_bin             |           |           |             |
utf8mb4_0900_bin        |           |           |             |
utf8mb4_0900_ai_ci      |=          |=          |=            |
utf8mb4_0900_as_ci      |=          |=          |=            |
utf8mb4_0900_as_cs      |           |           |             |
utf8mb4_ja_0900_as_cs   |           |           |             |
utf8mb4_ja_0900_as_cs_ks|           |           |             |

※ The utf8mb4_general_ci results were wrong and have been corrected (2022-05-21).

5. Which one to use

For a new project, one of the following is probably the right choice:

utf8mb4_ja_0900_as_cs_ks
utf8mb4_0900_as_cs
utf8mb4_0900_bin
utf8mb4_bin
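
Whichever collation is chosen, it can be set as the table default and overridden per column. The following is a sketch only: the table and column names (example_items, name, code) are hypothetical, and the particular pairing of collations is just one possible choice from the list above.

```sql
-- Hypothetical table: strict Japanese-aware matching by default,
-- with a binary collation on a column that holds machine-generated codes.
CREATE TABLE example_items (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  code VARCHAR(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_bin NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_ja_0900_as_cs_ks;

-- Confirm which collation each column ended up with.
SHOW FULL COLUMNS FROM example_items;
```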

utf8mb4_0900_bin appears to perform better than utf8mb4_bin (I have not measured this myself):

For collating weights, utf8mb4_bin uses code points, possibly with leading zero bytes added, whereas utf8mb4_0900_bin uses the utf8mb4 encoding bytes. The sort order is the same for both collations, but sorting for utf8mb4_0900_bin is much faster.

— MySQL :: MySQL 8.0 Release Notes :: Changes in MySQL 8.0.17 (2019-07-22, General Availability)