support utf8mb3 charset #1424

Open: wants to merge 1 commit into base: master
10 changes: 7 additions & 3 deletions charset/charset.go
@@ -46,9 +46,11 @@ type Collation struct {
 	IsDefault bool
 }

-var collationsIDMap = make(map[int]*Collation)
-var collationsNameMap = make(map[string]*Collation)
-var supportedCollations = make([]*Collation, 0, len(supportedCollationNames))
+var (
+	collationsIDMap     = make(map[int]*Collation)
+	collationsNameMap   = make(map[string]*Collation)
+	supportedCollations = make([]*Collation, 0, len(supportedCollationNames))
+)

 // All the supported charsets should be in the following table.
 var charsetInfos = map[string]*Charset{
@@ -235,6 +237,7 @@ const (
 	CharsetUTF16 = "utf16"
 	CharsetUTF16LE = "utf16le"
 	CharsetUTF32 = "utf32"
+	CharsetUTF8MB3 = "utf8mb3"
 )

 var collations = []*Collation{
@@ -459,6 +462,7 @@ var collations = []*Collation{
 	{247, "utf8mb4", "utf8mb4_vietnamese_ci", false},
 	{255, "utf8mb4", "utf8mb4_0900_ai_ci", false},
 	{2048, "utf8mb4", "utf8mb4_zh_pinyin_tidb_as_cs", false},
+	{2049, "utf8mb3", "utf8mb3_general_ci", true},

Contributor

This collation already seems to exist under a different name.

mysql> show collation where charset='utf8mb3' and Collation LIKE '%\_general\_ci';
+--------------------+---------+----+---------+----------+---------+---------------+
| Collation          | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+--------------------+---------+----+---------+----------+---------+---------------+
| utf8mb3_general_ci | utf8mb3 | 33 | Yes     | Yes      |       1 | PAD SPACE     |
+--------------------+---------+----+---------+----------+---------+---------------+
1 row in set (0.01 sec)

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.30    |
+-----------+
1 row in set (0.01 sec)

The collations table already contains this entry:

	{33, "utf8", "utf8_general_ci", false},

That is similar to the entry added here:

	{2049, "utf8mb3", "utf8mb3_general_ci", true},

utf8 is an alias for utf8mb3. I think the change in MySQL 8.0.28 (see https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html ) might be related to this.

Member

+1, I think it's better to keep 'utf8_general_ci(33)' and just make utf8mb3_general_ci an alias.

If we want to introduce the alias, maybe we can introduce a new 'alias' table to represent the mappings:

  • utf8 <-> utf8mb3
  • utf8_bin <-> utf8mb3_bin
    ...
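
The alias table suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PR: `charsetAliases`, `collationAliases`, `canonicalCharsetName`, and `canonicalCollationName` are hypothetical names that do not exist in charset/charset.go, and only the two collation pairs mentioned in this thread are filled in.

```go
package main

import (
	"fmt"
	"strings"
)

// charsetAliases maps an alias charset name to the name already present in
// the charset table. Hypothetical: not part of charset/charset.go.
var charsetAliases = map[string]string{
	"utf8mb3": "utf8",
}

// collationAliases maps an alias collation name to the existing collation
// name, so no new collation ID has to be allocated. Hypothetical as well.
var collationAliases = map[string]string{
	"utf8mb3_general_ci": "utf8_general_ci",
	"utf8mb3_bin":        "utf8_bin",
}

// canonicalCharsetName resolves a charset alias case-insensitively and
// returns the lowercased input when no alias is registered.
func canonicalCharsetName(name string) string {
	lower := strings.ToLower(name)
	if canonical, ok := charsetAliases[lower]; ok {
		return canonical
	}
	return lower
}

// canonicalCollationName does the same for collation names.
func canonicalCollationName(name string) string {
	lower := strings.ToLower(name)
	if canonical, ok := collationAliases[lower]; ok {
		return canonical
	}
	return lower
}

func main() {
	fmt.Println(canonicalCharsetName("utf8mb3"))
	fmt.Println(canonicalCollationName("utf8mb3_general_ci"))
	fmt.Println(canonicalCollationName("utf8mb4_bin"))
}
```

With a lookup like this in front of the existing name maps, `utf8mb3_general_ci` would resolve to the entry with ID 33 instead of needing a separate entry such as 2049.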

Contributor

I think MySQL (and MariaDB?) do the mapping the other way around: utf8 is an alias for utf8mb3. As a result, tables created on current versions with utf8 show utf8mb3 in the output. The plan is to switch the utf8 alias to mean utf8mb4 in the future. Old tables would then still show utf8mb3 while new ones show utf8mb4, even though both were created with the same utf8 name, just on different versions.
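
To make the version-dependent behavior described here concrete, here is a minimal sketch. It assumes the alias is resolved once, when the table is created, and that the resolved name is what gets stored and displayed. `resolveCharset` and its `utf8AliasTarget` parameter are hypothetical and only illustrate the idea; they are not taken from MySQL or from this repository.

```go
package main

import "fmt"

// resolveCharset returns the charset name that would be stored for a table,
// given the name written in the DDL and the current target of the utf8 alias.
// Hypothetical helper for illustration only.
func resolveCharset(requested, utf8AliasTarget string) string {
	if requested == "utf8" {
		return utf8AliasTarget
	}
	return requested
}

func main() {
	// Current behavior as described above: utf8 is an alias for utf8mb3.
	oldTable := resolveCharset("utf8", "utf8mb3")

	// After a possible future switch, the same DDL would store utf8mb4.
	newTable := resolveCharset("utf8", "utf8mb4")

	// Explicit names are never rewritten.
	explicit := resolveCharset("utf8mb4", "utf8mb3")

	fmt.Println(oldTable, newTable, explicit)
}
```

Because the stored name is the resolved one, changing the alias target later would not affect tables that already exist.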

See also:

}

// AddCharset adds a new charset.