TiDB unhex() results conflict with MySQL (5.7 and 8.0) when collation_database = gbk_chinese_ci #30362

Closed
ramanich1 opened this issue Dec 2, 2021 · 4 comments · Fixed by #30288
Labels
feature/developing the related feature is in development severity/moderate sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

@ramanich1
Collaborator

Bug Report

1. Minimal reproduce step (Required)

drop table if exists t1;
SET NAMES utf8;
SET collation_database =gbk_chinese_ci;
CREATE TABLE t1 (code  VARCHAR(4),a VARCHAR(4));
INSERT INTO `t1` (code) VALUES ('C29F'),('CC91'),('D697'),('8020'),('CCA0'),('D2B1'),('DFAB'),('DCB0');
UPDATE IGNORE t1 SET a=unhex(code) ORDER BY code;
SELECT COUNT(*) FROM t1 WHERE a <> '';
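UNHEX() turns each four-character code into two raw bytes, so column `a` is assigned binary data rather than text. A minimal standard-library Go sketch of that step, for illustration only (`unhexCodes` is a hypothetical helper, not a TiDB or MySQL function):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// codes are the values inserted into t1.code in the reproduction above.
var codes = []string{"C29F", "CC91", "D697", "8020", "CCA0", "D2B1", "DFAB", "DCB0"}

// unhexCodes mimics what UNHEX(code) yields for each row: the raw bytes
// spelled by the hex digits. Hypothetical helper for illustration only.
func unhexCodes(codes []string) ([][]byte, error) {
	out := make([][]byte, 0, len(codes))
	for _, c := range codes {
		b, err := hex.DecodeString(c)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	raw, err := unhexCodes(codes)
	if err != nil {
		panic(err)
	}
	for i, b := range raw {
		// %q escapes non-printable bytes so the raw values stay visible.
		fmt.Printf("%s -> %q\n", codes[i], b)
	}
}
```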

2. What did you expect to see? (Required)

--mysql 5.7
mysql> SELECT COUNT(*) FROM t1 WHERE a <> '';
+----------+
| COUNT(*) |
+----------+
|        8 |
+----------+
1 row in set (0.00 sec)
--mysql 8.0
mysql> SELECT COUNT(*) FROM t1 WHERE a <> '';
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

3. What did you see instead (Required)

mysql> SELECT COUNT(*) FROM t1 WHERE a <> '';
+----------+
| COUNT(*) |
+----------+
|        7 |
+----------+
1 row in set (0.00 sec)

4. What is your TiDB version? (Required)

Release Version: v5.4.0-alpha-264-g6efa36df6
Edition: Community
Git Commit Hash: 6efa36df6cff325106f67ecfe3d79816ba37ca6a
Git Branch: master
UTC Build Time: 2021-11-29 16:57:51
GoVersion: go1.17.2
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
@ramanich1 ramanich1 added the type/bug The issue is confirmed as a bug. label Dec 2, 2021
@bb7133 bb7133 added the feature/developing the related feature is in development label Dec 3, 2021
@Defined2014
Contributor

Defined2014 commented Dec 3, 2021

I tried it on MySQL 8.0.27, and the behavior is the same as TiDB (a count of 7). But on 8.0.12, the result is 1. :(
Which MySQL version are you using, @ramanich1?


mysql> SELECT COUNT(*) FROM t1 WHERE a <> '';
+----------+
| COUNT(*) |
+----------+
|        7 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t1;
+------+------+
| code | a    |
+------+------+
| C29F |     |
| CC91 | ̑    |
| D697 | ֗    |
| 8020 |      |
| CCA0 | ̠    |
| D2B1 | ұ    |
| DFAB | ߫    |
| DCB0 | ܰ    |
+------+------+
8 rows in set (0.00 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.27    |
+-----------+
1 row in set (0.00 sec)
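The count of 7 is consistent with a simple UTF-8 check. Of the eight UNHEX results, only `0x80 0x20` (from `8020`) is invalid UTF-8, because `0x80` is a continuation byte and cannot start a sequence; the other seven are valid two-byte sequences, e.g. `0xD2 0xB1` is U+04B1 (`ұ` in the output above). This Go sketch assumes that a binary value stored into column `a` (which the `SELECT *` output suggests holds UTF-8 text) is cut at its first invalid byte. The diff later in the thread names a `charset.TruncateStrategyTrim` for binary-collation input, but `trimAtInvalidUTF8` here is a hypothetical model of that idea, not TiDB's code:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

// trimAtInvalidUTF8 keeps the longest valid UTF-8 prefix of b.
// Assumed model of a "trim" truncation strategy, not TiDB's implementation.
func trimAtInvalidUTF8(b []byte) string {
	i := 0
	for i < len(b) {
		r, size := utf8.DecodeRune(b[i:])
		if r == utf8.RuneError && size == 1 {
			break // first invalid byte: drop it and everything after it
		}
		i += size
	}
	return string(b[:i])
}

// countNonEmpty mirrors SELECT COUNT(*) FROM t1 WHERE a <> ''.
func countNonEmpty(codes []string) int {
	n := 0
	for _, c := range codes {
		b, err := hex.DecodeString(c)
		if err != nil {
			continue // UNHEX of invalid hex is NULL, which never matches a <> ''
		}
		if trimAtInvalidUTF8(b) != "" {
			n++
		}
	}
	return n
}

func main() {
	codes := []string{"C29F", "CC91", "D697", "8020", "CCA0", "D2B1", "DFAB", "DCB0"}
	fmt.Println(countNonEmpty(codes)) // 7, matching TiDB and MySQL 8.0.27
}
```

Under this model, the 8 from MySQL 5.7 and the 1 from MySQL 8.0.12 come from different handling and are not explained by the sketch.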

@Defined2014
Contributor

Defined2014 commented Dec 3, 2021

But we still have some problems with the unhex() function when GBK columns are involved.

create table t1(code  VARCHAR(4) charset gbk, a varchar(4) charset gbk);
INSERT INTO `t1` (code) VALUES ('C29F'),('CC91'),('D697'),('8020'),('CCA0'),('D2B1'),('DFAB'),('DCB0');
UPDATE IGNORE t1 SET a=unhex(code) ORDER BY code;
select * from t1;

MySQL

mysql> select * from t1;
+------+------+
| code | a    |
+------+------+
| C29F | 聼   |
| CC91 | 虘   |
| D697 | 謼   |
| 8020 |      |
| CCA0 | 虪   |
| D2B1 | 冶   |
| DFAB | 攉   |
| DCB0 | 馨   |
+------+------+
8 rows in set (0.00 sec)

TiDB

mysql> select * from t1;
+------+------+
| code | a    |
+------+------+
| C29F |      |
| CC91 |      |
| D697 |      |
| 8020 |      |
| CCA0 |      |
| D2B1 |      |
| DFAB |      |
| DCB0 |      |
+------+------+
8 rows in set (0.00 sec)

I think the problem is with inserting into a GBK field. The minimal steps to reproduce it:

CREATE TABLE `t1` (a varchar(4) charset gbk);
insert into t1 values (0xc29f);

With the hacky change below, it works. It seems we assume every string is UTF-8 encoded.

diff --git a/table/column.go b/table/column.go
index 445a169a8..31244a75e 100644
--- a/table/column.go
+++ b/table/column.go
@@ -319,6 +319,10 @@ func CastValue(ctx sessionctx.Context, val types.Datum, col *model.ColumnInfo, r
                strategy := charset.TruncateStrategyReplace
                if val.Collation() == charset.CollationBin {
                        strategy = charset.TruncateStrategyTrim
+                       if col.Charset == charset.CharsetGBK {
+                               str, err = charset.NewEncoding(col.Charset).DecodeString(str)
+                               casted.SetString(str, charset.CollationUTF8)
+                       }
                }
                if newStr, invalidPos := v.Truncate(str, strategy); invalidPos >= 0 {
                        casted = types.NewStringDatum(newStr)

@github-actions

Please check whether the issue should be labeled with 'affects-x.y' or 'fixes-x.y.z', and then remove 'needs-more-info' label.
