Encoded Common Chinese Dialectal Ideographs List #84

Open
eisoch opened this issue Sep 7, 2020 · 5 comments

eisoch commented Sep 7, 2020

This list collects encoded ideographs that are used to write common words in various Chinese dialects. Each character should be common and familiar, at least to local people. The characters listed here fall outside IICore and UnihanCore, so the list can be treated as a supplement to both. For example, “唦” is not a G0 character and is used in the Wuhan dialect word “板唦” by Dada Band (达达乐队), but it is already included in UnihanCore, where it is marked HMT, so there is no need to include it here. See also the explanation by Dada Band.

When the correct character is hard to input, some people substitute a wrong character for it, even though they know how to write the correct one. Some of these wrong characters have become increasingly popular.

The Dialect column uses tags based on ISO 639-3:2007, GB/T 2260-2007, ISO 3166-1:2020, ISO 3166-2:CN, ISO 3166-2:TW, ISO 3166-2:HK, ISO 3166-2:MO, ISO 3166-2:JP, ISO 3166-2:KR, ISO 3166-2:KP, ISO 3166-2:VN, ISO 3166-2:SG and ISO 3166-2:MY. A character can carry more than one value where applicable.
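
For anyone who wants to process the Dialect column by script, here is a minimal Python sketch. It assumes each tag is a hyphen-separated sequence of a three-letter ISO 639-3 language code, a two-letter ISO 3166-1 country code and zero or more subdivision codes, and that multiple values are separated by semicolons, as in `cmn-CN-BJ; cmn-CN-SN-SIA`. The names `DialectTag` and `parse_dialect_cell` are hypothetical and not part of any tool used for this list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DialectTag:
    language: str                  # ISO 639-3 code, e.g. "cmn"
    country: str                   # ISO 3166-1 alpha-2 code, e.g. "CN"
    subdivisions: Tuple[str, ...]  # province, city or county codes, e.g. ("SN", "SIA")


def parse_dialect_cell(cell: str) -> List[DialectTag]:
    """Split a Dialect cell into its individual tags (assumed layout, see above)."""
    tags = []
    for raw in cell.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # tolerate a trailing semicolon
        language, *rest = raw.split("-")
        if not rest:
            raise ValueError(f"missing country code in {raw!r}")
        country, *subdivisions = rest
        if not (len(language) == 3 and language.isalpha() and language.islower()):
            raise ValueError(f"unexpected language code in {raw!r}")
        if not (len(country) == 2 and country.isalpha() and country.isupper()):
            raise ValueError(f"unexpected country code in {raw!r}")
        tags.append(DialectTag(language, country, tuple(subdivisions)))
    return tags


if __name__ == "__main__":
    for tag in parse_dialect_cell("cmn-CN-BJ; cmn-CN-SN-SIA"):
        print(tag.language, tag.country, "-".join(tag.subdivisions))
```

The sketch only checks the shape of each tag; it does not verify that the codes exist in the standards listed above.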

| UCS | Char. | IDS | Ref. | Dialect | Popularity | T/S | Wr. | Note | Proposer |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U+378E | | ⿸尸巴 | G3-3B50 | cmn-CN-BJ | Mandarin area | TS | | faeces; used in many Mandarin dialects | Admin |
| U+379E | | ⿸尸從 | G3-3B4E | cmn-CN-BJ; cmn-CN-SN-SIA | Mandarin area | T | | weak and incompetent; used in almost all Mandarin dialects, not only in Beijing, and common in TV drama subtitles; 《不愁》, the opening song of the drama 《装台》 | Admin |
| U+45A2 | | ⿰虫少 | G3-615D, T4-2F6B | yue-CN-GD-SUD | Pearl River Delta and Hong Kong, Macao | TS | | second character of a local Cantonese word for a butterfly-like fried snack (the first character is WS2017-03891); Shunde, Foshan, Guangdong | Admin |
| U+473A | | ⿰豆昔 | G3-6E4A, T4-4D64, KP1-782B | cmn-CN-BJ | local | TS | | 豆䜺儿糕: a kind of cake in Beijing | Admin |
| U+4B23 | | ⿰飠乞 | G3-7A4B, T4-362C, K3-3545, KP1-8820 | cmn-CN-SN-YNA | local | T | | 䬣飥: a kind of food in Yan'an, Shaanxi | Admin |
| U+4B30 | | ⿰飠召 | GKX-1418.15, T3-4556, KP1-8845 | cmn-CN-CQ | local | T | | 䬰子鋪盖面: a kind of noodle in Chongqing | Admin |
| U+4B54 | | ⿰飠追 | G3-7A7A, T4-5E5A, KP1-88C7 | nan-CN-FJ-QZJ | local | T | | 豬油䭔: a kind of food in Quanzhou, Fujian | Admin |
| U+4C9D | | ⿰鱼仓 | GS-224C | yue-CN-HK | Yue area | S | | pomfret | Admin |
| U+717F | | ⿰火尃 | G5-5225, T3-476C, K2-4475, KP1-55B4 | cjy-CN-SX-FYJ | local | TS | | 炉煿肉: a kind of food in Fenyang, Shanxi; also used in T/FYCY 012-2019 | Admin |
| U+7CC4 | | ⿰米扁 | G3-677C, T3-4D6D, J14-737C, K2-515B, KP1-65ED | hsn-CN-HN-HNY | local | TS | | 糄粑: a kind of food in Hengyang, Hunan; also used in DB43/T 1588.18-2019 | Admin |
| U+9904 | | ⿰飠合 | G3-7A5D, T5-4B4C, K2-6E2D, KP1-8864 | cmn-CN-HA-JXY; cmn-CN-HE-NJN | Mandarin area | T | | 饸饹面: a common kind of noodle in Jiaxian, Ningjin, Ulanqab, Datong, Handan and so on | Admin |
| U+9966 | | ⿰饣乇 | G8-2D44 | cmn-CN-SN-YNA | local | S | | 𱃱饦: a kind of food in Yan'an, Shaanxi | Admin |
| U+9B9C | | ⿰魚后 | G3-7778, T4-5A49, K2-7067, KP1-8CD1 | nan-CN-GD-HIF | local | T | | 鮜門: train station | Admin |
| U+2028E | 𠊎 | ⿰亻厓 | GHZ-10172.08 | hak-CN-GD-MXZ | Hakka area | TS | | the first-person pronoun (I, me) in common Hakka | Admin |
| U+216A6 | 𡚦 | ⿴女丶 | V2-736D | yue-CN-GD-ZHA; yue-CN-HK | Yue area | TS | | female sex organ | Admin |
| U+25EF5 | 𥻵 | ⿰米時 | GHZ-53156.12 | cdo-CN-FJ-FOC | local | T | | 𥻵粿: a kind of food in Fuzhou, Fujian | Admin |
| U+2683F | 𦠿 | ⿰月䪞 | GKX-0995.24, T5-5D59 | czh-CN-AH-HZQ | local | TS | | 𦠿汤: a kind of soup, often eaten for breakfast, in Huizhou, Anhui | Admin |
| U+274BD | 𧒽 | ⿰虫雷 | GHZ-42896.10, T4-6122 | yue-CN-GD-FOS | local | TS | | a kind of shell; 𧒽岗: metro station | @Honoka55 |
| U+2969F | 𩚟 | ⿰飠夬 | GHC | cmn-CN-YN-KMG | local | T | | 炒餌𩚟: a kind of food in Kunming, Yunnan | Admin |
| U+296C3 | 𩛃 | ⿰飠瓜 | V2-8678 | cmn-CN-SD-LCH | local | T | | 𩛃⿰飠荅 or 𩛃𩞰: food in Liaocheng, Shandong; 𩛃 is also a Nom character, a variant of 菓 or 果 meaning confectionery | Admin |
| U+29751 | 𩝑 | ⿰飠宣 | GHZ-74466.08 | cmn-CN-SN-LON | local | T | | 𩝑餅: a kind of food in Longxian, Baoji, Shaanxi | Admin |
| U+297B0 | 𩞰 | ⿰飠答 | GCY | cmn-CN-SD-LCH | local | T | | 𩛃𩞰: food in Liaocheng, Shandong | Admin |
| U+2980D | 𩠍 | ⿰饣旋 | GBK | cmn-CN-SD-TNA | local | S | | 油𩠍儿: a kind of food in Jinan, Shandong | Admin |
| U+2AA0A | 𪨊 | ⿸尸从 | GXC-1003.72, UTC-00041 | cmn-CN-BJ; cmn-CN-SN-SIA | Mandarin area | S | | weak and incompetent; used in almost all Mandarin dialects, not only in Beijing, and common in TV drama subtitles; 《不愁》, the opening song of the drama 《装台》 | Admin |
| U+2B0D1 | 𫃑 | ⿰米勞 | GZFY-01072, TE-2F3E | cmn-CN-CQ | local | T | | 𫃑糟小湯圓: a kind of dessert in Chongqing | Admin |
| U+2B5EB | 𫗫 | ⿰饣胡 | GXC-3024.06, UTC-00678 | cmn-CN-SN-WNA | local | S | | 𫗫饽: a kind of food in Weinan, Shaanxi | Admin |
| U+2B5F0 | 𫗰 | ⿰饣追 | GCH-4024.13 | nan-CN-FJ-QZJ | local | S | | 猪油𫗰: a kind of food in Quanzhou, Fujian | Admin |
| U+2B694 | 𫚔 | ⿰鱼回 | GXC-2026.42, UTC-00074 | wuu-CN-JS-NTG | local | S | | 𫚔鱼: a kind of fish in Nantong | Admin |
| U+2C081 | 𬂁 | ⿰月君 | GZFY-00889 | cmn-CN-SC-CTU | nationwide | TS | | a kind of viscus in Chengdu, Sichuan | Admin |
| U+2C2B6 | 𬊶 | ⿰火监 | GGH-2008.58 | cmn-CN-SN-SIA | local | S | | 𬊶臊子: a kind of food in Xi'an | Admin |
| U+2C58B | 𬖋 | ⿰米乙 | GCYY-01188 | yue-CN-GD-ZHA | local | TS | | a kind of ciba in Zhanjiang, Guangdong; possibly also common in Beihai, Guangxi; also a Zhuang character and a Chinese Nom character with the same meaning | Admin |
| U+2C59E | 𬖞 | ⿰米时 | GZFY-01039 | cdo-CN-FJ-FOC | local | S | | 𬖞粿: a kind of food in Fuzhou, Fujian | Admin |
| U+2CCB1 | 𬲱 | ⿰饣它 | GZFY-00155 | cmn-CN-JS-XUZ | neighbouring area | S | | a kind of soup, often for breakfast | Admin |
| U+2CCBD | 𬲽 | ⿰饣其 | GZFY-00204 | cmn-CN-SX-YCE | local | S | | 炒𬲽: a kind of food in Yuncheng, Shanxi | Admin |
| U+2CCC7 | 𬳇 | ⿰饣宣 | GZFY-00157 | cmn-CN-SN-LON | local | S | | 𬳇饼: a kind of food in Longxian, Baoji, Shaanxi | Admin |
| U+2EA3B | 𮨻 | ⿰飠它 | KC-05896 | cmn-CN-JS-XUZ | neighbouring area | T | | a kind of soup, often for breakfast | Admin |
| U+30ABF | 𰪿 | ⿰米劳 | UK-02846 | cmn-CN-CQ | local | S | | 𰪿糟小汤圆: a kind of dessert in Chongqing | Admin |
| U+30EDD | 𰻝 | ⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心 | UTC-01312 | cmn-CN-SN-SIA | nationwide | S | | a kind of noodle in Xi'an, Shaanxi | Admin |
| U+30EDE | 𰻞 | ⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心 | UTC-00791 | cmn-CN-SN-SIA | nationwide | T | | a kind of noodle in Xi'an, Shaanxi | Admin |
| U+310EA | 𱃪 | ⿰飠其 | UK-02812 | cmn-CN-SX-YCE | local | T | | 炒𱃪: a kind of food in Yuncheng, Shanxi | Admin |
| U+310F1 | 𱃱 | ⿰饣乞 | UK-02297 | cmn-CN-SN-YNA | local | S | | 𱃱饦: a kind of food in Yan'an, Shaanxi | Admin |
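
Many of these characters sit in CJK extension blocks and may not display without a suitable font, so it can be handy to convert between the UCS column and the character itself. A minimal Python sketch follows; `ucs_to_char` and `char_to_ucs` are hypothetical helper names, not part of any tool used for this list.

```python
def ucs_to_char(ucs: str) -> str:
    """Convert a code point written as 'U+XXXX' into the character."""
    if not ucs.upper().startswith("U+"):
        raise ValueError(f"expected 'U+XXXX', got {ucs!r}")
    return chr(int(ucs[2:], 16))


def char_to_ucs(char: str) -> str:
    """Convert a single character into 'U+XXXX' notation (at least four hex digits)."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(char):04X}"


if __name__ == "__main__":
    # Code points taken from the UCS column of the list.
    for ucs in ("U+7CBF", "U+2028E", "U+30EDD"):
        print(ucs, ucs_to_char(ucs))
```

Whether the printed character is legible still depends on the fonts installed on the machine that runs the script.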
eisoch changed the title from "Encoded Chinese Dialectal Ideographs List" to "Encoded Common Chinese Dialectal Ideographs List" on Sep 7, 2020

eisoch commented Sep 9, 2020

Some common Chinese dialectal ideographs that are not in G0 or G1 are already included in IICore or UnihanCore. They are listed below.

Two rules in GB/T 22484-2016, A.3.2 and A.3.9, cover the naming of metro and bus stations and are relevant to this list.

| UCS | Char. | G Ref. | IICore | UHCore | TGH | Dialect | T/S | Note |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| U+3635 | | G3-3370 | | H | | yue-CN-GD-CAN | TS | 清㘵: metro station; also used in place names in Fujian and Guangdong |
| U+3E06 | | GS-215F | | GH | 6411 | cmn-CN-SD-ZBO | TS | 干㸆鸡块, 煎㸆大虾: food; also used in DB37/T 1976-2011 |
| U+4C7D | | GHZ-74707.08 | CH | H | | yue-CN-HK | T | pomfret |
| U+5526 | | G3-3643 | | HMT | | cmn-CN-HB-WUH | TS | 板唦: attitude |
| U+5710 | | G3-3759 | | G | 7369 | cjy-CN-NM-ODS-JUN | TS | 糕圐圙: food; also used in DB15/T 815-2014 |
| U+5719 | | G3-375C | | G | 7677 | cjy-CN-NM-ODS-JUN | TS | 糕圐圙: food; also used in DB15/T 815-2014 |
| U+5871 | | G5-365B | BTH | GHMT | 7601 | yue-CN-GD-CAN | TS | 西塱: metro station |
| U+5C7B | | G5-3A5C | | HMT | | hak-CN-HK | TS | legacy form used for the bus station name 打铁岃 |
| U+5C83 | | GH-1105 | | H | | hak-CN-GD-MXZ; hak-CN-HK | TS | 赤岃岗: bus station; 打铁岃: bus station |
| U+6725 | | GE-3C3E | | H | | nan-CN-GD-SWA | T | 朥餅: food; also used in T/CZSPTXH 075-2018 |
| U+6818 | | G3-413C | AT | HMT | | cmn-CN-YN-KMG | TS | 酸栘𣐿: tree |
| U+681F | | G3-4140 | | GHMT | 4461 | wuu-CN-JS-RDG | TS | 栟茶: train station |
| U+6C39 | | G5-5460 | BHM | HM | | yue-CN-MO | TS | 氹仔: LRT line |
| U+6D50 | | G8-7C78 | | G | 6895 | cmn-CN-SN-SIA | S | 浐河: metro station |
| U+6E74 | | G3-546F | | GHMT | 7239 | yue-CN-GD-CAN | TS | 长湴: metro station |
| U+6E81 | | G8-2E5D | | G | 7427 | hsn-CN-HN-CSX | S | 溁湾镇: metro station |
| U+6ED8 | | G3-5563 | BGH | GHMT | 7596 | yue-CN-GD-LWQ; yue-CN-GD-SUD; yue-CN-GD-DGG | TS | 滘口: metro station; 北滘: train station; 道滘: train station |
| U+6EFB | | G3-542A | | HMT | | cmn-CN-SN-SIA | T | 滻河: metro station |
| U+6FDA | | G5-5628 | AKP | HKP | | hsn-CN-HN-CSX | T | 濚灣鎮: metro station |
| U+713F | | GE-4079 | CT | T | | nan-CN-TW-KEE | TS | 红烧鳗焿 & 豆签焿: food |
| U+7201 | | G3-503D | | HMT | | cmn-CN-SN-SIA | T | 爁臊子: food |
| U+784B | | G5-582D | | H | | cdo-CN-FJ-NDS | TS | ceramic |
| U+7881 | | GE-443D | ATJKP | HJKP | | yue-CN-HK | V | 石碁: metro station |
| U+78B6 | | G3-583D | | GH | 7659 | wuu-CN-ZJ-NGB | TS | 大碶: metro station |
| U+7910 | | G3-5767 | | HMT | | nan-CN-GD-SWA | T | 礐石風景區: bus station |
| U+7C84 | | G5-6658 | CT | HMT | | hak-CN-GD-MXZ | TS | food; also used in T/GDMZCX 031-2019 and in 2.22 of T/GDMZCX 001-2019 |
| U+7CBF | 粿 | G5-666C | CT | GHMT | 5792 | nan-CN-GD-SWA | TS | food; also used in T/CZSPTXH 001-2018.1, T/CZSPTXH 004-2018, T/CZSPTXH 070-2018, T/CZSPTXH 074-2018, T/CZSPTXH 077-2018, T/CZSPTXH 079-2019, T/CZSPTXH 085-2019, T/CZSPTXH 98.1-2019, T/CZSPTXH 101.1-2019, T/CZSPTXH 104.1-2019, T/YWYCY 3-2019 |
| U+7F49 | | GH-1278 | | H | | yue-CN-HK | TS | 甜薄罉: dessert; how to make it |
| U+8137 | | GH-1281 | BHM | HM | | yue-CN-HK | TS | 龙脷鱼: sole (fish) |
| U+81A5 | | GH-1110 | CH | H | | yue-CN-HK | TS | 多膥鱼: capelin |
| U+83C9 | | GE-495D | AGTKP | GHKMPT | 7133 | yue-CN-GD-ZHA; wuu-CN-JS-KUS | TS | 菉塘: bus station; 菉溪路联谊路: bus station |
| U+8855 | | GE-4B77 | | HMT | | cmn-CN-BJ | V | 衚衕: commonly translated as "alley"; possibly derived from ᠬᠤᠳᠳᠤᠭ or ᠬᠣᠲᠠ |
| U+885A | | GE-4B7A | | HMT | | cmn-CN-BJ | V | 衚衕: commonly translated as "alley"; possibly derived from ᠬᠤᠳᠳᠤᠭ or ᠬᠣᠲᠠ |
| U+8FCC | | GE-4F2F | | H | | nan-CN-TW-TPE | TS | 《𨑨迌》 |
| U+9120 | | G3-7137 | | GHMT | 7510 | cmn-CN-SN-HUX | TS | 鄠邑: train station |
| U+98E5 | | G3-7A4A | | HMT | | cmn-CN-SN-YNA | T | 䬣飥: food |
| U+990E | | G3-7A5E | | H | | cmn-CN-HA-JXY; cmn-CN-HE-NJN | T | 餄餎麺: food |
| U+991C | | G3-7A6E | | H | | cmn-CN-TJ | T | 煎餅餜子: food |
| U+992C | | GE-545C | BTH | HJMT | | cmn-CN-SN-WNA | T | 餬餑: food |
| U+993D | | GE-5462 | ATH | HJMT | | cmn-CN-HE-ZJK; cmn-CN-BJ-YQX | T | 炒餽⿰飠畾 |
| U+9978 | | G8-2D45 | CG | G | 4300 | cmn-CN-HA-JXY; cmn-CN-HE-NJN | S | 饸饹面: food; also used in DB15/T 794-2014, DB41/T 1620-2018, T/JCCZ 001-2019, T/WRSPCYHY 002-2020, T/JCCZ 103-2020 |
| U+9979 | | G8-2D46 | CG | G | 4301 | cmn-CN-HA-JXY; cmn-CN-HE-NJN | S | 饸饹面: food; also used in DB15/T 794-2014, DB41/T 1620-2018, T/JCCZ 001-2019, T/WRSPCYHY 002-2020, T/JCCZ 103-2020 |
| U+9983 | | G8-2D4A | CG | G | 4932 | cmn-CN-TJ | S | 煎饼馃子: food; also used in T/TJCY 002-2018 |
| U+9B93 | | G3-775C | CJ | HJMT | | cmn-CN-GZ-KWE; cmn-CN-GZ-PXN; cmn-CN-CQ | T | 小米鮓: a kind of food in Guiyang; 鮓面粑: a kind of food in Panzhou; 鮓鴨肉: a kind of food in Chongqing |
| U+9BB0 | | G3-7773 | | H | | wuu-CN-JS-NTG | T | 鮰魚: fish; also used in GB/T 15805.4-2018, T/NTCYHY 002.4-2018 |
| U+9C02 | | G3-7774 | CH | H | | yue-CN-HK | T | 鰂魚涌/Quarry Bay: metro station; 鰂魚: crucian |
| U+9C72 | | G3-7879 | BTH | H | | yue-CN-HK | T | 赤鱲角消防局: bus station |
| U+9C8A | | G8-2D31 | | G | 7558 | cmn-CN-GZ-KWE; cmn-CN-GZ-PXN; cmn-CN-CQ; cmn-CN-SC-RHQ | S | 小米鲊: a kind of food in Guiyang; 鲊面粑: a kind of food in Panzhou; 鲊鸭肉: a kind of food in Chongqing; also used in DB50/T 792-2017, T/GZSX 006-2016 and T/GZSX 024-2017; 拉鲊: train station |
| U+9C97 | | G8-2D34 | | G | 7703 | yue-CN-HK | S | 鲗鱼涌/Quarry Bay: metro station; 鲗鱼: crucian |
| U+9C98 | | G8-2D35 | | G | 7704 | nan-CN-GD-HIF | S | 鲘门: train station |
| U+20547 | 𠕇 | | | H | | nan-CN-GD-SWA | TS | 𠕇鱼: fish |
| U+20C96 | 𠲖 | GHZ-10622.01 | CH | H | | yue-CN-GD-ZHA | TS | 𠲖记鸡饭店: bus station |
| U+2343F | 𣐿 | GKX-0526.14 | | H | | cmn-CN-YN-KMG | TS | 酸栘𣐿: tree |
| U+241B5 | 𤆵 | GHZ-32194.05 | | H | | cmn-CN-YN-WSY; cmn-CN-CQ-SZY | TS | 𤆵肉饵丝: food; 桃园𤆵牛肉: food (people in Shizhu County sometimes write it with the wrong character, as "桃园耙牛肉"); also used in T/YCX 0106-2018 |
| U+266E8 | 𦛨 | | | H | | nan-CN-GD-SWA | S | 𦛨饼: food |
| U+28468 | 𨑨 | GKX-1253.39 | | H | | nan-CN-TW-TPE | TS | 《𨑨迌》 |
| U+2B6AD | 𫚭 | GCH-4027.31 | | G | 8100 | yue-CN-HK | S | 赤𫚭角消防局: bus station |
| U+2C488 | 𬒈 | GCH-1010.22 | | G | 7070 | nan-CN-GD-SWA | S | 𬒈石风景区: bus station |
| U+2C494 | 𬒔 | GGFZ-011300 | | G | 7506 | yue-CN-GD-FOS | TS | 石𬒔总站: bus station |

eisoch added the subset label on Sep 10, 2020

eisoch commented Sep 11, 2020

Characters whose encoding is in progress are shown below.

| WSSN | Pre-UCS | IDS | Ref. | Dialect | Popularity | T/S | Wr. | Note |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WS2017-02313 | | ⿰火足 | GZA-204390 | nan-CN-GD-SWA | local | TS | | ~肉: food |
| WS2017-03377 | | ⿰米某 | UTC-02979 | nan-CN-GD-SWE | local | TS | | porridge |
| WS2017-03891 | | ⿰虫崩 | UTC-02975 | yue-CN-GD-SUD | Pearl River Delta and Hong Kong, Macao | TS | | ~䖢: food |
| WS2017-04751 | | ⿰饣召 | UK-10440 | cmn-CN-CQ | local | S | | ~子铺盖面: food |
| UNC2020-1 | U+9FFD | ⿰口窄 | MC-00134 | yue-CN-HK; yue-CN-MO | local | TS | | 渣~/喳~: dessert |


eisoch commented Sep 11, 2020

The following characters need to be encoded in the future.

| IDS | Dialect | Popularity | T/S | Wr. | Note | Evidence |
| --- | --- | --- | --- | --- | --- | --- |
| ⿰火掌 | yue-CN-GD-ZHA | local | TS | | 瓦~焗鸡: chicken baked in an earthen pot | absent |
| ⿰米困 | hak-CN-GD-FES | local | TS | | ~粄: food | absent |
| ⿺辶喜 | yue-CN-HK; yue-CN-MO | HK and Macao | TS | | attitude | 《俗》 |
| ⿰飠荅 | cmn-CN-SD-LCH | local | T | | 𩛃~: food | absent |
| ⿰飠畾 | cmn-CN-HE-ZJK; cmn-CN-BJ-YQX | local | T | | 餽~: food | absent |
| ⿰飠旋 | cmn-CN-SD-TNA | local | T | | 油~兒: food | absent |
| ⿰饣夬 | cmn-CN-YN-KMG | local | S | | 饵~: food; DB5223/T 6-2019 uses the wrong character | 《烧饵~》 |
| ⿰饣瓜 | cmn-CN-SD-LCH | local | S | | ~⿰饣荅 or ~⿰饣答: food | absent |
| ⿰饣荅 | cmn-CN-SD-LCH | local | S | | ⿰饣瓜~: food | absent |
| ⿰饣鬼 | cmn-CN-HE-ZJK; cmn-CN-BJ-YQX | local | S | | ~⿰饣畾: food | absent |
| ⿰饣答 | cmn-CN-SD-LCH | local | S | | ⿰饣瓜~: food | absent |
| ⿰饣畾 | cmn-CN-HE-ZJK; cmn-CN-BJ-YQX | local | S | | ⿰饣鬼~: food | absent |


eisoch commented Feb 16, 2021

Below are provisional (unstable) codes for the sub-languages or sub-dialects under the tag cmn and for some newly identified dialects in Guangdong and Hunan. They are not currently suitable for submission to ISO 639-3/RA.

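| Code | Definition | Macrolanguage | Deprecated |
| --- | --- | --- | --- |
| dsm | 东北官话 (Northeastern Mandarin) | zho : cmn | |
| cbm | 北京官话 (Beijing Mandarin) | zho : cmn | Yepocapa Southwestern Cakchiquel; merged with Kaqchikel |
| hsm | 冀鲁官话 (Jilu Mandarin) | zho : cmn | |
| jlm | 胶辽官话 (Jiaoliao Mandarin) | zho : cmn | |
| cpm | 中原官话 (Central Plains Mandarin) | zho : cmn | |
| lym | 兰银官话 (Lanyin Mandarin) | zho : cmn | |
| jhm | 江淮官话 (Jianghuai Mandarin) | zho : cmn | |
| cxm | 西南官话 (Southwestern Mandarin) | zho : cmn | |
| cxn | 湘南土话 (Xiangnan Tuhua) | zho | |
| szt | 韶州土话 (Shaozhou Tuhua) | zho | |
| cxx | 湘西乡话 (Xiangxi Xianghua) | zho | |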
