Encoded Common Chinese Place Name Ideographs List #105

Open
eisoch opened this issue Dec 26, 2020 · 3 comments

eisoch commented Dec 26, 2020

This list is based on the correction list, written by @Kushim-Jiang, of the place name data cited from the National Bureau of Statistics of China. This list is used for the 7th population census of the PRC. Values in the Subdivision column follow ISO 3166-2:CN; the part before the hyphen is defined as follows.

p=province/省, ar=autonomous region/自治区, sar=special administrative region/特别行政区, m=municipality/直辖市, t=town/镇, v=village/村, n=neighborhood committee/居委会
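A Subdivision value such as `v-ZJ` can be split mechanically into its level prefix and its region code. The following is a minimal sketch in Python, assuming the part after the hyphen is the ISO 3166-2:CN code without its `CN-` prefix; the names `LEVELS`, `Subdivision` and `parse_subdivision` are hypothetical and not part of the original list.

```python
from dataclasses import dataclass

# Level prefixes, copied from the legend above.
LEVELS = {
    "p": "province",
    "ar": "autonomous region",
    "sar": "special administrative region",
    "m": "municipality",
    "t": "town",
    "v": "village",
    "n": "neighborhood committee",
}


@dataclass(frozen=True)
class Subdivision:
    level: str   # spelled-out level, e.g. "village"
    region: str  # region code after the hyphen, e.g. "ZJ" (assumed ISO 3166-2:CN)


def parse_subdivision(value: str) -> Subdivision:
    """Split a value like 'v-ZJ' into its level and region code."""
    prefix, sep, region = value.strip().partition("-")
    if not sep or prefix not in LEVELS or not region:
        raise ValueError(f"not a valid subdivision value: {value!r}")
    return Subdivision(LEVELS[prefix], region)
```

For example, `parse_subdivision("t-GZ")` yields a town-level entry for region code `GZ`, while a value with an unknown prefix raises `ValueError`.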

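| UCS | Char. | Ref. | Subdivision | Count | T/S | RVar. | Note |
|---|---|---|---|---|---|---|---|
| U+3630 | | GHZ-10427.03, T3-2534 | v-ZJ | 1 | TS | | |
| U+3BCA | | G3-4276, T4-4A38, KP1-4D42, V2-7954 | t-GZ, v-GZ | 2 | TS | | |
| U+3D72 | | G3-5651, T4-5660 | t-GZ, v-GZ | 2 | TS | | |
| U+3E67 | | G3-437C, T4-273A | v-ZJ | 1 | TS | | |
| U+44E3 | | G3-6922, T4-3A5C | v-GD | 9 | T | 𬜯 | |
| U+4955 | | G3-7678, T4-616C, KP1-8182 | v-JS | 5 | T | 𬭯 | |
| U+57E8 | | G3-336B, T3-3474 | v-SD | 1 | T | 𫭢 | |
| U+6E7B | | GE-3F5C, T3-3C62, J1-4825 | v-GD | 1 | V | | |
| U+9D1C | | GE-565E, T4-5A4E, J1-6B62, K2-7235, KP1-8E34 | v-LN | 1 | T | 𪉈 | |
| U+2134C | 𡍌 | GHZ-10452.06 | v-SX | 2 | T | * | |
| U+23D66 | 𣵦 | GHZ-31630.05, T4-2D72 | v-TJ | 2 | TS | | |
| U+23E24 | 𣸤 | *GBK-2107000 | v-GX | 4 | TS | | |
| U+24C59 | 𤱙 | TF-312C | v-ZJ | 1 | TS | | |
| U+287AA | 𨞪 | GKX-1278.33, T4-593B, VU-287AA | v-SC | 1 | T | * | |
| U+2A248 | 𪉈 | GCH | v-LN | 1 | S | | |
| U+2AFD6 | 𪿖 | TC-4D72 | v-ZJ | 4 | TS | | |
| U+2C1DE | 𬇞 | *MC-00110 | v-FJ | 1 | TS | | |
| U+2CBB4 | 𬮴 | GGH-3022.16 | v-GZ, v-SN, v-GS, v-QH | 4 | VS | | |
| U+30F5A | 𰽚 | UK-02204 | v-SD | 1 | VS | | |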
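Several rows above give only the code point in the UCS column. As a small illustrative sketch (the helper names `ucs_to_char` and `char_to_ucs` are hypothetical), the `U+XXXX` notation can be converted to the character and back in Python:

```python
def ucs_to_char(ucs: str) -> str:
    """Convert a code point written as 'U+2134C' to the character itself."""
    if not ucs.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {ucs!r}")
    return chr(int(ucs[2:], 16))


def char_to_ucs(ch: str) -> str:
    """Convert a single character to 'U+XXXX' notation (at least 4 hex digits)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(ch):04X}"
```

Whether the resulting character displays correctly depends on font support, which is limited for the more recent CJK extension blocks.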

eisoch added the subset label Dec 26, 2020

eisoch commented Dec 26, 2020

Some non-G0 or non-G1 common Chinese place name ideographs have been included in IICore or UnihanCore.

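| UCS | Char. | GRef. | IICore | UHCore | TGH | Subdivision | Count | T/S | RVar. | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| U+3B4E | | G7-2225 | | G | 6664 | v-CQ, v-SC | 2 | S | | |
| U+364D | | G3-3447 | | G | 7118 | t-GD, v-GD | 2 | TS | | |
| U+3658 | | G3-344C | | G | 7292 | v-HB | 1 | TS | | |
| U+365F | | G3-3465 | | H | | v-GD, v-GX | 2 | TS | | |
| U+3666 | | G3-346C | | G | 7698 | v-JX | 4 | TS | | |
| U+3C54 | | GHZ-21442.01 | | H | | v-YN | 1 | TS | | |
| U+40CE | | G3-5840 | | G | 7660 | v-GS | 1 | TS | | |
| U+48BA | | G3-7066 | | G | 6633 | t-SC, v-SC | 2 | TS | | |
| U+5273 | | GE-3241 | | HJ | | v-ZJ | 1 | V | | |
| U+5649 | | GE-3378 | | HMT | | v-GD | 1 | TS | | |
| U+57B5 | | G5-3623 | CT | GHMT | 6780 | v-ZJ, n-FJ, v-FJ | 46 | TS | | |
| U+58C6 | | G3-3378 | CH | HMT | | n-GD, v-GD | 2 | T | * | |
| U+5EFB | | GE-383F | AJKP | HJKP | | v-SX, v-ZJ, v-JX, v-HN, n-HN, v-GD, t-GD, n-GD, v-SC, n-GZ, v-GZ | 17 | V | | |
| U+68E1 | | G3-405E | | HJMT | | v-CQ, v-SC | 2 | T | | |
| U+6AC8 | | GE-3E2F | CT | H | | n-HE, v-LN | 2 | V | | |
| U+6C3E | | G8-2F6A | AGTJHKMP | GHJKMPT | 6517 | t-JS | 1 | V | | |
| U+6C4E | | GE-3E7D | ATJKMP | HJKMPT | | n-ZJ, v-HA | 3 | V | | |
| U+6D6C | | GT-3192 | AGTHKMP | GHJKMPT | 7057 | v-GD | 1 | TS | | |
| U+6D7F | 浿 | G3-5348 | AKP | HKMPT | | v-FJ | 3 | T | 𬇙 | |
| U+7460 | | GE-424B | ATJKP | HJKP | | v-HB | 1 | TS | | |
| U+74C8 | | GE-425D | CT | H | | v-JX | 1 | TS | | |
| U+77F4 | | GE-4427 | | H | | v-ZJ | 1 | TS | | |
| U+784B | | G5-582D | | H | | v-FJ | 6 | TS | | |
| U+7A8E | | G8-7D71 | | G | 7074 | v-GS | 1 | S | | |
| U+7AB5 | | G3-5F5A | | HMT | | v-GS | 1 | T | | |
| U+7BE2 | | G3-637B | | HMT | | v-HN, t-GD, n-GD, v-GX | 4 | T | 𬕂 | |
| U+83D3 | | GE-495F | ATJHKP | HJKP | | v-SX, v-SD, v-HA, v-HN | 6 | V | | |
| U+8534 | | G8-2F6D | BTH | H | | v-SX, t-JX, n-JX, v-SD, v-GD | 5 | V | | |
| U+87C7 | | GE-4B58 | CJ | J | | v-SC | 1 | TS | | |
| U+8856 | | GE-4B78 | CT | HMT | | v-AH, v-HB, v-HN | 8 | TS | | |
| U+945B | | GE-524A | ATJKP | HJKP | | v-SD | 1 | VT | 𰽚 | |
| U+95C7 | | GE-5274 | ATJKP | HJKMPT | | v-GZ, v-SN, v-GS, v-QH | 4 | VT | 𬮴 | |
| U+21336 | 𡌶 | | | H | | v-GX | 2 | TS | | |
| U+21413 | 𡐓 | GHZ-10482.01 | | GH | 7635 | v-HB | 2 | TS | | |
| U+287E0 | 𨟠 | GKX-1279.32 | | G | 8057 | v-HE, v-SD, v-TJ | 31 | TS | | |
| U+2BB62 | 𫭢 | GCH-3002.50 | | G | 6564 | v-SD | 1 | S | | |
| U+2C1D9 | 𬇙 | GCH-3007.79 | | G | 6616 | v-FJ | 3 | S | 浿 | |
| U+2C542 | 𬕂 | GXC-1010.94 | | G | 7541 | v-HN, t-GD, n-GD, v-GX | 4 | S | | |
| U+2C72F | 𬜯 | GGFZ-022100 | | G | 6951 | v-GD | 9 | S | | |
| U+2CB6F | 𬭯 | GCH-4021.51 | | G | 7874 | v-JS | 5 | S | | |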
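The IICore and UHCore values above can be compared more easily once the IICore priority letter is separated from the source letters. The sketch below assumes the IICore column follows the Unihan `kIICore` value format (a priority letter A, B or C followed by one or more source letters from G, H, J, K, M, P, T) and that UHCore values are plain source-letter strings; the function name `split_iicore` is hypothetical.

```python
SOURCE_LETTERS = set("GHJKMPT")  # assumed source identifiers, per the kIICore format


def split_iicore(value: str) -> tuple[str, list[str]]:
    """Split an IICore value such as 'AGTJHKMP' into (priority, sorted sources)."""
    priority, sources = value[:1], value[1:]
    if priority not in ("A", "B", "C"):
        raise ValueError(f"unexpected priority letter in {value!r}")
    if not sources or not set(sources) <= SOURCE_LETTERS:
        raise ValueError(f"unexpected source letters in {value!r}")
    # Sorting makes the result directly comparable with a UHCore string.
    return priority, sorted(sources)
```

For instance, `split_iicore("AGTJHKMP")` gives priority `A` and the sources `G, H, J, K, M, P, T`, which is the same set as the UHCore value `GHJKMPT` in the U+6C3E row.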


eisoch commented Dec 26, 2020

Characters whose encoding is in progress are listed below.

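| WSSN | Pre-UCS | IDS | Ref. | Subdivision | Count | T/S | RVar. | Note |
|---|---|---|---|---|---|---|---|---|
| WS2017-00667 | | ⿰土夭 | USAT07230, UTC-02991 | v-ZJ | 2 | TS | | |
| WS2017-00670 | | ⿰土戋 | GDM-00022, UK-10849 | v-SX | 2 | S | 𡍌 | |
| WS2017-02545 | | ⿰犭茶 | GDM-00090, UK-10993 | v-ZJ | 1 | TS | | |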


eisoch commented Dec 26, 2020

The following characters need to be encoded in the future.

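| Pre-UCS | IDS | Subdivision | Count | T/S | RVar. | Note | Evidence |
|---|---|---|---|---|---|---|---|
| *U+2B737 | ⿰寿阝 | v-SC | 1 | S | 𨞪 | UNC from China | IRGN2446 |
| | ⿱𰃮土 | n-GD, v-GD | 2 | S | | UK candidate for WS2021; also appears in the bus station name 作~坑村 in the HKSAR | 《汉语方言大词典》 |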
