Some Chinese words are not included in the module #3

Open
LLauryn opened this issue Jul 8, 2019 · 3 comments

LLauryn commented Jul 8, 2019

The Chinese grapheme-to-phoneme conversion library is incomplete. Here are some of the missing Chinese characters I have found: 邓,吴,鄂,皖,蔡,萨,廖,宋,秦,刘,滧,闫,陕,郑,郝,犇,鹏,陇,祾,渭,邹,濮,梵,佟,韩,龚,洛,湘,婍,沂,隋,洣,潘,蒋,禹,喲,闽,湳,綪,睍,孻,汶,杭,吶,黔,渝,辽,銶,滇,灞,溁,浙,渤,邵,赣,淮,郸,彭,傣,蜀,沪,癍,郦,滕,滦,榣,姈,亳,漳,邢,涪,尧,昝,羲,媃,粤,鞑
```python
from g2pc import G2pC
g2p = G2pC()
print(g2p("吴"))
```
For example, when I input the text "邓小平", the result for "邓" is ('邓', 'nr', '邓', '邓', '', '邓').
When I input "吴", the result is ('吴', 'nr', '吴', '吴', '', '吴'), and so on.
All of the characters I listed have the same problem as the examples above.
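
A minimal sketch for spotting such entries automatically. It assumes that the third field of each result tuple holds the pinyin and is left equal to the input character when the lookup fails. That is inferred only from the two outputs quoted above, not from the library's documentation. `find_unconverted` is a hypothetical helper name, and the third sample entry is made up purely for contrast.

```python
def find_unconverted(tokens):
    """Return the words whose pinyin field is just the word itself."""
    missing = []
    for token in tokens:
        # Assumed layout: token[0] is the word, token[2] is its pinyin.
        word, pinyin = token[0], token[2]
        if pinyin == word:
            missing.append(word)
    return missing


sample = [
    ("邓", "nr", "邓", "邓", "", "邓"),  # quoted from the output above
    ("吴", "nr", "吴", "吴", "", "吴"),  # quoted from the output above
    ("好", "a", "hao3", "hao3", "good", "好"),  # hypothetical converted entry
]

print(find_unconverted(sample))
```

If `g2p(...)` returns a list of such tuples, the same helper could presumably be applied directly to its result, e.g. `find_unconverted(g2p("邓小平"))`.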

Kyubyong (Owner) commented Jul 9, 2019

Thanks. Most of them are used in names. I fixed the bug, so please update the library and check the new results. Some of them are still missing because they are not in cedict. I will look for a solution to this in the near future.

@melspectrum007

@Kyubyong
Thanks for your impressive work. I also found that some Chinese words are not included in the module, such as "琊". Could you update the module to include these missing words?

thanks

@melspectrum007

Another question: how many Chinese words are included in the model? Could you include the full Chinese dictionary? Thanks.
