
follow opencc conversion chain #688

Merged: 3 commits merged on Aug 12, 2023

@eagleoflqj (Member) commented Aug 10, 2023

Pull request

Issue tracker

Fixes #652

Feature

Unit test

  • Done

Manual test

  • Done
    [Screenshot from 2023-08-10 21-35-37]

EDIT: the screenshot actually shows a clear bug 🤦‍♂️. See below for a corrected one.

Code Review

  1. Unit and manual test pass
  2. GitHub Action CI pass
  3. At least one contributor reviews and votes
  4. Can be merged clean without conflicts
  5. PR will be merged by rebasing onto the upstream base

Additional Info

@eagleoflqj marked this pull request as ready for review August 11, 2023 01:38
@eagleoflqj (Member Author)

[Screenshot from 2023-08-11 20-41-27]

@eagleoflqj requested review from amorphobia and lotem and removed request for amorphobia August 12, 2023 00:56
@lotem merged commit 75e6b1a into rime:master Aug 12, 2023
5 checks passed
@eagleoflqj deleted the conversion-chain branch August 12, 2023 15:51
@groverlynn (Contributor) commented Sep 4, 2023

This is still wrong.
E.g. 才能 (zh-Hans) should be converted to both 才能 and 纔能 (zh-Hant), but the two should then be merged back into 才能 (zh-TW / zh-HK). However, this algorithm does not filter out 纔能 in zh-TW / zh-HK.

Basically, it only works with single characters and fails when the text contains multi-character terms or phrases.
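To make the failure mode concrete, here is a minimal Python sketch of chaining dictionaries by whole-word lookup, following the steps listed in this PR's commit messages (follow the chain, pass a word through as-is when a dictionary lacks it, de-duplicate). It is a simplified model, not librime's actual code, and the dictionaries `S2T` and `T2TW` and the function `convert_word_chain` are hypothetical, built only from the 才能 example rather than from real OpenCC data.

```python
# Hypothetical toy dictionaries mirroring the 才能 example; NOT real OpenCC data.
S2T = {"才能": ["才能", "纔能"], "才": ["才", "纔"]}
T2TW = {"纔": ["才"]}  # character-level entry only; no entry for the word 纔能


def convert_word_chain(word, chain):
    """Run one candidate word through a chain of dictionaries.

    Each stage looks up the whole word; a word missing from a dictionary
    passes through as-is, and the results are de-duplicated in order.
    """
    candidates = [word]
    for table in chain:
        next_candidates = []
        for cand in candidates:
            # Whole-word lookup: only an exact key match is converted.
            for out in table.get(cand, [cand]):
                if out not in next_candidates:  # de-duplication
                    next_candidates.append(out)
        candidates = next_candidates
    return candidates


print(convert_word_chain("才能", [S2T, T2TW]))  # prints ['才能', '纔能']
```

Because the zh-TW stage only has an entry for the single character 纔, the word 纔能 finds no exact match there and survives unchanged, while a single-character input such as 才 does collapse to one result.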

@eagleoflqj (Member Author)

OpenCC's restrictions make it hard, if possible at all, to get this perfect. The current status is more acceptable than before, since at least the correct result is available to users. Extraneous words can simply be treated as irrelevant ones.

@groverlynn (Contributor)

It has nothing to do with OpenCC. On the contrary, OpenCC gives correct results in such cases. The problem is the implementation of the conversion chain in rime.
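For contrast, here is a sketch in which each stage rewrites a candidate by forward longest-prefix matching over its text, roughly the way OpenCC applies a dictionary to a string, except that it expands every alternative instead of keeping only the first value. With this approach a word absent from a dictionary is still converted piece by piece, so 纔能 falls back to the character entry for 纔 and merges back into 才能. The dictionaries `S2T` and `T2TW` and the functions `segment_options` and `convert_text_chain` are hypothetical toy stand-ins, not real OpenCC data or librime code.

```python
from itertools import product

# Hypothetical toy dictionaries mirroring the 才能 example; NOT real OpenCC data.
S2T = {"才能": ["才能", "纔能"], "才": ["才", "纔"]}
T2TW = {"纔": ["才"]}  # character-level entry only; no entry for the word 纔能


def segment_options(text, table):
    """Split text by forward longest-prefix matching against table.

    Returns one list of alternatives per segment; a character with no
    entry maps to itself.
    """
    max_len = max((len(key) for key in table), default=1)
    options = []
    i = 0
    while i < len(text):
        # Try the longest possible key first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in table:
                options.append(table[piece])
                i += length
                break
        else:
            # No key matched at position i: keep the character unchanged.
            options.append([text[i]])
            i += 1
    return options


def convert_text_chain(text, chain):
    """Run text through each dictionary in turn, keeping every alternative."""
    candidates = [text]
    for table in chain:
        next_candidates = []
        for cand in candidates:
            # Combine the alternatives of each segment into full strings.
            for parts in product(*segment_options(cand, table)):
                out = "".join(parts)
                if out not in next_candidates:  # de-duplication
                    next_candidates.append(out)
        candidates = next_candidates
    return candidates


print(convert_text_chain("才能", [S2T, T2TW]))  # prints ['才能']
```

Expanding every alternative with `itertools.product` can grow quickly for long inputs with many ambiguous segments; it is used here only to keep the sketch short.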

@groverlynn mentioned this pull request Sep 18, 2023
groverlynn pushed a commit to groverlynn/librime that referenced this pull request Sep 27, 2023
* follow opencc conversion chain

* when a dict doesn't contain a word, pass as-is

* de-duplication
graphemecluster pushed a commit to TypeDuck-HK/librime that referenced this pull request Nov 2, 2023
graphemecluster pushed a commit to TypeDuck-HK/librime that referenced this pull request Nov 8, 2023
graphemecluster pushed a commit to TypeDuck-HK/librime that referenced this pull request Mar 18, 2024
Successfully merging this pull request may close these issues:
  • OpenCC "conversion_chain" not fully working

4 participants