when specify the white-list, why show the following effect? #1200

Closed
gbolin opened this issue Nov 6, 2017 · 2 comments

Comments

@gbolin

gbolin commented Nov 6, 2017


Environment

  • Tesseract Version: 4.0.0-alpha
  • Commit Number:
  • Platform:

Current Behavior:

I am using Tesseract to recognize Simplified Chinese (chi_sim) text, for example "2017年11月06日".
When I run the following command:
tesseract XX.jpg stdout -l chi_sim --psm 7 --oem 0
it outputs: 2017年10月 12 臼
But when I run this one:
tesseract XX.jpg stdout -l chi_sim --psm 7 --oem 0 -c tessedit_char_whitelist="0123456789年月日"
it outputs:
2017 10 12
It looks like the characters "年" and "月" go missing once I specify the whitelist, although they are recognized without it. Why?

Expected Behavior:

input: tesseract XX.jpg stdout -l chi_sim --psm 7 --oem 0 -c tessedit_char_whitelist="0123456789年月日"
output: 2017年10月 12

Suggested Fix:

In my opinion, a whitelist should act as a filter on the result. For example:
given the input "6758490210" and the whitelist "02468",
the output should be "684020".
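
The filtering behavior described above can be sketched as a post-processing step applied to the output of the first command, the one without a whitelist. This is only an illustration of the semantics the issue asks for, not a description of how Tesseract implements `tessedit_char_whitelist`. The names `filter_to_whitelist` and `ocr_then_filter` are hypothetical, and keeping spaces is an assumption made so that the result matches the expected output quoted in this issue.

```python
import subprocess


def filter_to_whitelist(text: str, whitelist: str, keep_spaces: bool = False) -> str:
    """Keep only the characters of `text` that appear in `whitelist`.

    If `keep_spaces` is true, plain spaces are kept as well; leading and
    trailing whitespace is stripped from the result.
    """
    allowed = set(whitelist)
    if keep_spaces:
        allowed.add(" ")
    return "".join(ch for ch in text if ch in allowed).strip()


def ocr_then_filter(image_path: str, whitelist: str) -> str:
    """Hypothetical helper: run the un-whitelisted command from the issue,
    then filter its output in Python instead of inside Tesseract."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout",
         "-l", "chi_sim", "--psm", "7", "--oem", "0"],
        capture_output=True, text=True, encoding="utf-8", check=True,
    )
    return filter_to_whitelist(result.stdout, whitelist, keep_spaces=True)


# The digit example from the suggested fix.
print(filter_to_whitelist("6758490210", "02468"))  # 684020

# The un-whitelisted OCR output from this issue, filtered after the fact.
print(filter_to_whitelist("2017年10月 12 臼", "0123456789年月日", keep_spaces=True))  # 2017年10月 12
```

Filtering afterwards can only drop characters the engine already produced. It cannot turn a misread "臼" back into "日", which is why the last character is still missing in the second example.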

Thank you sincerely for your answer.

@wosiu

wosiu commented Nov 10, 2017

Whitelisting does not work for the LSTM engine.
#751

@ghost
Copy link

ghost commented Jan 18, 2018

Not sure if this is related but we've had this issue for a long time:
#960
