Best tessdata Feedback - Japanese #76

Open
whatohyou opened this issue Aug 25, 2017 · 2 comments
@whatohyou

https://github.com/tesseract-ocr/tessdata/blob/master/best/jpn.traineddata
Very nice.
https://github.com/tesseract-ocr/tessdata/blob/master/best/jpn_vert.traineddata
Does not work in PSM 6 (default) mode.
If you add the "-l jpn+jpn_vert" option, it reads vertical text horizontally, so recognition fails.

Some vertical text is not recognised correctly.
[Image: temp]
This vertical text should be read from right to left.
After reading "ったく", Tesseract seems to treat it as the shorter edge, so it rotates the image the wrong way, and recognition fails.

A possible workaround is to split the text image and arrange the text in a single line before processing it with Tesseract.

[Image: 1]
With the text arranged this way, a perfect result was achieved: "ったく 戦い方もロクに 知らないくせに 抵抗しやがって"
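
The splitting workaround described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the method actually used to produce the image: it assumes the page is already binarised into rows of 0/1 pixels, that text columns are separated by fully blank pixel columns, and that "single line" means one tall vertical column. The helpers `split_columns`, `trim_rows`, and `restack_right_to_left` are hypothetical names.

```python
from typing import List

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = background


def split_columns(img: Image) -> List[Image]:
    """Split a binary image into text columns at runs of blank pixel columns."""
    if not img:
        return []
    width = len(img[0])
    # A pixel column "has ink" if any row contains a 1 at that x position.
    has_ink = [any(row[x] for row in img) for x in range(width)]
    columns, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last run
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            columns.append([row[start:x] for row in img])
            start = None
    return columns


def trim_rows(col: Image) -> Image:
    """Drop blank rows above and below the ink in one column."""
    inked = [i for i, row in enumerate(col) if any(row)]
    return col[inked[0]:inked[-1] + 1] if inked else []


def restack_right_to_left(img: Image, gap: int = 1) -> Image:
    """Stack the columns into one tall column, rightmost column first."""
    cols = [trim_rows(c) for c in reversed(split_columns(img))]
    width = max((len(c[0]) for c in cols), default=0)
    out: Image = []
    for i, col in enumerate(cols):
        if i:
            out.extend([[0] * width for _ in range(gap)])  # blank spacer rows
        for row in col:
            out.append(row + [0] * (width - len(row)))  # pad to common width
    return out
```

The rightmost column goes first because vertical Japanese is read from right to left. Real scans would likely need a noise tolerance when detecting blank pixel columns, which this sketch omits.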

@amitdo

amitdo commented Aug 25, 2017

> Does not work for PSM 6(default) mode

The default is 3.

> 3 Fully automatic page segmentation, but no OSD. (Default)
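
To make the page segmentation mode explicit instead of relying on the default, the command line can be assembled programmatically. The following is a minimal sketch, not an official wrapper: `tesseract_command` is a hypothetical helper, `page.png` is a placeholder file name, and it assumes a Tesseract 4 binary on the PATH that accepts `--psm` (3.x releases spelled the flag `-psm`).

```python
from typing import List, Sequence


def tesseract_command(image: str, langs: Sequence[str], psm: int = 3) -> List[str]:
    """Build an argv list for the tesseract CLI that prints the text to stdout."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    # Multiple languages are joined with '+' and no spaces, e.g. jpn+jpn_vert.
    return ["tesseract", image, "stdout", "-l", "+".join(langs), "--psm", str(psm)]


# Example use (requires tesseract and the traineddata files to be installed):
#   import subprocess
#   cmd = tesseract_command("page.png", ["jpn_vert"], psm=5)
#   text = subprocess.run(cmd, capture_output=True, text=True).stdout
```

PSM 5 ("Assume a single uniform block of vertically aligned text") may be worth trying for vertical input, although nothing in this thread confirms that it fixes the cases reported here.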

@whatohyou

whatohyou commented Aug 25, 2017

I see. I changed it from 1 to 3 and it works, too.

Still, it doesn't recognise "ったく" and similar cases.

[Images: temp, 2temp]
jpn_vert.traineddata does not read these two.
[Image: 11untitled-2]
But if you combine them vertically, it reads at least "村に行って馬を借りてくるわ", though still not the "それに" part.
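
Combining two separate crops vertically, as described above, could be done along these lines. This is a sketch under assumptions: both crops are non-empty binarised images stored as rows of 0/1 pixels, and `pad_to_width` and `stack_vertically` are hypothetical helper names. How the combined image in this comment was actually produced is not stated.

```python
from typing import List

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = background


def pad_to_width(img: Image, width: int) -> Image:
    """Centre an image horizontally on a blank canvas of the given width."""
    left = (width - len(img[0])) // 2
    return [[0] * left + row + [0] * (width - len(row) - left) for row in img]


def stack_vertically(top: Image, bottom: Image, gap: int = 2) -> Image:
    """Place `top` above `bottom`, separated by `gap` blank rows."""
    width = max(len(top[0]), len(bottom[0]))
    spacer = [[0] * width for _ in range(gap)]
    return pad_to_width(top, width) + spacer + pad_to_width(bottom, width)
```

The crop that should be read first goes on top, which for right-to-left vertical text is the rightmost column.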
