Best tessdata Feedback - Japanese #76

whatohyou · 2017-08-25T01:09:35Z

https://github.com/tesseract-ocr/tessdata/blob/master/best/jpn.traineddata
Very nice.
https://github.com/tesseract-ocr/tessdata/blob/master/best/jpn_vert.traineddata
Does not work for PSM 6(default) mode.
If you add the "-l jpn + jpn_vert" option, it reads vertical text horizontally, so recognition fails.

Some vertical texts are not recognised correctly.

This vertical text should be read from right to left.
After it reads "ったく", Tesseract seems to treat it as the shorter edge, so it rotates the image the wrong way, and recognition fails.

A possible solution is to split the text image and put the text in a single line before processing it with Tesseract.

A perfect result was achieved: "ったく戦い方もロクに知らないくせに抵抗しやがって"
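The splitting workaround can be sketched roughly as follows. This is only an illustration, not what was actually run: it assumes the image is already binarized into rows of 0/1 pixels, that columns are separated by fully blank pixel columns, and the helper names (`split_columns`, `trim_rows`, `stack_vertically`) are made up for this example. Real glyphs with internal vertical gaps (for example "い") would need a minimum gap width to avoid being split.

```python
from typing import List

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = background


def split_columns(img: Image) -> List[Image]:
    """Split a binarized page into text columns, rightmost column first."""
    if not img:
        return []
    width = len(img[0])
    # A pixel column counts as inked if any row has ink at that x position.
    inked = [any(row[x] for row in img) for x in range(width)]
    spans = []  # (start, end) x-ranges of inked runs, end exclusive
    start = None
    for x in range(width + 1):
        has_ink = x < width and inked[x]
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    spans.reverse()  # vertical Japanese text is read right to left
    return [[row[a:b] for row in img] for a, b in spans]


def trim_rows(col: Image) -> Image:
    """Drop blank rows above and below the ink in one column."""
    rows = [i for i, row in enumerate(col) if any(row)]
    return col[rows[0]:rows[-1] + 1] if rows else []


def stack_vertically(cols: List[Image]) -> Image:
    """Join columns top to bottom into one tall column of uniform width."""
    cols = [c for c in (trim_rows(c) for c in cols) if c]
    width = max((len(c[0]) for c in cols), default=0)
    out: Image = []
    for c in cols:
        pad = width - len(c[0])
        left = pad // 2  # centre narrower columns horizontally
        for row in c:
            out.append([0] * left + row + [0] * (pad - left))
    return out


page = [
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
single_column = stack_vertically(split_columns(page))
```

The resulting single column can then be saved as an image and passed to Tesseract with jpn_vert.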

amitdo · 2017-08-25T05:43:52Z

Does not work for PSM 6(default) mode

The default is 3.

3 Fully automatic page segmentation, but no OSD. (Default)
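For anyone reproducing this, the segmentation mode can be passed explicitly rather than relying on the default. A minimal sketch, assuming Tesseract 4.x's `--psm` flag and a hypothetical input file `column.png`; the helper `tesseract_argv` is made up for this example. Mode 5 is the one documented as "Assume a single uniform block of vertically aligned text", which may suit jpn_vert better than mode 3 or 6.

```python
from typing import List


def tesseract_argv(image_path: str, lang: str = "jpn_vert", psm: int = 5) -> List[str]:
    """Build a tesseract command line with an explicit page segmentation mode."""
    # Tesseract 4.x documents PSM values 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError(f"unsupported page segmentation mode: {psm}")
    # "stdout" as the output base makes tesseract print the recognised text.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


# column.png is a placeholder file name, not a file from this thread.
argv = tesseract_argv("column.png")
```

The list can be handed to `subprocess.run(argv, capture_output=True, text=True)` on a machine where Tesseract and the jpn_vert traineddata are installed.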

whatohyou · 2017-08-25T06:10:50Z

I see. I changed it to 3 from 1 and it works, too.

Still, it doesn't recognise "ったく" and similar cases.

Tesseract's jpn_vert.traineddata doesn't read these two.

But if you combine them vertically, it reads at least "村に行って馬を借りてくるわ", though still not the "それに" part.
