OSD not working for rotated image, but working for same image after deskew #1701

Layneww · 2018-06-24T11:58:01Z

Environment

Tesseract Version:
tesseract 4.0.0-beta.1-270-g5a56
leptonica-1.76.0
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE
Commit Number:
Platform:
Macbook Air
Darwin 17.6.0 Darwin Kernel Version 17.6.0: Tue May 8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64 x86_64

Current Behavior:

I refer to #1463 for the command line.

tesseract image-rotated-270.png stdout --tessdata-dir <my tessdata dir> --psm 0 --oem 0 -l osd

The text on the image is like a label: printed text, but not much of it. There are around 5 lines, each with an average of 15 Thai characters.
The output is:

Warning. Invalid resolution 0 dpi. Using 70 instead.
Too few characters. Skipping this page
Error during processing.

However, when I manually rotated it back to the horizontal orientation and ran the same command, I got this result:

Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.54
Script: Latin
Script confidence: 2.22
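
For comparing the two runs programmatically, the `Key: value` lines printed above can be collected into a dictionary. This is only a minimal sketch: `parse_osd` is a hypothetical helper, not part of tesseract or tesserocr, and it assumes the output format shown in this report.

```python
def _convert(value):
    """Turn a value into int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_osd(text):
    """Collect 'Key: value' lines from `tesseract --psm 0` output.

    Lines without a ': ' separator (warnings, error messages) are skipped,
    so a failed run yields an empty dict.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            result[key.strip()] = _convert(value.strip())
    return result


ok_output = """Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.54
Script: Latin
Script confidence: 2.22
"""

failed_output = """Warning. Invalid resolution 0 dpi. Using 70 instead.
Too few characters. Skipping this page
Error during processing.
"""

print(parse_osd(ok_output))
print(parse_osd(failed_output))
```

An empty result then signals the "Too few characters" case without matching on the message text.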

BTW, script detection has a high error rate. I have another Thai picture, and it is predicted as Cyrillic.

Update:
For one picture, I tried both the command line and the API (using a Python wrapper called tesserocr). The command line with --psm 0 --oem 0 -l osd outputs 'Too few characters. Skipping this page', while the API returns an orientation result.
Besides, tesseract can output the text on this picture using tesseract <picture name> stdout -l tha+eng

Expected Behavior:

Since they are actually the same picture, tesseract should perform the same connected component analysis on both, so it should not report too few characters.

Update:
It should return the same orientation result, since the command line and the API are equivalent.

Suggested Fix:

NA


amitdo · 2018-07-05T19:14:04Z

I don't know why it does not warn with the second image.

In any case (with or without the warning), fewer than 50 characters is too few for the OSD feature to reliably detect script and orientation.

tailsu · 2019-02-06T09:54:34Z

There is a parameter, min_characters_to_try, which governs the cutoff mentioned by @amitdo. By default it's 50.

$ tesseract --print-parameters | fgrep characters
...
min_characters_to_try	50	Specify minimum characters to try during OSD
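
Tesseract parameters can be overridden on the command line with `-c NAME=VALUE`, so the cutoff can be changed per run. The sketch below only builds the command from this issue with that override appended. `osd_command` is a hypothetical helper, and the value 10 and the tessdata path are placeholders chosen for illustration. Note that lowering the cutoff does not make OSD more reliable on short text; it only stops the page from being skipped.

```python
import shlex


def osd_command(image, tessdata_dir, min_chars=50):
    """Build the OSD command line from this issue, overriding the cutoff.

    min_chars maps to the min_characters_to_try parameter (default 50).
    """
    return [
        "tesseract", image, "stdout",
        "--tessdata-dir", tessdata_dir,
        "--psm", "0", "--oem", "0", "-l", "osd",
        "-c", f"min_characters_to_try={min_chars}",
    ]


# Placeholder path and cutoff, for illustration only.
cmd = osd_command("image-rotated-270.png", "/path/to/tessdata", min_chars=10)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where tesseract is installed.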

suravijayjilla · 2019-12-05T09:39:43Z

Hi,

Is it possible to change that cutoff value?
I am facing the same issue; I need to change that value and try to rotate the image.

amitdo · 2020-05-18T01:00:30Z

@Layneww, without an image we can't test the reported issue and thus can't help you.

bozhodimitrov mentioned this issue Jul 2, 2018

image_to_osd: tesseract is not installed or it's not in your path madmaze/pytesseract#134

Closed

zdenop added the accuracy label Sep 30, 2018

amitdo added the OSD Orientation and Script Detection label May 14, 2020

amitdo closed this as completed May 18, 2020
