The text on the image looks like a label: printed text, but not much of it. There are around 5 lines, each with an average of 15 Thai characters.
The output is:
Warning. Invalid resolution 0 dpi. Using 70 instead.
Too few characters. Skipping this page
Error during processing.
However, when I manually rotated it back to horizontal and ran the same command line, I got this result:
Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.54
Script: Latin
Script confidence: 2.22
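To compare runs like the two above side by side, it can help to turn the OSD output into structured data. The sketch below is a minimal, hypothetical helper (parse_osd is not part of Tesseract or tesserocr). It assumes the CLI prints one "Key: value" pair per line, as in the log above, and treats output without an "Orientation in degrees" line, such as the "Too few characters" case, as a failed detection.

```python
import re
from typing import Dict, Optional, Union

# Fields that the OSD output above reports as numbers.
NUMERIC_FIELDS = {
    "Page number": int,
    "Orientation in degrees": int,
    "Rotate": int,
    "Orientation confidence": float,
    "Script confidence": float,
}

# A field line looks like "Orientation confidence: 0.54".
FIELD_LINE = re.compile(r"^([A-Za-z ]+):\s*(.+)$")


def parse_osd(output: str) -> Optional[Dict[str, Union[int, float, str]]]:
    """Parse 'Key: value' lines from `tesseract ... --psm 0` output.

    Returns None when no orientation was reported, e.g. after
    "Too few characters. Skipping this page".
    """
    fields: Dict[str, Union[int, float, str]] = {}
    for line in output.splitlines():
        match = FIELD_LINE.match(line.strip())
        if not match:
            # Warnings and errors ("Warning. Invalid resolution ...") have no
            # "Key: value" shape, so they are skipped.
            continue
        key, value = match.group(1).strip(), match.group(2).strip()
        convert = NUMERIC_FIELDS.get(key, str)
        fields[key] = convert(value)
    return fields if "Orientation in degrees" in fields else None
```

Applied to the first log above, parse_osd returns None; applied to the second, it returns a dict with "Script" set to "Latin" and "Orientation confidence" set to 0.54.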
BTW, the script detection has a high error rate: for another Thai picture, it predicts Cyrillic.
Update:
For one picture, I tried both the command line and the API (through the Python wrapper tesserocr). The command line with --psm 0 --oem 0 -l osd outputs 'too few characters. skipping', while the API gives the orientation result.
Besides, tesseract can output the text on this picture using: tesseract <picture name> stdout -l tha+eng
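A minimal sketch for reproducing the CLI/API comparison on a single file, assuming Python 3 with tesseract on PATH and tesserocr installed. build_osd_command, run_cli_osd and run_api_osd are hypothetical helper names. The CLI flags are the ones from this report; the API side uses tesserocr's PyTessBaseAPI with PSM.OSD_ONLY and DetectOS(), leaving the other settings at tesserocr's defaults, so the two paths may not be configured identically. The optional min_chars argument passes -c min_characters_to_try=N, assuming your build exposes that parameter (check with tesseract --print-parameters); it could help show whether the CLI is hitting a minimum-character threshold.

```python
import subprocess
from typing import List, Optional


def build_osd_command(image_path: str, min_chars: Optional[int] = None) -> List[str]:
    """Build the OSD-only command line used in this report (hypothetical helper)."""
    # --psm 0: orientation and script detection only; --oem 0: legacy engine;
    # -l osd: the OSD traineddata.
    cmd = ["tesseract", image_path, "stdout", "--psm", "0", "--oem", "0", "-l", "osd"]
    if min_chars is not None:
        # ASSUMPTION: the build exposes `min_characters_to_try`
        # (verify with `tesseract --print-parameters`).
        cmd += ["-c", "min_characters_to_try={}".format(min_chars)]
    return cmd


def run_cli_osd(image_path: str, min_chars: Optional[int] = None) -> str:
    """Run the CLI and return everything it printed."""
    proc = subprocess.run(
        build_osd_command(image_path, min_chars),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Diagnostics may be printed on stderr, so both streams are kept.
    return proc.stdout + proc.stderr


def run_api_osd(image_path: str) -> Optional[dict]:
    """Run OSD on the same file through tesserocr."""
    # Imported here so the CLI helpers work without tesserocr installed.
    from tesserocr import PSM, PyTessBaseAPI

    with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:
        api.SetImageFile(image_path)
        # Dict with 'orientation', 'oconfidence', 'script', 'sconfidence';
        # None if detection fails.
        return api.DetectOS()
```

Calling run_cli_osd and run_api_osd on the same file (for example a hypothetical label.png) shows directly whether the two paths disagree on that image.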
Expected Behavior:
Since they are the same picture, tesseract should perform the same connected component analysis on both, so it should not return "Too few characters".
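The expectation above rests on the premise that rotating an image by a multiple of 90 degrees does not change how many connected components (candidate characters) it contains. The toy sketch below checks that premise on a small binary grid, assuming 4-connectivity and a 90-degree rotation. count_components and rotate_90 are illustrative helpers; Tesseract's own blob extraction is more involved, so this shows the premise only, not Tesseract's behavior.

```python
from collections import deque
from typing import List

Grid = List[List[int]]


def count_components(grid: Grid) -> int:
    """Count 4-connected components of 1-pixels in a binary image."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                count += 1
                # Breadth-first flood fill marks every pixel of this component.
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def rotate_90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


image = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(count_components(image), count_components(rotate_90(image)))  # 3 3
```

Because a 90-degree rotation maps 4-neighbours to 4-neighbours, the count is the same before and after rotation.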
Update:
It should return the same orientation result, since the command line and the API are equivalent.
Suggested Fix:
NA
Environment
Tesseract Version:
tesseract 4.0.0-beta.1-270-g5a56
leptonica-1.76.0
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE
Commit Number:
Platform:
MacBook Air
Darwin 17.6.0 Darwin Kernel Version 17.6.0: Tue May 8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64 x86_64
Current Behavior:
I used the command line from #1463.