Error: Illegal Parameter specification! with Tesseract4Alpha #1010

nachobit · 2017-06-28T11:55:29Z

After upgrading to Tesseract 4 alpha, I get this error when running OCR from my Java code:

ITesseract instance = new Tesseract();
instance.setDatapath("/usr/share/tessdata/");
instance.setLanguage("spa");
(...)
result = instance.doOCR(imageFile);

Environment

Tesseract Version: tesseract 4.00.00alpha
Leptonica Version: leptonica-1.74.4
Platform: CentOS 6.7
Server: Wildfly 10.1

Current Behavior:

Error: Illegal Parameter specification!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007ff1b3098549, pid=25091, tid=0x00007ff29d7d7700

JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)
Problematic frame:
C [libtesseract.so+0x26f549] ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+0x129

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/opt/wildfly/wildfly-10.1.0.Final/hs_err_pid25091.log

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.

*** JBossAS process (25091) received ABRT signal ***

Suggested Fix:

Any idea?


Shreeshrii · 2017-06-28T12:42:53Z

Please use the latest source from master branch of github and inform whether you still get the error.

nachobit · 2017-06-29T06:58:00Z

I'm already using the latest source. I get the same error (*) on two different OSes.

(*) Error: Illegal Parameter specification!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75

Shreeshrii · 2017-06-29T07:18:59Z

What C++ compiler version are you using?

two different OS

Which ones? I have been able to build on Ubuntu 14.04, and the Travis and AppVeyor builds are passing.

Shreeshrii · 2017-06-29T07:20:32Z

Also, are you able to run tesseract from the command line?

tesseract -v

Also try to OCR the sample image from the testing folder.

nachobit · 2017-06-29T07:31:47Z

@Shreeshrii I'm using g++ 7.1.1 in Arch and 4.8.2 in CentOS 6.7.
I launch Tesseract from Java. There are no problems with version 3.05, but I get the error described above with the 4.0 alpha version.

tesseract 4.00.00alpha
leptonica-1.74.4
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.1) : libpng 1.6.29 : libtiff 4.0.8 : zlib 1.2.11 : libwebp 0.6.0

Shreeshrii · 2017-06-29T08:04:34Z

What about

tesseract --list_langs

Are you able to OCR an image from command line with the 4.0 version?

Do you have 4.00.00alpha version of traineddata files?

Download the 4.0 traineddata to a different folder and point to that.

nachobit · 2017-06-29T08:10:05Z

tesseract --list_langs detects the languages without problems (eng, spa and osd traineddata files for the LSTM-based 4.00.00alpha version).

Command-line recognition also works: I OCRed an example from the testing folder without problems.

Perhaps some Java code has changed for this 4.0 alpha version?

Shreeshrii · 2017-06-29T08:30:24Z

@nachobit Please see Quan's Java JNA wrapper for Tesseract OCR API at https://github.com/nguyenq/tess4j

nachobit · 2017-06-29T08:49:08Z

The problem with version 3.05.01 is that I get different results on the two OSes when recognizing a PDF, even though both use the same Leptonica and Tesseract versions.

Example:

0000 0340 º71º ZL (in CentOS) and 0000 0340 0710 ZL (in Arch).

For that reason I'd like to move to 4.0 alpha, but I can't because of the error described above.

amitdo · 2017-06-29T09:12:05Z

If you have an issue with a wrapper to Tesseract's C/C++ API, please report the issue to the developers of that software.

amitdo · 2017-06-29T09:20:42Z

I'm using g++ 7.1.1 in Arch and 4.8.2 in CentOS 6.7.

0000 0340 º71º ZL (in CentOS) and 0000 0340 0710 ZL (in Arch).

Ray said, many years ago, that you can get different results with different compilers.

amitdo · 2017-06-29T09:47:55Z

Perhaps some Java code has changed for this 4.0 alpha version?

Yes.
https://github.com/nguyenq/tess4j/commits/master

nachobit · 2017-06-29T10:09:51Z

After updating to the latest libs from Tess4J-3.4.0-src, I get the same error when launching OCR from Java code.

With version 3.05.01, is there any way to fix the misrecognized zeros (º instead of 0)?

amitdo · 2017-06-29T10:27:54Z

3.4.0 does not include the 4.00 changes.
https://github.com/nguyenq/tess4j/commits/tess4j-3.4.0

amitdo · 2017-06-29T10:32:58Z

With version 3.05.01, is there any way to fix the misrecognized zeros (º instead of 0)?

You can try to compile with a newer version of gcc. I can't promise that this 'solution' will help you with this issue.

nachobit · 2017-06-29T10:45:34Z

Ok, that's the problem? Tess4J-3.4.0 (Java) is not supported by 4.00Alpha release? Then I will try compiling with a newer version of GCC.

amitdo · 2017-06-29T11:06:18Z

Ok, that's the problem? Tess4J-3.4.0 (Java) is not supported by 4.00Alpha release?

I assume that's the source of the problem (It's Tess4J 3.4.0 that seems to not have support for Tesseract 4.00, not vice versa). To be sure, ask the developer.

https://sourceforge.net/p/tess4j/discussion/1202294/
https://github.com/nguyenq/tess4j/issues

Shreeshrii · 2017-06-29T11:12:57Z

nguyenq/tess4j@74c8509
+Version 4.0.0 beta (8 June 2017)
+- Upgrade to Tesseract 4.0.0 alpha (8c29e68)
+- Update Lept4J to 1.5.0 (Leptonica 1.74.2)


nguyenq · 2017-06-29T13:22:31Z

tess4j's master branch is for Tesseract 4.0alpha and includes the latest Tesseract 4.0alpha Windows binary. All of its unit tests passed on Windows 10. We have not tested on Linux OS yet.

Since you link against Leptonica 1.74.4, make sure you use lept4j-1.6.0.

zdenop · 2017-06-29T16:48:32Z

We do not support third-party software, including Tesseract wrappers. Please reproduce the error with C++.

banksone · 2017-08-06T17:51:09Z

Hi, it seems you just have to set the environment variable LC_NUMERIC="C", and it works. :)

AlexanderHugestrand · 2017-09-27T06:29:08Z

I dug into Tesseract's code and found that the string "Illegal Parameter specification" exists in only one place, namely in the file classify/clusttool.cpp. After some debugging I realised that the function ReadParamDesc() calls sscanf() at line 82 (for git commit hash 2b854e3), which is locale-dependent. It fails because the numeric input (two floating-point values) is written with dots (example: 1.23), while an LC_NUMERIC locale other than en_US may cause sscanf() to expect a different decimal separator, such as a comma (1,23).

In other words, the bug is in Tesseract, which assumes a locale instead of setting it explicitly. The workaround is to set LC_NUMERIC=en_US.UTF-8.

amitdo · 2017-09-27T09:48:09Z

https://github.com/tesseract-ocr/tesseract/wiki/FAQ#error-illegal-min-or-max-specification
https://web.archive.org/web/20150510151209/http://code.google.com/p/tesseract-ocr/issues/detail?id=228
https://web.archive.org/web/20150509203443/http://code.google.com/p/tesseract-ocr/issues/detail?id=250

https://msdn.microsoft.com/en-us/library/wyzd2bce.aspx

https://en.cppreference.com/w/cpp/locale/setlocale

https://stackoverflow.com/questions/13919817/sscanf-and-locales-how-does-one-really-parse-things-like-3-14

hamduu · 2018-04-05T15:24:56Z

I am facing the same issue. Could you please share the file that has to be changed, so that I can just replace it and continue creating the traineddata?

hamduu · 2018-04-05T15:26:45Z

@nizzeberra: I see the same files you mention, but I don't know where to place the code. Could you please share that file?

Shreeshrii · 2018-04-05T16:17:32Z

@stweil Is it possible to address this for final 4.0.0?

stweil · 2018-04-05T18:15:38Z

Setting LC_NUMERIC in the Tesseract code would perhaps solve the problem, but is not a good solution for people who use the Tesseract library. They don't expect that Tesseract changes LC_NUMERIC, and perhaps they need a different value.

I wonder whether sscanf's handling of %f really depends on the locale settings. It does not on my Debian GNU/Linux system, and I could not find a hint in the MSDN documentation on sscanf either.

@nizzeberra, which systems / C libraries show that strange behaviour? Do you have links to documentation?

PS: These code locations use %f:

classify/clusttool.cpp:        sscanf(line, "%" QUOTED_TOKENSIZE "s %" QUOTED_TOKENSIZE "s %f %f",
classify/ocrfeatures.cpp:    if (tfscanf(File, "%f", &(Feature->Params[i])) != 1)
wordrec/params_model.cpp:  if (sscanf(line + end_of_key, " %f", val) != 1)

AlexanderHugestrand · 2018-04-05T19:11:13Z

@hamduu I'm not sure I understand what you are asking. The file and line that I pointed out are where the error is triggered, and should probably not be changed. LC_NUMERIC is just an environment variable that you can set manually.

@stweil I have built and tested tesseract on Linux Mint and I have no info about specific libraries right now.

AlexanderHugestrand · 2018-04-05T19:17:14Z

Here is the man page, and it's pretty clear about the locale:

http://man7.org/linux/man-pages/man3/scanf.3.html

stweil · 2018-04-05T19:36:12Z

Linux Mint uses Debian packages, so the result should not be much different. The man page only says that LC_NUMERIC can be used to enable thousands grouping separators.

Here is the test scenario which I used (maybe you can try it on Linux Mint):

$ cat sscanf-test.cpp 
#include <stdio.h>

int main(int argc, char *argv[])
{
  for (int arg = 1; arg < argc; arg++) {
    float f = 0.0f;
    sscanf(argv[arg], "%f", &f);
    printf("f[%d] = %f\n", arg, f);
  }
  return 0;
}
$ g++ -std=c++11 -Wall -Wextra sscanf-test.cpp -o sscanf-test
$ ./sscanf-test 3.14
f[1] = 3.140000
$ ./sscanf-test 3,14
f[1] = 3.000000
$ LC_NUMERIC=de_DE.UTF-8 ./sscanf-test 3,14
f[1] = 3.000000

amitdo · 2018-04-05T20:57:49Z

https://en.cppreference.com/w/cpp/locale/setlocale

Here 3.14 -> 3,14

stweil · 2018-04-06T10:04:46Z

That's interesting. So C/C++ programs don't use the locale set in the environment, but start running with the "C" locale. That is exactly what I observed in my test. Only when I set LC_NUMERIC inside my test program do I get different behaviour.

That implies that we have no problem for the tesseract executable or the training programs which are provided by Tesseract. Nor will external software have a problem as long as it does not set LC_NUMERIC.

Maybe Java uses the environment settings to set LC_NUMERIC internally. That would explain the reported problem.

In addition to the problem with sscanf, more code is possibly affected by "wrong" locale settings, for example these lines:

classify/ocrfeatures.cpp:    fprintf (File, "%f  %f\n",
wordrec/params_model.cpp:    if (fprintf(fp, "%s %f\n", kParamsTrainingFeatureTypeName[i], weights[i])

@Shreeshrii, I don't think that all that code can be found and rewritten for 4.0.0. It would be possible to report a warning when the Tesseract initialisation code detects an unsupported locale setting.

Shreeshrii · 2018-04-06T10:15:35Z

@stweil Thanks for the investigation. Yes, please make possible changes to point the users in the right direction.

Related issue reg: locales.
#1250 (comment)

hamduu · 2018-04-08T05:29:03Z

Sorry to ask about this in detail; I have no expertise in this.
I am trying to run this command: cntraining eng.TimesNewRoman.exp0.tr
This is the error I am getting:
Error: Illegal number of feature sets!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75
Abort trap: 6

I even tried changing the locale from the terminal, but I still get the same error. I have added screenshots of it.

How do I make this command work? Can you please tell me exactly where I need to make changes to set the locale differently? Please help me out.

Regards

suresh443 · 2018-07-04T14:22:28Z

Hi, Tesseract works fine from a Java main method, but when I try to run it in a web application I get the error below:


A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f96274dbac7, pid=4516, tid=0x00007f9699212700

JRE version: OpenJDK Runtime Environment (8.0_171-b11) (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)

Java VM: OpenJDK 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)

Problematic frame:

C [libtesseract.so.3.0.4+0x9dac7] tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+0x5e7

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:

/home/ahextech/suresh/softwares/eclipse/hs_err_pid4516.log

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp

The crash happened outside the Java Virtual Machine in native code.

See problematic frame for where to report the bug.


My sample code is:
Tesseract tesseract = new Tesseract();
TesseractOCRConfig config = new TesseractOCRConfig();
config.setTessdataPath("/home/ahextech/suresh/ehubWorkspace/BITBUCKETCODE/");
config.setLanguage("Eng");
String text = tesseract.doOCR(imageProcessed);

My current OS: Ubuntu 16.04
Java JDK version: 1.8
Tomcat version: 8
Tesseract version: 3.4.4

Help me if you can.

Thanks a Ton..!!!

zdenop closed this as completed Jun 29, 2017

Shreeshrii mentioned this issue Apr 10, 2018

OSD Not working --psm=0,1,12 #1463

Closed

saudet mentioned this issue Jul 18, 2018

"Error: Illegal Parameter specification!" in a Linux process invoking tesseract bytedeco/javacpp-presets#591

Closed

maximumspatium mentioned this issue Feb 25, 2019

tesseract 4.0.0-1.4.4 crashes on Mac OS bytedeco/javacpp-presets#694

Closed

amitdo added the locale label Mar 21, 2021
