Segmentation fault with textord_blockndoc_fixed=1 #4039

Closed
jbarth-ubhd opened this issue Mar 24, 2023 · 3 comments

jbarth-ubhd commented Mar 24, 2023

Current Behavior

tesseract 00178917.tif 00178917-b -l deu+eng+fra+ita+script/Latin -c invert_threshold=0 --dpi 230 alnum monofriendly
Detected 9 diacritics
Segmentation fault (core dumped)

> cat alnum
tessedit_char_whitelist zyxwvutsrqponmlkjihgfedcba][ZYXWVUTSRQPONMLKJIHGFEDCBA?>=<;:9876543210/.-,+*)('&"! §°ÀÂÄÆÇÈÉÊËÎÏÔÖÙÛÜßàâäæçèéêëìíîïòóôöùúûüÿČčŒœŸŽžǍǎ

> cat monofriendly | grep -v ^#
textord_blockndoc_fixed	1
textord_words_def_fixed	0.024

There is no segfault when textord_blockndoc_fixed is not set.

segfault-00178917.zip

Expected Behavior

nice output file

Suggested Fix

No response

tesseract -v

tesseract 5.3.0
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.5.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
Found libcurl/7.68.0 NSS/3.49.1 zlib/1.2.11 brotli/1.0.9 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3

Operating System

Ubuntu 20.04 Focal

Other Operating System

No response

uname -a

Linux pers16 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Compiler

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

CPU

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Virtualization / Containers

none

Other Information

More sample images are available on request.

jbarth-ubhd changed the title from "Segmentation fault" to "Segmentation fault with textord_blockndoc_fixed=1" on Mar 24, 2023
Contributor

stweil commented Mar 24, 2023

It is sufficient to run tesseract 00178917.tif - -c textord_blockndoc_fixed=1, and it also crashes with test/testing/8071_093.3B.tif.

stweil added the bug label Mar 24, 2023
Contributor

stweil commented Mar 24, 2023

Crash stack:

(gdb) i s
#0  tesseract::ERRCODE::error (this=0x555555b185b0 <tesseract::NULL_DATA>, caller=0x555555959fc4 "ELIST2_ITERATOR::data", action=tesseract::ABORT, format=0x0) at ../../../src/ccutil/errcode.cpp:87
#1  0x000055555575bdc8 in tesseract::ERRCODE::error (this=0x555555b185b0 <tesseract::NULL_DATA>, caller=0x555555959fc4 "ELIST2_ITERATOR::data", action=tesseract::ABORT) at ../../../src/ccutil/errcode.cpp:99
#2  0x00005555555b9039 in tesseract::ELIST2_ITERATOR::data (this=0x7fffffffcb60) at ../../../src/ccutil/elst2.h:194
#3  0x00005555556d1ed4 in tesseract::X_ITER<tesseract::ELIST2_ITERATOR, tesseract::TO_ROW>::data (this=0x7fffffffcb60) at ../../../src/ccutil/list.h:29
#4  0x00005555557e85bc in tesseract::try_doc_fixed (page_tr=..., port_blocks=0x7fffffffcfb8, gradient=-0.00894454401) at ../../../src/textord/topitch.cpp:409

The crash also occurs with old versions of Tesseract (tested with 4.0.0 and 5.0.0-alpha-822-gdea08).
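
The trace ends in ELIST2_ITERATOR::data raising NULL_DATA from try_doc_fixed, which suggests that data() was called on an iterator with no current element, for example when a block has an empty row list. The following is a minimal sketch of that failure pattern and one possible guard. Row, Block, data, first_row_left_unchecked and first_row_left_checked are hypothetical stand-ins, not Tesseract types or functions, and the thread does not show what the actual fix changed, so this should not be read as the real patch.

```cpp
#include <cassert>
#include <list>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins; NOT Tesseract's TO_ROW / TO_BLOCK.
struct Row {
  int projection_left;
};

struct Block {
  std::list<Row> rows;
};

// Mimics an iterator's data() accessor: fails loudly when there is no
// current element. It throws instead of aborting so that the failure can
// be observed by a caller.
template <typename T>
const T& data(const std::list<T>& list) {
  if (list.empty()) {
    throw std::runtime_error("NULL_DATA: list has no current element");
  }
  return list.front();
}

// Failing pattern (assumed): take the first row of the first block
// without checking that either list has any elements.
int first_row_left_unchecked(const std::list<Block>& blocks) {
  return data(data(blocks).rows).projection_left;
}

// Guarded variant: skip blocks without rows and report "nothing found"
// instead of dereferencing an empty list.
std::optional<int> first_row_left_checked(const std::list<Block>& blocks) {
  for (const Block& block : blocks) {
    if (!block.rows.empty()) {
      return block.rows.front().projection_left;
    }
  }
  return std::nullopt;
}
```

With a block list whose first block has no rows, first_row_left_unchecked throws, while first_row_left_checked returns the value from the first non-empty block, or std::nullopt when no block has rows. The sketch needs C++17 for std::optional.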

Contributor

stweil commented Mar 24, 2023

This issue is fixed by commit 1569e50. Thank you @jbarth-ubhd for the report!
