disable-legacy and lstm.train #2665

Closed
Shreeshrii opened this issue Sep 22, 2019 · 7 comments · Fixed by #2727

Comments

@Shreeshrii
Collaborator

tesseract -v
tesseract 5.0.0-alpha-431-g39a63
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.4.4 : libopenjp2 2.3.0

I built Tesseract with --disable-legacy yesterday. Creating lstmf files with the lstm.train config now gives the following warnings:

Page 1
data/eng-ground-truth/1538.eng.FreeSerif.exp0.tif
Warning: Parameter not found: disable_character_fragments
Warning: Parameter not found: il1_adaption_test
Tesseract Open Source OCR Engine v5.0.0-alpha-431-g39a63 with Leptonica
Page 1
data/eng-ground-truth/538.eng.FreeSerif.exp0.tif
Warning: Parameter not found: disable_character_fragments
Warning: Parameter not found: il1_adaption_test
Tesseract Open Source OCR Engine v5.0.0-alpha-431-g39a63 with Leptonica
Page 1
data/eng-ground-truth/2411.eng.FreeSerif.exp0.tif
Warning: Parameter not found: disable_character_fragments
Warning: Parameter not found: il1_adaption_test
Tesseract Open Source OCR Engine v5.0.0-alpha-431-g39a63 with Leptonica
Page 1
data/eng-ground-truth/274.eng.FreeSerif.exp0.tif
Warning: Parameter not found: disable_character_fragments
Warning: Parameter not found: il1_adaption_test
Tesseract Open Source OCR Engine v5.0.0-alpha-431-g39a63 with Leptonica
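
For a longer training log, the warnings above can be tallied per parameter with a short script. This is only a minimal sketch, assuming the log has been captured as text; the helper name `count_missing_params` and the shortened sample log are illustrative and not part of Tesseract.

```python
import re
from collections import Counter

# Matches the warning format shown in the log above.
WARNING_RE = re.compile(r"^Warning: Parameter not found: (\S+)$")


def count_missing_params(log_text: str) -> Counter:
    """Count how often each parameter appears in 'Parameter not found' warnings."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened sample in the same shape as the log above (illustrative only).
sample_log = """\
Page 1
Warning: Parameter not found: disable_character_fragments
Warning: Parameter not found: il1_adaption_test
Page 1
Warning: Parameter not found: disable_character_fragments
Warning: Parameter not found: il1_adaption_test
"""

if __name__ == "__main__":
    for name, n in count_missing_params(sample_log).items():
        print(name, n)
```

If only the two names from this report show up, the warnings all trace back to the same config entries.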
@stweil
Contributor

stweil commented Sep 22, 2019

Both disable_character_fragments and il1_adaption_test are part of tessdata/configs/lstm.train, maybe because that file was created as a copy of tessdata/configs/box.train. If these parameters are unused, they should be removed from lstm.train. box.train could be removed for LSTM-only builds, too.
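
The suggested cleanup amounts to dropping the lines for those two parameters from the config file. A minimal sketch, assuming a config is plain text with one `name value` pair per line; the helper `strip_params` is hypothetical, and the sample values and the `some_other_param` entry are placeholders rather than the real contents of lstm.train.

```python
from typing import Iterable


def strip_params(config_text: str, unused: Iterable[str]) -> str:
    """Return config_text without the lines whose first token is in `unused`."""
    unused_names = set(unused)
    kept = []
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in unused_names:
            continue  # drop the line that sets an unused parameter
        kept.append(line)
    return "\n".join(kept) + "\n"


# Placeholder config text; values and the third entry are illustrative only.
sample_config = """\
disable_character_fragments T
il1_adaption_test 1
some_other_param T
"""

UNUSED = ["disable_character_fragments", "il1_adaption_test"]

if __name__ == "__main__":
    print(strip_params(sample_config, UNUSED), end="")
```

Matching on the first token only avoids removing a line that merely mentions one of the names elsewhere.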

@amitdo
Collaborator

amitdo commented Oct 1, 2019

Hi Shree,

Do you want to send a PR that removes these 2 parameters from lstm.train?

@Shreeshrii
Collaborator Author

@amitdo I do not know whether these configs are used or not in lstmtraining.

The training run that I did with --disable-legacy seemed to have some problems. Each lstmf file indicated that it had a large number of characters, in the hundreds of thousands.

So, I have gone back to regular builds without it.

@amitdo
Collaborator

amitdo commented Oct 1, 2019

They are not used with lstmtraining

@amitdo
Collaborator

amitdo commented Oct 6, 2019

The training run that I did with --disable-legacy seemed to have some problems. Each lstmf file indicated that it had a large number of characters, in the hundreds of thousands.

How did you train?
Fine tune/from scratch?
What was the error message?
Which language?
Does the problem appear with any font?

@Shreeshrii
Collaborator Author

@amitdo I had deleted those files on error. I will rebuild and try to reproduce the problem and add details.

@Shreeshrii
Collaborator Author

@amitdo I have created a PR as suggested by you.

#2727
