Move source files which are used for training only to src/training #2614

Merged: 1 commit, Aug 12, 2019

stweil (Contributor) commented Aug 12, 2019

They are moved from src/classify and src/lstm to src/training.

This reduces the size of the Tesseract library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

stweil (Contributor, Author) commented Aug 12, 2019

Together with pull request #2613 this completes the symbol cleanup for the Tesseract library which now only contains symbols needed for the Tesseract executable and for the C API.

The Tesseract library is reduced significantly in size by both pull requests:

   text	   data	    bss	    dec	    hex	filename
3590092	  47504	  15616	3653212	 37be5c	/tmp/libtesseract.so.5.0.0-old
3451063	  45808	  15616	3512487	 3598a7	/tmp/libtesseract.so.5.0.0-new
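
To put a number on "reduced significantly", the two rows can be compared column by column. The following is only a small sketch: it assumes the rows are in the usual `size` column order (text, data, bss, dec, hex, filename), and the helper names `parse_size_row` and `reduction` are made up for this illustration.

```python
# Rows copied from the `size` output quoted above.
# Assumed column order: text, data, bss, dec, hex, filename.
OLD = "3590092\t  47504\t  15616\t3653212\t 37be5c\t/tmp/libtesseract.so.5.0.0-old"
NEW = "3451063\t  45808\t  15616\t3512487\t 3598a7\t/tmp/libtesseract.so.5.0.0-new"


def parse_size_row(row):
    """Split one row of `size` output into named integer fields."""
    text, data, bss, dec, hex_total, _filename = row.split()
    fields = {"text": int(text), "data": int(data), "bss": int(bss), "dec": int(dec)}
    # The hex column repeats the decimal total, and the total is the sum
    # of the three sections; both checks guard against a mangled row.
    assert int(hex_total, 16) == fields["dec"]
    assert fields["text"] + fields["data"] + fields["bss"] == fields["dec"]
    return fields


def reduction(old_row, new_row):
    """Return the bytes saved per column and the total saving in percent."""
    old, new = parse_size_row(old_row), parse_size_row(new_row)
    saved = {key: old[key] - new[key] for key in old}
    percent = 100.0 * saved["dec"] / old["dec"]
    return saved, percent


saved, percent = reduction(OLD, NEW)
print(saved)                       # bytes saved per column
print(f"{percent:.2f}% smaller")   # total saving relative to the old library
```

With these two rows the total saving is 140725 bytes, about 3.85 % of the old library, and almost all of it comes from the text (code) section.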

stweil (Contributor, Author) commented Aug 12, 2019

Updated PR to fix CMakeLists.txt and src/training/CMakeLists.txt, so CI should pass now.

sampleiterator.cpp
sampleiterator.h
trainingsampleset.cpp
trainingsampleset.h
)
Contributor Author

@egorpugin, please review my changes for CMake. I don't know why there are three libraries for training, so I added the moved code to one of them:

src/training/libunicharset_training.a
src/training/libtessopt.a
src/training/libcommon_training.a

Wouldn't it be simpler to have a single library for training?

Contributor

> Wouldn't it be simpler to have a single library for training?

Only when files are used in more than one target.
Otherwise, I personally prefer to keep files near their users (when the use count is 1).

@egorpugin egorpugin merged commit 73f7135 into tesseract-ocr:master Aug 12, 2019
egorpugin (Contributor) commented Aug 12, 2019

D:/dev/tesseract/src/classify/trainingsample.cpp(27): fatal error C1083: Cannot open include file: 'intfeaturemap.h': No such file or directory

A main source file depends on training code. Is that OK?

stweil (Contributor, Author) commented Aug 12, 2019

@egorpugin, yes, some files in src/classify require include files which are now in src/training. I just noticed that sw.cpp also needs some changes. Should I try to fix that?
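
The failure mode described here (a source file left in one directory including a header that moved to another) can be checked mechanically. The sketch below is only an illustration, not part of the Tesseract build: `unresolved_includes` is a hypothetical helper, and the directory contents are assumptions apart from the two file names taken from the error message above.

```python
import re

# Matches quoted includes such as: #include "intfeaturemap.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def unresolved_includes(sources, include_dirs, headers_by_dir):
    """Return {source_path: [headers not found in any include dir]}.

    sources:        maps a source path to its text
    include_dirs:   directories on the include path for the target
    headers_by_dir: maps a directory to the header names it contains
    """
    visible = set()
    for directory in include_dirs:
        visible.update(headers_by_dir.get(directory, ()))
    missing = {}
    for path, text in sources.items():
        not_found = [h for h in INCLUDE_RE.findall(text) if h not in visible]
        if not_found:
            missing[path] = not_found
    return missing


# Assumed layout after the move: the header now lives in src/training,
# while the source file that includes it stayed in src/classify.
headers = {
    "src/classify": set(),
    "src/training": {"intfeaturemap.h"},
}
sources = {
    "src/classify/trainingsample.cpp": '#include "intfeaturemap.h"\n',
}

# Include path without src/training: the include cannot be resolved.
before = unresolved_includes(sources, ["src/classify"], headers)
# Include path with src/training added: nothing is missing.
after = unresolved_includes(sources, ["src/classify", "src/training"], headers)
print(before)
print(after)
```

In this toy layout the first call reports `intfeaturemap.h` as unresolved for `trainingsample.cpp`, which mirrors the C1083 error, and the second call reports nothing once src/training is on the include path.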

@stweil stweil deleted the training branch August 12, 2019 16:47
egorpugin (Contributor) commented Aug 12, 2019

Yes, please. Right now it is a very crude implementation copied from the CMake files rather than the makefiles (which seem to be in better shape).
Are you using Windows or Linux?
Since I now have the first working sw builds on Linux, I can try to build Tesseract there for the first time, so that it will be usable there.

egorpugin (Contributor)

I've removed the moved files from the installation, but that single error is still there.
