
Extend function BoxFileName to handle more common image names #2686

Merged (1 commit) on Oct 5, 2019


@stweil (Contributor) commented Oct 2, 2019

The function derives the file name for the .box file from an image name.

For training from existing line images, it is useful to support commonly
used image names directly.

While generated images for Tesseract training typically use the name
pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized
or NAME.nrm.png for grayscale images.

BoxFileName is now also a local function, as it is only used locally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

```cpp
if (last == ".bin.png" || last == ".nrm.png") {
  box_filename.resize(length - 8);
} else {
  const char* lastdot = strrchr(box_filename.c_str(), '.');
  // ... (rest of the function is not part of this review excerpt)
```
@egorpugin (Contributor) commented Oct 2, 2019

Use str.rfind('.')?

@stweil (Contributor Author) replied:

Good idea. str.find_last_of('.') might even be better. I'll update the PR tomorrow.

A contributor replied:

Not sure about find_last_of().

@stweil (Contributor Author) commented Oct 5, 2019

I updated and rebased the pull request and think that it is ready for merging now.

@zdenop merged commit 6d171b8 into tesseract-ocr:master on Oct 5, 2019
@zdenop (Contributor) commented Oct 5, 2019

thanks

@stweil deleted the boxfilename branch on October 5, 2019, 16:01
3 participants