forked from tesseract-ocr/langdata
lat.config
# Tesseract Latin training - http://ryanfb.github.io/latinocr/
# Built from the https://github.com/ryanfb/latinocr-lat/ repository
# commit: b6885bca0fa755fbed2bbb36d3f5cebf866a15e0
# New segsearch produces better results
enable_new_segsearch 1
# Increase penalty for incorrect punctuation, important as
# diacritics can easily be misrecognised as punctuation
language_model_penalty_punc 0.35
# Increase minimum linesize. This minimises cases of accents
# being incorrectly recognised as separate lines.
textord_min_linesize 2.25
# Also helps to ensure that accents aren't incorrectly recognised
# as separate lines
textord_occupancy_threshold 0.7
# Helps to ensure rows don't overlap
textord_excess_blobsize 0.6
# Disable rare, variant, macron characters
# (can be enabled with tessedit_char_unblacklist)
tessedit_char_blacklist ĀāĒēĪīŌōŪū
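# Illustrative usage only (not part of the upstream file): a sketch of
# re-enabling the macron characters for a single run. It assumes this
# traineddata is installed as "lat"; input.png and output are placeholder
# names.
#   tesseract input.png output -l lat -c tessedit_char_unblacklist=ĀāĒēĪīŌōŪū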