Update "lat" langdata to files generated from ryanfb/latinocr-lat@b68…
ryanfb committed Jan 13, 2016
1 parent 05ec588 commit 68840af
Showing 11 changed files with 867,572 additions and 499,726 deletions.
4 changes: 0 additions & 4 deletions lat/desired_characters

This file was deleted.

25 changes: 25 additions & 0 deletions lat/lat.config
@@ -0,0 +1,25 @@
# Tesseract Latin training - http://ryanfb.github.io/latinocr/
# Build from the https://github.com/ryanfb/latinocr-lat/ repository
# commit: b6885bca0fa755fbed2bbb36d3f5cebf866a15e0

# New segsearch produces better results
enable_new_segsearch 1

# Increase penalty for incorrect punctuation, important as
# diacritics can easily be misrecognised as punctuation
language_model_penalty_punc 0.35

# Increase minimum linesize. This minimises cases of accents
# being incorrectly recognised as separate lines.
textord_min_linesize 2.25

# Also helps to ensure that accents aren't incorrectly recognised
# as separate lines
textord_occupancy_threshold 0.7

# Helps to ensure rows don't overlap
textord_excess_blobsize 0.6

# Disable rare, variant, macron characters
# (can be enabled with tessedit_char_unblacklist)
tessedit_char_blacklist ĀāĒēĪīŌōŪū
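
The config above is a plain list of `name value` lines with `#` comments. As a rough sketch (not part of this commit), the Python below parses that format and builds a `-c` override that re-enables the macron characters through `tessedit_char_unblacklist`, as the last comment in the file suggests. The helper names `parse_tess_config` and `unblacklist_args`, and the file names `page.png` and `out`, are assumptions made for illustration only.

```python
def parse_tess_config(text):
    """Parse 'name value' lines, skipping blank lines and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The parameter name ends at the first space; the rest is the value.
        name, _, value = line.partition(" ")
        params[name] = value.strip()
    return params


def unblacklist_args(params):
    """Build a '-c' override that un-blacklists the blacklisted characters."""
    chars = params.get("tessedit_char_blacklist", "")
    return ["-c", "tessedit_char_unblacklist=" + chars]


# Parameter lines copied from lat/lat.config in this commit.
LAT_CONFIG = """\
# New segsearch produces better results
enable_new_segsearch 1
language_model_penalty_punc 0.35
textord_min_linesize 2.25
textord_occupancy_threshold 0.7
textord_excess_blobsize 0.6
tessedit_char_blacklist ĀāĒēĪīŌōŪū
"""

params = parse_tess_config(LAT_CONFIG)

# Hypothetical invocation: 'page.png' and 'out' are placeholder names.
command = ["tesseract", "page.png", "out", "-l", "lat"] + unblacklist_args(params)
print(" ".join(command))
```

Passing the override on the command line leaves the shipped `lat.config` untouched, so the macrons stay disabled by default and are only enabled for runs that ask for them.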