Added Tuvan language #22

vigneshv59 · 2015-12-30T21:53:07Z

This PR adds training data for the Tuvan language. Kyrgyz currently performs better on Tuvan text than the version of the Tuvan data I have, and I'm not sure why.

jonorthwash · 2015-12-30T22:16:00Z

It should be noted that the Tuvan and Kyrgyz alphabets are essentially identical, but the two languages don't share many identical word forms. Character bigrams from Kyrgyz would help Tuvan (generated character bigrams for Tuvan would be even better), but we couldn't figure out how to make use of them.
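
For illustration only, here is a minimal sketch of what generating character bigram counts from a training text could look like. The function name `char_bigrams` and the sample string are hypothetical, and the output is a plain Python `Counter`, not a format that Tesseract's training tools are known to consume.

```python
from collections import Counter


def char_bigrams(text):
    """Count adjacent character pairs inside each whitespace-separated word.

    Hypothetical helper for illustration; bigrams never span word boundaries.
    """
    counts = Counter()
    for word in text.lower().split():
        # Pair each character with the one that follows it.
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts


# Placeholder sample; in practice this would be the full training text.
sample = "тыва дыл"
for bigram, count in char_bigrams(sample).most_common():
    print(bigram, count)
```

Running the same function over the Kyrgyz and Tuvan training texts and comparing the two counters would show how much the character-level statistics overlap even where the word forms differ.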

jonorthwash · 2018-02-21T15:47:21Z

Are there any thoughts on integrating bigrams? I'd be happy to help with this.

vigneshv59 added 2 commits December 29, 2015 23:36

Added tuvan as a language.

314221e

Updated training text.

85400a5

zdenop merged commit 4ace513 into tesseract-ocr:master Feb 21, 2018
