
zhangshiguang/tessdoc

 
 

Tesseract User Manual

Introduction

Tesseract is an open-source text recognition (OCR) engine, available under the Apache 2.0 license.

  • The current official release is 4.1.1.
  • The master branch on GitHub can be used by those who want the latest 5.0.0.Alpha code for LSTM (--oem 1) and legacy (--oem 0) Tesseract.
  • The 3.05 branch on GitHub can be used by those who want the bug fixes for the 3.05.02 release of legacy Tesseract.

Tesseract can be used directly from the command line or, for programmers, through an API to extract printed text from images. It supports a wide variety of languages. Tesseract doesn't have a built-in GUI, but several are available from the 3rdParty page.
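As a minimal sketch of the command-line route, the Python snippet below assembles a `tesseract` invocation and runs it if the executable is installed. The helper names `build_tesseract_command` and `run_ocr` are hypothetical and not part of Tesseract; the sketch assumes the usual `tesseract imagename outputbase [options]` argument order and the `-l`, `--oem`, and `--psm` options.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", oem=1, psm=3):
    """Build an argv list for the tesseract command-line tool.

    oem selects the engine: 0 = legacy, 1 = LSTM, 2 = both, 3 = default.
    psm selects the page segmentation mode (3 = fully automatic, no OSD).
    """
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0, 1, 2, or 3")
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
    ]


def run_ocr(image_path, output_base, **options):
    """Run tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, **options)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(command, check=True)
    # With no output config given, tesseract writes <output_base>.txt.
    return f"{output_base}.txt"
```

For example, `run_ocr("page.png", "page", lang="deu", oem=0)` would request the legacy engine with German language data, assuming that data is installed and includes legacy models.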

Tesseract can be used in your own project, under the terms of the Apache License 2.0. It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the 3rdParty page for a sample of what has been done with it.

If you have a question, first read the documentation, particularly the FAQ, to see if your problem is addressed there. If not, search the Tesseract user forum or the Tesseract developer forum, and if you still can't find what you need, please ask us there.

Also, it is free software, so if you want to pitch in and help, please do! If you find a bug and fix it yourself, the best thing to do is to attach the patch to your bug report in the Issues List.

This user manual is for Tesseract versions 4.x.x and 5.0.0.Alpha. For versions 3.05.02 and older, see the documentation for old versions.

Releases and Changelog

4.0 with LSTM

Tesseract 4.0 and later include a new OCR engine based on LSTM neural networks. Initially, it works well on x86/Linux. Model data for 100+ languages and 35+ scripts is available in the tessdata, tessdata_best, and tessdata_fast repositories.
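Before running OCR it can help to confirm that the model data for a language is actually present. The sketch below is one possible check, not an official tool: `find_traineddata` is a hypothetical helper, and it assumes that model files are named `<lang>.traineddata` and that, when no directory is passed, the `TESSDATA_PREFIX` environment variable points at the directory holding them.

```python
import os
from pathlib import Path


def find_traineddata(lang, tessdata_dir=None):
    """Return the path to <lang>.traineddata, or None if it is missing."""
    if tessdata_dir is None:
        # Assumption: TESSDATA_PREFIX names the directory with the model files.
        tessdata_dir = os.environ.get("TESSDATA_PREFIX")
    if not tessdata_dir:
        return None
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None
```

A result of `None` suggests downloading the file for that language from one of the model repositories into the data directory.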

5.0.0.Alpha

Tesseract 5.0.0.Alpha source code is available in the 'master' branch of the repository. The master branch uses 5.0.0 versioning because code modernization caused API compatibility issues with the 4.x releases.

Compiling and Installation

Language Traineddata Files

Usage

Technical Information

Training

Testing

External Projects

User Manual for Old Versions
