soduco/processor-ocr-pero

processor-ocr-pero

CLI processor for the SoDUCo project that performs OCR. It takes a series of JSON files from a given directory and writes an updated version of each file to another directory. It looks for specific regions in each file and runs text line detection and recognition on them. The -f/--export-format option can also produce other output formats: PAGE XML (page), ALTO XML (alto) and image.
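The directory-to-directory behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: process_directory, the stub processor and the "regions" key are hypothetical, and neither the real JSON schema nor the PERO OCR calls are shown.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical signature: takes one parsed JSON document and returns an
# updated copy (in the real tool, with OCR text for the detected regions).
DocumentProcessor = Callable[[dict], dict]


def process_directory(input_dir: Path, output_dir: Path,
                      process_document: DocumentProcessor) -> list[Path]:
    """Apply process_document to every *.json file in input_dir and write
    each result under the same file name in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.glob("*.json")):
        document = json.loads(src.read_text(encoding="utf-8"))
        updated = process_document(document)
        dst = output_dir / src.name
        dst.write_text(json.dumps(updated, ensure_ascii=False, indent=2),
                       encoding="utf-8")
        written.append(dst)
    return written


# Usage with a stub processor that only marks each document as processed.
with tempfile.TemporaryDirectory() as tmp:
    in_dir, out_dir = Path(tmp, "input"), Path(tmp, "output")
    in_dir.mkdir()
    (in_dir / "page1.json").write_text('{"regions": []}', encoding="utf-8")
    outputs = process_directory(in_dir, out_dir,
                                lambda doc: {**doc, "ocr_done": True})
    result = json.loads(outputs[0].read_text(encoding="utf-8"))
    output_names = [p.name for p in outputs]
```

Keeping the per-document step behind a plain function makes the file handling easy to test without running OCR at all.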

Installation and tests

pipenv install
pipenv run python -m pero-cli -i ./tests/input -o ./tests/output -f json -f image
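After the test command finishes, one quick sanity check is that every JSON file in ./tests/input has a counterpart in ./tests/output. The helper below is a hypothetical sketch that assumes output files keep the input file names; it does not inspect the OCR content or the image exports.

```python
import tempfile
from pathlib import Path


def missing_outputs(input_dir: Path, output_dir: Path) -> list[str]:
    """Return the names of input *.json files that have no file of the
    same name in output_dir."""
    produced = {p.name for p in output_dir.glob("*.json")}
    return sorted(p.name for p in input_dir.glob("*.json")
                  if p.name not in produced)


# Demonstration on a throwaway directory pair; after a real run you would
# call missing_outputs(Path("tests/input"), Path("tests/output")) instead.
with tempfile.TemporaryDirectory() as tmp:
    inp, out = Path(tmp, "input"), Path(tmp, "output")
    inp.mkdir()
    out.mkdir()
    for name in ("a.json", "b.json"):
        (inp / name).write_text("{}", encoding="utf-8")
    (out / "a.json").write_text("{}", encoding="utf-8")
    missing = missing_outputs(inp, out)
```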

Usage

usage: __main__.py [-h] -i INPUT_DIR -o OUTPUT_DIR -f {json,alto,page,image} [--pero-config-file PERO_CONFIG_FILE]

PERO OCR command line argument

options:
  -h, --help            show this help message and exit
  -i INPUT_DIR, --input-dir INPUT_DIR
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
  -f {json,alto,page,image}, --export-format {json,alto,page,image}
  --pero-config-file PERO_CONFIG_FILE
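The usage text above is standard argparse output, so a parser producing the same usage line can be sketched as below. This is a reconstruction, not the repository's source: build_parser and EXPORT_FORMATS are hypothetical names, and using action="append" for -f is an assumption inferred from the repeated -f in the test command. The --pero-config-file option presumably points to a PERO OCR configuration file, whose format is not described here.

```python
import argparse

# Hypothetical constant listing the formats accepted by -f.
EXPORT_FORMATS = ["json", "alto", "page", "image"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose usage line matches the one shown above."""
    parser = argparse.ArgumentParser(description="PERO OCR command line argument")
    parser.add_argument("-i", "--input-dir", required=True)
    parser.add_argument("-o", "--output-dir", required=True)
    # action="append" lets -f be given several times ("-f json -f image");
    # each value is still validated against the allowed choices.
    parser.add_argument("-f", "--export-format", required=True,
                        action="append", choices=EXPORT_FORMATS)
    parser.add_argument("--pero-config-file")
    return parser


args = build_parser().parse_args(
    ["-i", "./tests/input", "-o", "./tests/output", "-f", "json", "-f", "image"]
)
```

With this setup, args.export_format collects every requested format in a list, so the processor can loop over it and write one export per format.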

About

CLI tool to process directories using PERO OCR