
This repository contains an implementation of the "Representation Learning for Information Extraction from Form-like Documents" paper.


tulasiram58827/Information-Extraction-From-Documents


Note: The implementation is still in progress. Use it at your own risk.


Project setup

python -m virtualenv -p python3.8 venv
source venv/bin/activate
pip install -e .
gdown --id 10r9y17wg8Elo-3Zi61xA_8QDaKix8giN -O data.tar.xz
tar -xf data.tar.xz
gdown --id 16FzDxLOFxNmYi3JNXaYCmnZvR4x5T54I -O ocr_modified_files.tar.xz
tar -xf ocr_modified_files.tar.xz && mv ocr_modified_files data/

python data_processing.py

At this point, your data directory should contain box, img, key, new_processed_files, and ocr_modified_files.
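To confirm this layout before moving on, a small Bash check like the one below can help. It is only a sketch and assumes the commands are run from the repository root, so the data directory is `./data`. The function name `check_data_dir` is hypothetical and is not part of the repository. It uses `-e` rather than `-d` because the README does not say whether every entry is a directory.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of this repository): report which of the
# expected entries exist in the data directory.
# Returns 0 if all entries are present, 1 if any is missing.
check_data_dir() {
  local data_dir="${1:-data}" missing=0 entry
  for entry in box img key new_processed_files ocr_modified_files; do
    if [ -e "$data_dir/$entry" ]; then
      echo "found:   $data_dir/$entry"
    else
      echo "missing: $data_dir/$entry"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (after running data_processing.py):
#   check_data_dir data
```

A non-zero exit status means something went wrong in an earlier step. For example, a missing ocr_modified_files entry points to the final `tar -xf ... && mv ...` command.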

If you are interested in the paper or the implementation details, see this report published on Weights & Biases.
