PDF Accounts Extractor

Utilities for extracting financial data from PDFs of accounts.

The input PDFs must contain readable text, i.e. each page must either come from a native PDF or have been run through optical character recognition (OCR) so that the text can be extracted.
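A quick way to check whether a PDF has readable text before running the tool is sketched below. This is not part of the repository: the helper name has_readable_text and the min_chars threshold are assumptions chosen for illustration. It relies only on the pages offering an extract_text() method, as pdfplumber page objects do.

```python
def has_readable_text(pages, min_chars=20):
    """Return True if any page yields at least min_chars of extracted text.

    `pages` is any iterable of objects with an extract_text() method,
    for example the `pages` list of a pdfplumber.PDF object.
    `min_chars` is an arbitrary threshold (an assumption, tune as needed).
    """
    for page in pages:
        # extract_text() may return None or "" for pages with no text layer
        text = page.extract_text() or ""
        if len(text.strip()) >= min_chars:
            return True
    return False


# Usage with pdfplumber (needs the package installed and a real file):
# import pdfplumber
# with pdfplumber.open("test_accounts.pdf") as pdf:
#     print(has_readable_text(pdf.pages))
```

If this returns False, the PDF is probably a scanned image and needs OCR before any lines can be extracted.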

Installation and requirements

The only requirement is pdfplumber, which can be installed using:

pip install pdfplumber
# or
pip install -r requirements.txt

Jupyter Notebook and pandas are additionally required to run the working notebook that was used to develop the tool.

Using the tool

At the moment the tool consists of a single script, extract_financial_lines.py. It attempts to find lines within a PDF document that match the pattern <text> <number> <number>. This pattern is intended to match the way financial data is commonly presented in accounts, for example in a balance sheet: <item description> <current year value> <previous year value>. An example of a line that might be extracted is:

Trade Creditors 57,054 62,853
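To illustrate the <text> <number> <number> pattern, here is a minimal sketch using a regular expression. It is not the code from extract_financial_lines.py; the names LINE_PATTERN, to_number and parse_line are hypothetical, and the handling of bracketed negatives such as (1,234) is an assumption about how accounts are commonly formatted.

```python
import re

# A number: optional "(" or "-", digits with optional thousands commas,
# optional decimal part, optional closing ")".
NUMBER = r"\(?-?\d[\d,]*(?:\.\d+)?\)?"

# <text> <number> <number>, anchored to the whole line.
LINE_PATTERN = re.compile(
    rf"^(?P<label>.+?)\s+(?P<current>{NUMBER})\s+(?P<previous>{NUMBER})$"
)


def to_number(text):
    """Convert '57,054' or '(1,234)' to a float; brackets mean negative."""
    negative = text.startswith("(") or text.lstrip("(").startswith("-")
    digits = text.strip("()").lstrip("-").replace(",", "")
    value = float(digits)
    return -value if negative else value


def parse_line(line):
    """Return (label, current, previous) or None if the line does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return (
        match.group("label"),
        to_number(match.group("current")),
        to_number(match.group("previous")),
    )


print(parse_line("Trade Creditors 57,054 62,853"))
```

Lines without two trailing numbers, such as headings, return None and can be skipped.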

The tool can be run from the command line against a PDF file:

python extract_financial_lines.py test_accounts.pdf

You can also import the get_finances function into another script:

from extract_financial_lines import get_finances
import pdfplumber

# get_finances requires a pdfplumber.PDF object;
# the with block closes the file once we are done
with pdfplumber.open("test_accounts.pdf") as pdf:
    for row in get_finances(pdf):
        print(row)
