This is a Python notebook that detects a table in a scanned PDF or image and extracts it.

codgas/Table-with-broken-lines-OCR

Table-with-broken-lines-OCR

In scanned documents, especially old ones, some table lines can be broken, which makes the table very hard for typical table OCR libraries to read. In this repository I deal with this issue by first detecting the table with the Layout Parser library, then performing OCR on the whole document, detecting the contours and cells of the table, and finally assigning each piece of text to its cell using its coordinates. I found that this method works much better than applying OCR to each detected cell contour separately.
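The last step, assigning OCR text to cells by coordinates, can be illustrated with a minimal sketch. This is not the notebook's actual code: the function names, the `(x, y, width, height)` box layout, and the rule of matching a word to the cell that contains its centre point are all assumptions made for illustration. It also assumes the OCR engine returns words in reading order.

```python
from typing import Dict, List, Tuple

# Hypothetical box layout: (x, y, width, height) in page pixels.
Box = Tuple[int, int, int, int]
Word = Tuple[str, Box]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the centre point of a box."""
    x, y, w, h = box
    return x + w / 2, y + h / 2


def contains(cell: Box, point: Tuple[float, float]) -> bool:
    """True if the point lies inside the cell (right and bottom edges excluded)."""
    x, y, w, h = cell
    px, py = point
    return x <= px < x + w and y <= py < y + h


def assign_words_to_cells(words: List[Word], cells: List[Box]) -> Dict[int, str]:
    """Map each cell index to the joined text of the words centred inside it.

    Words are kept in input order, so the OCR output is assumed to be in
    reading order. Words that fall outside every cell are dropped.
    """
    hits: Dict[int, List[str]] = {i: [] for i in range(len(cells))}
    for text, box in words:
        center = box_center(box)
        for i, cell in enumerate(cells):
            if contains(cell, center):
                hits[i].append(text)
                break  # a word belongs to at most one cell
    return {i: " ".join(texts) for i, texts in hits.items()}


# Made-up example: a 2x2 grid of cells and five OCR words.
cells = [(0, 0, 100, 40), (100, 0, 100, 40), (0, 40, 100, 40), (100, 40, 100, 40)]
words = [
    ("Name", (10, 10, 40, 15)),
    ("Age", (110, 10, 30, 15)),
    ("Ada", (10, 50, 30, 15)),
    ("Lovelace", (45, 50, 50, 15)),
    ("36", (110, 50, 20, 15)),
]
print(assign_words_to_cells(words, cells))
```

Because the OCR runs once on the whole page, each word is recognised with its full surrounding context, and the cell boxes are only used afterwards to decide where the text belongs.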
