Skip to content

Commit

Permalink
Fixed a typo in the documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
python3-dev committed Jan 4, 2022
1 parent 98d4fba commit 6793aeb
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ As global trends are shifting towards data-driven industries, the demand for aut
<img align="left" width="390" height="" src="gifs/fully_bordered.gif">
For TSR on fully bordered tables, we use the erosion and dilation operation to extract the row-column grid cell image without any text or characters. The erosion kernels are generally thin vertical and horizontal strips that are longer than the overall font size but shorter than the size of the smallest grid cell and, in particular, must not be wider than the smallest table border width. Using these kernel size constraints results in the erosion operation removing all fonts and characters from the table while preserving the table borders. In order to restore the original line shape, the algorithm applies the dilation operation using the same kernel size on each of the two eroded images, producing an image with vertical and a second with horizontal lines. Finally, the algorithm combines both images by using a bit-wise ```or``` operation and re-inverting the pixel values to obtain a raster cell image. We then use the contours function on the grid-cell image to extract the bounding-boxes for every single grid cell.

## Multi-Type-TD-TSR on Unboardered Tables
## Multi-Type-TD-TSR on Unbordered Tables
<img align="right" width="390" height="" src="gifs/unboardered.gif">
The TSR algorithm for unbordered tables works similarly to the one for bordered tables but utilizes the erosion operation in a different way. The erosion kernel is in general a thin strip with the difference that the horizontal size of the horizontal kernel includes the full image width and the vertical size of the vertical kernel the full image height. The algorithm slides both kernels independently over the whole image from left to right for the vertical kernel, and from top to bottom for the horizontal kernel. During this process it is looking for empty rows and columns that do not contain any characters or font. The resulting images are inverted and combined by a bit-wise ```and``` operation producing the final output. The output is a grid-cell image similar to the one from TSR for bordered tables, where the overlapping areas of the two resulting images represent the bounding-boxes for every single grid cell.

Expand Down

0 comments on commit 6793aeb

Please sign in to comment.