Updated ImproveQuality (markdown)
thadguidry committed Jan 26, 2020
1 parent 69fbb9f commit fc4ffee
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions ImproveQuality.md
@@ -48,6 +48,19 @@ If you are not able to fix this by providing a better input image, you can try a

Noise is random variation of brightness or colour in an image that can make the text more difficult to read. Certain types of noise cannot be removed by Tesseract in the binarisation step, which can cause accuracy rates to drop.

### Dilation and Erosion

Bold or thin characters (especially those with [serifs](https://en.wikipedia.org/wiki/Serif)) may obscure fine details and reduce recognition accuracy. Many image processing programs let you grow or shrink the edges of characters against a common background, operations known as [dilation and erosion](http://www.mif.vu.lt/atpazinimas/dip/FIP/fip-Morpholo.html#Heading96).

Heavy ink bleed in historical documents can be compensated for by applying erosion.

For example, GIMP's Value Propagate filter can erode extra-bold historical fonts when the Lower threshold value is reduced.

Original: (Erosion_original.jpg)


Erosion applied: (Erosion_applied.jpg)
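
To make the idea concrete, here is a minimal sketch of binary erosion and dilation in plain Python. It is not how GIMP's Value Propagate filter is implemented; it only illustrates the general technique. It assumes a binarised image stored as a list of rows in which `1` marks ink and `0` marks background, and the function names `erode` and `dilate` are chosen for illustration only.

```python
def erode(image, radius=1):
    """Binary erosion with a square (2*radius+1) neighbourhood.

    Assumption: 1 marks ink, 0 marks background. A pixel stays ink only
    if every pixel in its neighbourhood is ink; pixels outside the image
    are treated as background.
    """
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            keep = 1
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if not inside or image[ny][nx] == 0:
                        keep = 0
            result[y][x] = keep
    return result


def dilate(image, radius=1):
    """Binary dilation: a pixel becomes ink if any neighbour is ink."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1:
                        result[y][x] = 1
    return result


# A 3-pixel-wide, 5-pixel-tall "bold" stroke on a 7x7 canvas.
bold = [[1 if 2 <= x <= 4 and 1 <= y <= 5 else 0 for x in range(7)]
        for y in range(7)]

thin = erode(bold)       # stroke shrinks by one pixel on every side
restored = dilate(thin)  # growing it back recovers the original stroke
```

On a real scan you would normally use an image library rather than nested loops, and you would need to check whether ink is the foreground or the background value, since that determines whether an "erosion" thins or thickens the text.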


### Rotation / Deskewing
