Building an Optical Character Recognizer in Python

Anish Gupta
6 min read · Aug 23, 2020

In Optical Character Recognition (OCR), we try to recognize the text in an image. Here, we will use Tesseract, an open-source OCR engine, through its Python wrapper pytesseract. We will also use the OpenCV library to preprocess the image before passing it to the OCR engine.

Prerequisites:

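  1. Install pytesseract (a Python wrapper; the Tesseract engine itself must also be installed)
  2. Install pillow: image processing library
  3. Install opencv-python: used for the preprocessing steps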

OpenCV

Some popular applications of OpenCV are:

1. Preprocessing image

2. Detecting lines, circles, other shapes, etc.

3. Edge detection

4. Image segmentation

5. Object tracking

OpenCV vs. deep learning libraries (Keras, TensorFlow, PyTorch, etc.): In deep learning, the kernels are learned during backpropagation. In OpenCV, by contrast, we define a particular kernel for a particular task. There is no backpropagation, so the values of the kernel do not change.

For example, in edge detection, we define specific kernel values for extracting the edges.

Figure 1: Edge detection using kernel

OCR using pytesseract

Step 1: import pytesseract and required libraries

Step 2: read the image & pass it to pytesseract

Function to plot image with a better aspect ratio:
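One possible version of such a helper is sketched below. The name plot_image and the default width are assumptions; the idea is simply to derive the figure height from the image's own aspect ratio.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_image(image, width=12, cmap="gray"):
    """Plot an image in a figure whose size matches the image's aspect ratio."""
    height_px, width_px = image.shape[:2]
    # Scale the figure height so the image is neither stretched nor squashed.
    height = width * height_px / width_px
    fig, ax = plt.subplots(figsize=(width, height))
    # Note: color images loaded with OpenCV are BGR and should be converted
    # to RGB (cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) before plotting.
    ax.imshow(image, cmap=cmap)
    ax.axis("off")
    return fig


# Example usage with a blank 100x400 grayscale image:
# plot_image(np.zeros((100, 400), dtype=np.uint8)); plt.show()
```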

Figure 2: Image with text

Extracting text using OCR

Figure 3: pytesseract output on image

Was that easy? Let’s take a more difficult image and try to extract the text.

Figure 4: Image with unclear text
Figure 5: pytesseract output on unclear text image

Tesseract is an excellent open-source library. It performs exceptionally well on images that have clearly visible text. But, if the image itself is not that clear, the OCR can fail miserably. This is where OpenCV and computer vision practices come into the picture. The objective here is to sharpen the boundaries of the characters present in the image so that OCR can recognize them with less effort.

Since we could not extract the text from this image, let’s try to improve the result with some preprocessing.

Preprocessing Techniques

You can improve the result with some of the following preprocessing techniques:
1. Increasing resolution: Increase the resolution of the image.
2. Deskewing: Straighten a rotated image. This also makes later processing easier.
3. Blurring: Useful for removing noise.
4. Converting to black and white: Binarize the image after deskewing and resizing. This produces consistent character size and thickness.
5. Removing noise: Remove noise from the black-and-white image using operations such as morphological transformations, contours, etc.
6. Training Tesseract on the font.

Converting to grayscale
Generally, we convert any color image to grayscale for better preprocessing.

Figure 6: Grayscale converted image

Smoothing using blur
There is no need for deskewing, as the text is almost straight. The resolution of the image is also good.

1. Gaussian Noise:
Gaussian noise is modeled by adding random values, drawn from a Gaussian (normal) distribution, to an image. A Gaussian filter helps in removing Gaussian noise from the image.

2. Salt and Pepper Noise:
An image containing salt-and-pepper noise has dark pixels in bright regions and bright pixels in dark regions. A median filter helps in removing salt-and-pepper noise.

Let’s blur the image to smooth it (see https://docs.opencv.org/3.1.0/d4/d13/tutorial_py_filtering.html):

1. Gaussian blur
2. Median blur

Figure 7: Gaussian blurred image
Figure 8: Median blurred image

Here, there is not much difference between Gaussian blur and median blur.

Let’s find the text output after blurring:

Figure 9: Result after blurring

You can see that the results have improved, but they are still not good enough. Let’s do further preprocessing.

Thresholding the image

Smoothing leaves the image blurred. Generally, for OCR to work better, we want sharp, high-contrast borders between characters. Binarization, which maps every pixel to either black or white, restores that contrast. It also reduces the amount of data in the image, which helps later processing in OCR. Let’s see some thresholding techniques:

1. Simple Thresholding
2. Adaptive Thresholding
3. Otsu’s Thresholding

Simple Thresholding

Figure 10: Result after applying simple thresholding

Adaptive Thresholding

Figure 11: Result after applying adaptive thresholding

Otsu’s Thresholding

Figure 12: Result after applying Otsu’s thresholding

Morphological transformations
Morphological transformations are normally performed on binary images. They need two inputs: the image, and a structuring element (or kernel) that decides the nature of the operation. The two basic morphological operators are erosion and dilation.

Figure 13: Structuring elements
Figure 14: Dilation and erosion on thresholded image
Figure 15: Result after morphological transformations

If you perform more dilation and erosion, the characters will also deteriorate. As you can see, because of the dots, the result has deteriorated instead of improving.

Contours

A contour is an outline representing or bounding the shape of something. It is a curve joining all the continuous points along a boundary that have the same color or intensity. Here, we will use contours to identify the dots and remove them. Once we have found all the contours, we will pick out the dots as the contours whose area, aspect ratio, and similar properties fall within certain limits.

Reverse the image
When using contours, the object we want to identify should generally be white and the background black. So we reverse (invert) the image to convert it into the required format.

Figure 16: Result after reversing the image

Function to draw and show contours:

Find all the contours in the image and show them:

Figure 17: Contour detection result

You can see that 1790 contours have been identified, and we are plotting just the 9th contour, which is a dot. Now we have to remove all such dots.

Blob

Define a Blob class to store the properties of a contour, such as its center, aspect ratio, diagonal size, etc.

Function to plot the Blob

Finding the dots

Figure 18: Finding dots

Image after removing the dots

Figure 19: Image after removing dots
Figure 20: Reversed clean image
Figure 21: Final result

You can see that the results have greatly improved. They are still not perfect, because the text in the image itself had problems.

Another method

Let’s use a bilateral filter in the first stage itself and see the results. Sometimes filters not only dissolve the noise but also smooth away the edges. To avoid this (to a certain extent at least), we can use a bilateral filter.

Figure 22: Bilateral filter on image
Figure 23: Final result
