Artificial intelligence is reinventing traditional data and image processing, helping businesses extract valuable insights. As an experienced provider of AI development services, Oodles AI discusses the fundamentals of traditional Optical Character Recognition (OCR) systems and how AI tools and applications are improving OCR accuracy.
OCR stands for Optical Character Recognition. An OCR engine is the software that extracts text from scanned images of physical documents. Several engines can perform OCR, such as Google Cloud Vision (a commercial cloud API) and Tesseract (open source). Tesseract is one of the most widely used open-source OCR engines.
Emerging providers of OCR systems are proactively using AI technologies such as computer vision to optimize data extraction tasks. At Oodles, we harness computer vision and natural language processing to build dynamic OCR systems for identity verification and healthcare services.
Most OCR engines provide 96% to 98% accuracy at the page level. That means that on a page of 100 words, 96 to 98 words are recognized correctly. OCR accuracy is measured by taking the text that OCR extracts from an image and comparing it with the text actually printed in the image. OCR can give poor results when the image quality is bad or the image resolution is low.
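One common way to compute this page-level word accuracy is to count the word-level edit distance (insertions, deletions and substitutions) between the OCR output and a hand-typed ground truth. The sketch below assumes whitespace tokenization and exact, case-sensitive word matching; both are simplifying assumptions, and the helper names are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn `reference` into `hypothesis` (Levenshtein distance)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1,      # delete a reference word
                               current[j - 1] + 1,   # insert an extra OCR word
                               substitution))        # keep or substitute a word
        previous = current
    return previous[-1]


def word_accuracy(ground_truth, ocr_output):
    """Page-level word accuracy: 1 - (word errors / words in the ground truth)."""
    reference = ground_truth.split()
    hypothesis = ocr_output.split()
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))
```

For a 100-word page with three misread words, `word_accuracy` returns about 0.97, which corresponds to the 97% figure in the range above.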
Get perspective transform of an image
Using the getPerspectiveTransform and warpPerspective functions of the OpenCV library in Python, we can correct the geometric distortion of an image, such as a document photographed at an angle: Canny edge detection locates the document's edges, and the perspective transform maps its corners onto an upright rectangle. The transformed image is shown below.
Ensure good image quality and format (prefer TIFF and PNG)
If the source image quality is good, the OCR output is good too, so it is important to use the cleanest image source available and to make sure the image is not hazy. Accuracy also depends on the image format. JPEG (.jpg or .jpeg) is a lossy format, and its compression artifacts can blur character edges and sometimes give poor results, whereas lossless formats such as PNG or TIFF preserve those edges and improve OCR accuracy.
Cropping of an image
When an image contains text in only one area, cropping is required before running OCR. Cropping the image to just the part that contains text increases the accuracy of the extracted data compared with running OCR on the uncropped image.
Binarization of an image
Binarization converts a colored (RGB) image into a black-and-white image, usually by converting it to grayscale and then applying a threshold. With OpenCV features such as adaptive thresholding, we can convert an image to black and white. Most OCR engines also binarize images internally. The binarization of an image is shown below.
Increase Contrast and Sharpness of the image
Increase the contrast and sharpness of the image before running OCR. Increasing the contrast between the text and its background makes the output more accurate, and a sharp image makes the text clearer.
Increase Scanning Resolution
Scan the image at, or scale it to, at least 300 dots per inch (DPI). A resolution below 200 DPI gives unclear results, while going above 600 DPI increases the size of the output file without a meaningful gain in quality.
Rotate pages to the correct orientation (Deskew)
An image that is not straight is called a skewed image. Deskewing means rotating the image back to its correct orientation. If the image is skewed to either side, we follow these steps:
1. Detect the text block of the image.
2. Calculate the angle of rotation.
3. Rotate the image to correct skew.
The figure below shows the deskewed image on the right.
Open-source Tools and Libraries for Image Processing