How to Use Tesseract OCR to Assist Barcode Scanning

Jan 07, 2020

When scanning barcodes, the recognition rate depends on image quality. If a barcode image is severely damaged, the barcode algorithm may fail. Fortunately, most linear (1D) barcodes are printed together with their human-readable text, so in this scenario OCR (optical character recognition) can complement the barcode algorithm.

Some barcode readers can recognize text as well as barcodes. For example, Dynamsoft Barcode Reader SDK can recognize the accompanying text in addition to reading the barcode.


In this article, however, I will share how to use Tesseract OCR to supplement the barcode scanning results of Dynamsoft Barcode Reader.

Getting Started with Tesseract OCR on Windows

Install the pre-built binary package of Tesseract for Windows.

Here is the image for the test.

[Image: codabar.jpg, the Codabar barcode used for testing]

Add C:\Program Files\Tesseract-OCR to the system PATH environment variable, and then run the following command in cmd.exe:

tesseract codabar.jpg out

[Screenshot: Tesseract OCR output]

The recognized text, saved to out.txt, contains both letters and digits, but the expected result is digits only. We can restrict the output to digits by adding Tesseract's digits config:

tesseract codabar.jpg out digits

[Screenshot: Tesseract OCR output with the digits config]

The result looks better.
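The same CLI call can also be scripted from Python. This is an illustrative sketch, not part of the original program: it assumes the tesseract executable is on PATH, and the helper names tesseract_command and run_tesseract are hypothetical. It relies on the positional form tesseract imagename outputbase [configfile], where Tesseract writes the recognized text to outputbase.txt.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, outbase, config=None):
    """Build the argument list for: tesseract <image> <outbase> [config]."""
    cmd = ["tesseract", str(image), str(outbase)]
    if config:
        # e.g. "digits" selects the digits config file bundled with Tesseract
        cmd.append(config)
    return cmd


def run_tesseract(image, outbase="out", config=None):
    """Run the tesseract CLI and return the text it writes to <outbase>.txt."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    subprocess.run(tesseract_command(image, outbase, config), check=True)
    # Tesseract adds the .txt extension to the output base name itself
    return Path(f"{outbase}.txt").read_text(encoding="utf-8")
```

With Tesseract installed, run_tesseract("codabar.jpg", config="digits") mirrors the second command above and returns the contents of out.txt.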

Reading Barcode and Recognizing Accompanying Text in Python

OCR is working. What about barcode detection? We can quickly create a simple program in Python.

Install Dynamsoft Barcode Reader and PyTesseract:

pip install dbr pytesseract

Get a free trial license key, and then read barcodes with a few lines of code:

from dbr import DynamsoftBarcodeReader
# Path to the test image
image = 'codabar.jpg'
# Create the reader and activate it with your trial license key
reader = DynamsoftBarcodeReader()
reader.initLicense('LICENSE-KEY')
try:
    # Decode all barcodes found in the image file
    results = reader.DecodeFile(image)
    textResults = results["TextResults"]
    resultsLength = len(textResults)
    print("count: " + str(resultsLength))
    if resultsLength != 0:
        for textResult in textResults:
            print('Barcode Type: %s' % (textResult["BarcodeFormatString"]))
            print('Barcode Result: %s' % (textResult["BarcodeText"]))
    else:
        print("No barcode detected")
except Exception as err:
    print(err)

Recognize text using pytesseract:

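import pytesseract
from PIL import Image

# Path to the test image
image = 'codabar.jpg'
# Pass Tesseract's bundled "digits" config to restrict the output to digits
custom_oem_psm_config = r'digits'

result = pytesseract.image_to_string(Image.open(image), config=custom_oem_psm_config)
print('OCR Result:     %s' % (result))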

[Screenshot: pytesseract output]

The barcode recognition and OCR results are identical, which looks perfect.
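To check this agreement programmatically rather than by eye, both strings can be normalized first, because OCR output may carry trailing whitespace (with Tesseract 4, typically a newline and a form-feed page separator). The helpers below are a hypothetical sketch that assumes a purely numeric value, as in this sample, and compares digits only; the sample value 1234567 is made up for illustration.

```python
import string


def digits_only(text):
    """Keep only ASCII digits, dropping spaces, newlines and form feeds."""
    return "".join(ch for ch in text if ch in string.digits)


def results_agree(barcode_text, ocr_text):
    """Return True if the barcode value and the OCR text contain the same digits."""
    expected = digits_only(barcode_text)
    # An empty value on either side never counts as agreement
    return bool(expected) and expected == digits_only(ocr_text)


print(results_agree("1234567", "1234567\n\x0c"))  # True
```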

Now, modify the image to simulate damage and save it as damaged.png:

[Image: damaged.png, the damaged Codabar barcode]

Rerun the app:

[Screenshot: Python OCR results for the damaged barcode]

In this case, the barcode SDK fails to decode the image, but OCR still recognizes the text. This shows the value of OCR as an assist for barcode scanning.
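The fallback idea can be expressed as a small function: try the barcode reader first, and use OCR only when it returns nothing. This is a hypothetical sketch rather than the code in the source gist. The decoder and the OCR step are passed in as callables so the control flow can be shown without the SDK, and the decoder is assumed to return a dict shaped like the DecodeFile result above (a "TextResults" list whose entries have "BarcodeText"). The stub functions are stand-ins for illustration only.

```python
def scan_with_fallback(image, decode_file, ocr):
    """Return (source, value): the barcode result first, OCR text as a fallback.

    decode_file(image) is assumed to return a dict shaped like the
    DecodeFile() result; ocr(image) is assumed to return a string.
    """
    try:
        results = decode_file(image) or {}
        text_results = results.get("TextResults") or []
    except Exception as err:
        # Treat SDK errors the same as "no barcode found"
        print(err)
        text_results = []
    if text_results:
        return "barcode", text_results[0]["BarcodeText"]
    # Barcode decoding failed: fall back to the printed text
    text = ocr(image).strip()
    return ("ocr", text) if text else (None, "")


# Hypothetical stubs standing in for the SDK and pytesseract
def failing_decoder(image):
    return {"TextResults": []}


def fake_ocr(image):
    return "1234567\n\x0c"


print(scan_with_fallback("damaged.png", failing_decoder, fake_ocr))  # ('ocr', '1234567')
```

In the real program, decode_file would wrap the reader's DecodeFile call and ocr would wrap pytesseract.image_to_string.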

In my test, the OCR result is 100% correct. In general, however, OCR often cannot produce a perfect result on low-quality images, so it cannot replace the barcode algorithm for 1D barcode scanning.
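Because OCR can misread characters, it helps to sanity-check an OCR value before trusting it. For symbologies that carry a mandatory check digit, such as EAN-13, the check digit can reject many misreads; Codabar, used in this sample, does not require one, so this applies only to other 1D formats. A minimal sketch, assuming a 13-digit EAN-13 string; the function name is hypothetical.

```python
def ean13_is_valid(code):
    """Return True if code is 13 digits with a correct EAN-13 check digit."""
    if len(code) != 13 or not all(ch in "0123456789" for ch in code):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits, from the left
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


print(ean13_is_valid("4006381333931"))  # True
print(ean13_is_valid("4006381333937"))  # False: the last digit was misread
```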

Source Code

https://gist.github.com/yushulx/32566858fc799b7d2e59899f0712c735