How to Use Tesseract OCR as an Assist for Barcode Scan
When scanning barcodes, the recognition rate depends on image quality. If a barcode image is severely damaged, the barcode algorithm may fail to decode it. Fortunately, most linear (1D) barcodes are printed with the corresponding human-readable text. In such a scenario, OCR (optical character recognition) can complement the barcode algorithm.
Some barcode readers can decode text as well as barcodes. For example, Dynamsoft’s Barcode Reader SDK supports recognizing the accompanying text in addition to reading barcodes.
In this article, however, I will show how to use Tesseract OCR to complement the barcode scan results of Dynamsoft Barcode Reader.
Getting Started with Tesseract OCR on Windows
Install the pre-built binary package of Tesseract for Windows.
Here is the image for the test.
Add the path C:\Program Files\Tesseract-OCR to the system PATH environment variable, and then run the following command in cmd.exe:
tesseract codabar.jpg out
The result contains both letters and digits, whereas the expected result is digits only. We can restrict the output to digits by appending the digits config to the command:
tesseract codabar.jpg out digits
The result looks better.
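The same command can also be scripted. The following is a minimal sketch, assuming tesseract is on the PATH as configured above; the helper names build_command and run_tesseract are made up for illustration. It relies on the CLI writing its result to a text file named after the output base (out.txt for the command above).

```python
import subprocess
from pathlib import Path


def build_command(image_path, output_base, digits_only=True):
    """Build the argument list for the tesseract CLI call shown above."""
    command = ["tesseract", str(image_path), str(output_base)]
    if digits_only:
        # Appending the 'digits' config mirrors `tesseract codabar.jpg out digits`.
        command.append("digits")
    return command


def run_tesseract(image_path, output_base="out", digits_only=True):
    """Run tesseract and return the text it wrote to <output_base>.txt."""
    subprocess.run(build_command(image_path, output_base, digits_only), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling run_tesseract("codabar.jpg") would then return the recognized text as a string instead of requiring you to open out.txt by hand.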
Reading Barcode and Recognizing Accompanying Text in Python
OCR is ready, but what about barcode detection? We can quickly create a simple program in Python.
Install Dynamsoft Barcode Reader and PyTesseract:
pip install dbr pytesseract
Get a free trial license, with which we can read barcodes using a few lines of code:
from dbr import DynamsoftBarcodeReader

# Path to the test image
image = 'codabar.jpg'

dbr = DynamsoftBarcodeReader()
dbr.initLicense('DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ==')
try:
    results = dbr.DecodeFile(image)
    textResults = results["TextResults"]
    resultsLength = len(textResults)
    print("count: " + str(resultsLength))
    if resultsLength != 0:
        # Print the format and decoded text of every barcode found
        for textResult in textResults:
            print('Barcode Type: %s' % (textResult["BarcodeFormatString"]))
            print('Barcode Result: %s' % (textResult["BarcodeText"]))
    else:
        print("No barcode detected")
except Exception as err:
    print(err)
Recognize text using pytesseract:
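import pytesseract
from PIL import Image
# The 'digits' config limits recognition to numeric characters
custom_oem_psm_config = r'digits'
# image is the path to the test picture, for example 'codabar.jpg'
result = pytesseract.image_to_string(Image.open(image), config=custom_oem_psm_config)
print('OCR Result: %s' % (result))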
The barcode recognition result and the OCR result are the same, which looks perfect.
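Rather than comparing the two printed values by eye, the check can be automated. This is a minimal sketch, assuming the only difference to tolerate is whitespace (OCR output often ends with a newline or form feed); normalize and results_agree are hypothetical helper names, not part of either library.

```python
def normalize(text):
    """Remove all whitespace, including any trailing newline or form feed."""
    return "".join(text.split())


def results_agree(barcode_text, ocr_text):
    """Return True when both readers produced the same characters."""
    return normalize(barcode_text) == normalize(ocr_text)
```

For example, results_agree(textResult["BarcodeText"], result) could be called with the values from the two snippets above.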
Now, make some changes to the image and save it as damaged.png:
Rerun the app:
In this scenario, the barcode SDK fails to decode the image, but OCR still works. This shows the value of OCR as an assist for barcode scanning.
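The "barcode first, OCR as backup" idea can be sketched as a small function. This is only an illustration under assumptions: read_with_fallback is a hypothetical name, and decode_barcode and recognize_text stand for caller-supplied wrappers around the barcode and OCR snippets above, each taking an image path and returning a string (or None or an empty string when nothing is found).

```python
def read_with_fallback(image_path, decode_barcode, recognize_text):
    """Try the barcode reader first and fall back to OCR.

    decode_barcode and recognize_text are hypothetical caller-supplied
    functions; each takes an image path and returns a string, or None/""
    when nothing was found.
    """
    try:
        barcode_text = decode_barcode(image_path)
    except Exception:
        # Treat a decoder error the same as "no barcode detected".
        barcode_text = None
    if barcode_text:
        return "barcode", barcode_text

    # The barcode reader found nothing, so use the printed text instead.
    ocr_text = (recognize_text(image_path) or "").strip()
    if ocr_text:
        return "ocr", ocr_text
    return "none", ""
```

Returning the source alongside the value lets the caller treat an OCR-derived value with more caution than a decoded barcode.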
In my test case, the OCR result is 100% correct. Most of the time, however, OCR cannot produce perfect results because of image quality, so it cannot replace the barcode algorithm for 1D barcode scanning.
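Because OCR output can be noisy, it may help to sanity-check it before treating it as a barcode value. The following is a minimal sketch, assuming the expected value is a single run of digits; pick_digit_candidate and its min_length threshold are hypothetical and would need tuning for real barcodes.

```python
import re


def pick_digit_candidate(ocr_text, min_length=6):
    """Return the longest run of digits in ocr_text, or None if it is too short.

    min_length is an arbitrary, hypothetical threshold; adjust it to the
    barcodes you expect.
    """
    runs = re.findall(r"\d+", ocr_text)
    if not runs:
        return None
    best = max(runs, key=len)
    return best if len(best) >= min_length else None
```

A None result signals that the OCR text should not be trusted as a substitute for the barcode value.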
Source Code
https://gist.github.com/yushulx/32566858fc799b7d2e59899f0712c735