Recommended Scan Settings for the Best OCR Accuracy

Jul 15, 2019

Business productivity drops when paper starts stacking up in the office, which makes a document scanner a near necessity in any office workspace. Millions of paper documents are scanned daily to find, store, verify, and share information. This digital transit of documents requires fast and accurate OCR working at optimal scan settings. So, what are the recommended scan settings for OCR? Let’s find out here.

Resolution

Document image quality plays a crucial role in OCR accuracy, and image resolution is a controlling factor for image quality. Image resolution is measured in dots per inch (DPI). For font sizes above 10 pt (about 3.528 mm), 300 DPI is recommended. A higher resolution, say 400 DPI, is recommended for smaller font sizes. The key point is that lower resolution produces lower-quality images, which can hurt text recognition. Higher resolutions produce larger, clearer images, but they take more time to process.
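To make the font-size and DPI relationship concrete, here is a minimal Python sketch. It assumes the standard conversion of 72 points per inch and treats the point size as the nominal font height; the function names are illustrative, and the 10 pt cut-off is treated as inclusive, which is an assumption on top of the guidance above.

```python
POINTS_PER_INCH = 72   # 1 pt = 1/72 inch
MM_PER_INCH = 25.4


def points_to_mm(font_pt: float) -> float:
    """Convert a font size in points to millimetres."""
    return font_pt / POINTS_PER_INCH * MM_PER_INCH


def font_height_px(font_pt: float, dpi: int) -> float:
    """Nominal font height in pixels when scanned at the given DPI."""
    return font_pt / POINTS_PER_INCH * dpi


def suggested_dpi(font_pt: float) -> int:
    """Rule of thumb from the article: 300 DPI for regular text, 400 DPI for small text.

    Assumption: exactly 10 pt is grouped with the larger sizes.
    """
    return 300 if font_pt >= 10 else 400


if __name__ == "__main__":
    for size in (8, 10, 12):
        dpi = suggested_dpi(size)
        print(f"{size} pt = {points_to_mm(size):.3f} mm, "
              f"scan at {dpi} DPI, about {font_height_px(size, dpi):.1f} px tall")
```

Scanning small text at 400 DPI keeps its pixel height close to that of regular text at 300 DPI, which is the point of raising the resolution for small fonts.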

Color Mode

When scanning, you can choose among three color modes: black and white, grayscale, and color. The recommended mode for optimal OCR accuracy is grayscale. Black and white also works for most text documents with a clear font. But when the font is small and the image quality is poor, black and white might undermine OCR recognition. Grayscale keeps significantly more detail than black and white and is the better option in that case. If your document contains pictures and you need to preserve the colors, scan in color mode.
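The difference in retained detail can be illustrated with a small Python sketch. It is not how any particular scanner works: the pixel values are made up, the grayscale conversion uses the common ITU-R BT.601 luma weights, and the fixed threshold of 128 is an assumption chosen for illustration.

```python
def to_gray(rgb):
    """Convert an (R, G, B) pixel to an 8-bit gray level using BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_bw(gray, threshold=128):
    """Binarize a gray level: 255 (white) at or above the threshold, else 0 (black)."""
    return 255 if gray >= threshold else 0


# Hypothetical pixels sampled across the soft edge of a printed stroke,
# running from paper white to ink black.
row = [(250, 250, 250), (200, 198, 205), (140, 138, 142), (90, 88, 95), (30, 30, 30)]

gray_row = [to_gray(p) for p in row]
bw_row = [to_bw(g) for g in gray_row]

if __name__ == "__main__":
    print("grayscale:", gray_row, "distinct levels:", len(set(gray_row)))
    print("black/white:", bw_row, "distinct levels:", len(set(bw_row)))
```

The grayscale row keeps a separate level for each sample along the edge, while the black-and-white row collapses them into two values. With small or faint text, that collapse is where strokes can break apart or merge.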

File Type and Compression

Of the two types of image compression, lossy and lossless, the latter is recommended for better OCR recognition. Lossy formats discard some data to reduce file size, so they are not recommended for important documents. Lossy formats such as JPEG do produce small files, which makes it easier to store many documents. Lossless compression, by contrast, reduces file size without losing any data. PDF is one standard that allows lossless compression of text documents.

Other options are to save scanned images as uncompressed TIFF or as PNG. These allow for better future processing.
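The lossless versus lossy distinction can be sketched in a few lines of Python. The lossless half uses the standard `zlib` module (DEFLATE, the same algorithm PNG uses). The lossy half is only a crude stand-in: the `quantize` helper is hypothetical and far simpler than JPEG, but it shows the same property, namely that the original values cannot be recovered. The pixel data is made up.

```python
import zlib

# Hypothetical 8-bit grayscale scanline: white paper with a few dark text pixels.
pixels = bytes([255] * 60 + [40, 35, 42, 38] * 5 + [255] * 60)

# Lossless: compress, then decompress, and the bytes come back unchanged.
compressed = zlib.compress(pixels, 9)
restored = zlib.decompress(compressed)


def quantize(data: bytes, step: int = 64) -> bytes:
    """Crude stand-in for lossy compression: snap every value to a coarse level."""
    return bytes((v // step) * step for v in data)


# Lossy: the coarse version is cheaper to store, but the fine detail is gone for good.
lossy = quantize(pixels)

if __name__ == "__main__":
    print("original bytes:  ", len(pixels))
    print("compressed bytes:", len(compressed))
    print("lossless round trip identical:", restored == pixels)
    print("lossy version identical:      ", lossy == pixels)
```

For OCR, the round-trip guarantee is what matters: a lossless file hands the recognizer the pixels the scanner produced, not an approximation of them.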

Brightness

The brightness setting in scanners balances the light and dark shades in your scanned images. A brightness setting that is too high or too low can make some data unclear, decreasing OCR accuracy. Hence, the default brightness of 50% is recommended for all scanning requirements.
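Why extreme brightness loses data can be shown with a small Python sketch. The mapping is an assumption made for illustration: it treats the percentage as a linear offset on 8-bit gray levels, with 50% as neutral. Real scanner drivers may implement brightness differently, and the sample values are made up.

```python
def apply_brightness(gray_values, brightness_pct):
    """Shift 8-bit gray levels by a brightness setting, clipping to 0..255.

    Assumed mapping: 50% leaves values unchanged, 0% and 100% shift by a full 255.
    """
    offset = round((brightness_pct - 50) / 50 * 255)
    return [min(255, max(0, v + offset)) for v in gray_values]


# Hypothetical gray levels across a stroke edge, from ink to paper.
row = [30, 89, 139, 199, 250]

neutral = apply_brightness(row, 50)  # unchanged
bright = apply_brightness(row, 90)   # most values clip to white
dark = apply_brightness(row, 10)     # most values clip to black

if __name__ == "__main__":
    for name, values in (("50%", neutral), ("90%", bright), ("10%", dark)):
        print(f"{name}: {values} distinct levels: {len(set(values))}")
```

Once values clip to pure white or pure black, the differences between them are gone, and no later processing step can bring them back.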

Dynamsoft Label Recognition

The Dynamsoft Label Recognizer SDK is a text recognition tool that leverages OCR technology to extract alphanumeric characters and standard symbols from images, regardless of their background color, font, or text size. Unlike traditional OCR systems, our label recognizer is specifically designed to interpret text that does not conform to natural language rules.

This software development kit (SDK) can be customized to recognize specific patterns of characters and symbols for a variety of applications, including identification cards, inventory labels, price tags, automotive VIN codes, and license plates.

Developers keep full control of the data and can improve recognition accuracy.

The specialized OCR SDK is optimized for data extraction from compact text blocks. It excels at interpreting alphanumeric characters and symbols, which sets it apart from conventional OCR tools.
Optimized Data Extraction: Ideal for extracting field-value pairings from labels, forms, or other similar documents.
Target Region Specification: The capability to designate target areas from a reference region ensures higher accuracy and efficiency. Learn more details about defining multiple reference regions and text areas.

Take the next step

Experience unmatched precision with Dynamsoft Label Recognizer. Please check out our Online Documentation.

Get in touch with one of our Technical Support Members