Recommended Scan Settings for the Best OCR Accuracy
Business productivity drops when paper starts stacking up in the office, which makes a document scanner essential to any office workspace. Every day, millions of documents are scanned so that information can be found, stored, verified, and shared. This digital flow of documents requires fast and accurate OCR, working at optimal scan settings. So, what are the recommended scan settings for OCR? Let’s find out.
Document image quality plays a crucial role in OCR accuracy, and image resolution is a controlling factor for image quality. Image resolution is measured in dots per inch (DPI). For font sizes above 10 pt (about 3.528 mm), 300 DPI is recommended. For smaller font sizes, a higher resolution, say 400 DPI, is recommended. The key point is that lower resolutions produce lower-quality images, which can hurt text recognition. Higher resolutions produce larger and clearer images, but they take more time to process.
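To make the numbers above concrete, here is a minimal sketch of the arithmetic behind them. It assumes the common desktop-publishing point (1 pt = 1/72 inch); the function names are illustrative, and treating exactly 10 pt as a 300 DPI case is an assumption, since the guideline only speaks of sizes above and below 10 pt.

```python
POINTS_PER_INCH = 72   # assumption: DTP point, 1 pt = 1/72 inch
MM_PER_INCH = 25.4

def points_to_mm(font_size_pt: float) -> float:
    """Convert a font size in points to millimetres."""
    return font_size_pt / POINTS_PER_INCH * MM_PER_INCH

def nominal_height_px(font_size_pt: float, dpi: int) -> float:
    """Nominal text height in pixels when scanned at the given DPI."""
    return font_size_pt / POINTS_PER_INCH * dpi

def recommended_dpi(font_size_pt: float) -> int:
    """Rule of thumb from the text; the 10 pt boundary is an assumption."""
    return 300 if font_size_pt >= 10 else 400

print(round(points_to_mm(10), 3))            # 3.528
print(round(nominal_height_px(10, 300), 1))  # 41.7
print(recommended_dpi(8))                    # 400
```

At 300 DPI a 10 pt line of text is roughly 42 pixels tall, while an 8 pt line scanned at 400 DPI comes out at about 44 pixels, which is why smaller fonts call for a higher resolution.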
Font Color Considerations
When scanning, you can choose among three color modes: black and white, grayscale, and color. The recommended mode for optimal OCR accuracy is grayscale. Black and white also works for most text documents with a clear font, but when the font is small and the image quality is poor, B/W can undermine recognition. Grayscale keeps significantly more detail than B/W and is the better option in those cases. If your document contains pictures and you need to preserve their colors, scan in color mode.
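The loss of detail in B/W mode can be illustrated with a small sketch. It assumes a simple fixed-threshold binarization, which is only one of several ways a scanner might produce a B/W image, and the pixel values are made up for illustration.

```python
def to_black_and_white(gray_pixels, threshold=128):
    """Binarize 8-bit grayscale values: below threshold -> 0, otherwise 255."""
    return [0 if p < threshold else 255 for p in gray_pixels]

# Hypothetical row of pixels crossing a faint, thin stroke on light paper.
row = [230, 210, 150, 135, 160, 215, 228]

bw = to_black_and_white(row)
print(bw)  # [255, 255, 255, 255, 255, 255, 255]
```

In grayscale the stroke is still visible as a dip to 135, but after thresholding every pixel becomes white and the stroke disappears, leaving nothing for the OCR engine to recognize.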
File Type and Compression Considerations
Of the two types of image compression, lossy and lossless, the latter is recommended for better OCR recognition. Lossy formats discard some data to reduce the overall file size, so they are not recommended for important documents. However, lossy formats such as JPEG produce small files, which makes it easier to store a large number of documents. Lossless compression, on the other hand, reduces file size without any loss of data. PDF is one such standard that allows lossless compression of text documents. Another option is to save scanned images in uncompressed TIFF or in PNG format. These allow for better future processing.
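The difference between the two compression types can be sketched with Python's standard library. The example uses `zlib` (the DEFLATE method that PNG also relies on) for the lossless case; the `quantize` helper is a deliberately crude, hypothetical stand-in for a lossy codec and is not how JPEG actually works. The pixel data is invented.

```python
import zlib

# Hypothetical 8-bit grayscale scan data: a short pattern repeated many times.
pixels = bytes([255, 255, 250, 40, 35, 42, 251, 255] * 500)

# Lossless: the data shrinks, and decompression restores every byte exactly.
compressed = zlib.compress(pixels)
restored = zlib.decompress(compressed)
lossless_ok = restored == pixels

def quantize(data: bytes, step: int = 32) -> bytes:
    """Crude lossy stand-in: snap each value down to a multiple of `step`."""
    return bytes((b // step) * step for b in data)

# Lossy: detail is thrown away and cannot be recovered afterwards.
lossy = quantize(pixels)

print(lossless_ok)               # True
print(len(compressed) < len(pixels))  # True
print(lossy == pixels)           # False
```

After the lossless round trip the scan is bit-for-bit identical, whereas the quantized version has permanently merged nearby gray levels, the kind of change that can blur fine character edges.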
The brightness setting in scanners is used to balance the light and dark shades in your scanned images. Brightness settings that are too high or too low can make some data unclear, which in turn decreases OCR accuracy. Hence, a default brightness of 50% is recommended for all scanning requirements.
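A rough sketch of why extreme brightness hurts: it models brightness as a simple additive offset on 8-bit gray values with clipping. This is an assumption for illustration only; how a given scanner maps its percentage setting to pixel values is not specified here, and the sample values are invented.

```python
def adjust_brightness(gray_pixels, offset):
    """Add an offset to 8-bit gray values, clipping the result to 0..255."""
    return [max(0, min(255, p + offset)) for p in gray_pixels]

# Hypothetical pixels: dark ink (30, 60) next to light paper (200, 235, 250).
row = [30, 60, 200, 235, 250]

too_bright = adjust_brightness(row, 100)
too_dark = adjust_brightness(row, -100)

print(too_bright)  # [130, 160, 255, 255, 255]
print(too_dark)    # [0, 0, 100, 135, 150]
```

With too much brightness the light shades all clip to 255, and with too little the dark shades all clip to 0. In both cases values that were distinct become identical, which is the lost detail that a mid-range setting avoids.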
Dynamsoft Label Recognition
Dynamsoft Label Recognition is a text recognition SDK and a data control tool. Developers can take full control of the data and improve recognition accuracy. Learn more details about defining multiple reference regions and text areas.