Scan Barcode and QR Code From PDF Files
Accurately capturing data from paper forms can be extremely challenging. Re-keying data from paper-based forms is not only labor-intensive but also error-prone. Optical Character Recognition (OCR) can reduce the amount of human involvement, but it still requires some human supervision to ensure the data is captured correctly.
QR codes have been proven to streamline manual processes and improve efficiency. A QR code can be embedded in a paper-based form to encode all the information that a user enters when filling it in. The recipient simply captures the document with a traditional scanner or a camera and saves it as a PDF file. The QR code enclosed within the PDF file can then be decoded to reveal the complete user-supplied data on the form. QR codes are generated with error correction, so as long as they are clearly captured, near-100% accuracy in form-based data capture is achievable.
In this article, we explore how Dynamsoft Barcode Reader decodes barcodes and QR codes in different types of PDFs.
Based on how they are created, PDFs can be roughly divided into three categories:
- Digitally created PDF, or “True” PDF. These are created using software such as Microsoft® Word® or Excel®, or via the “print” function within a software application (virtual printer).
- Image-only PDF. These mostly come from scanned documents or camera snapshots. The text is not searchable.
- Image-over-text PDF. These are created by running OCR on image-only PDFs so that the recognized text becomes searchable.
Dynamsoft Barcode Reader has an embedded PDF library. To get the best result with all three types of files, it exposes a customizable parameter, ‘PDFReadingMode’, which controls how PDF files are read and rendered.
Scan Barcodes from a Digitally Created PDF
Digitally created PDFs are sometimes referred to as ‘true’ PDFs. They consist of text and images. The images can be either vector images or raster images.
Barcodes Stored as Vector Images
When it comes to scanning barcodes from a vector PDF, Dynamsoft’s barcode SDK has a unique advantage. Unlike other barcode SDKs, which convert the pages into full-page images and then try to locate the barcodes, Dynamsoft’s SDK extracts the PDF vector data directly for barcode region localization and decoding. Skipping the rendering step not only improves speed but also helps achieve higher accuracy.
To make use of this technique, simply set the parameter ‘PDFReadingMode’ to ‘PDFRM_VECTOR’.
C# Code snippet:
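// 'reader' is an already initialized Dynamsoft barcode reader instance.
PublicRuntimeSettings settings = reader.GetRuntimeSettings();
settings.PDFReadingMode = EnumPDFReadingMode.PDFRM_VECTOR; // read the vector data directly
reader.UpdateRuntimeSettings(settings);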
Now let’s examine how well it performs. For comparison, we picked two top commercial SDKs on the market.
A summary of the performance data:
|SDK|Result on File1|
|---|---|
|Commercial SDK 1|Found 3 barcodes in 1501 ms|
|Commercial SDK 2|Found 3 barcodes in 605 ms|
|Dynamsoft (Vector Mode)|Found 3 barcodes in 4 ms|
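On this file, Dynamsoft’s SDK in vector mode is more than 100 times faster than both of the other SDKs: roughly 150 times faster than Commercial SDK 2 (605 ms vs. 4 ms) and roughly 375 times faster than Commercial SDK 1 (1501 ms vs. 4 ms). We’ve tested several other PDF files and the results were consistent.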
Note: The ‘PDFRM_VECTOR’ mode only works for linear barcodes at this point.
Barcodes Stored as Raster Images
More commonly, the images in digitally created PDFs are in a raster format, such as PNG, JPG, or BMP.
There are two options to decode the barcodes in such PDFs:
- Extract images without rendering the whole page
- Render the whole page as an image
Extract images without rendering the whole page
A time-efficient approach is to extract only the images that contain the barcode regions and pass them to a barcode engine for processing. The text is left out in this approach.
Render each page of the PDF as an image
Another option is to render each page as a full-page image. To do so, set the parameter ‘PDFReadingMode’ to ‘PDFRM_RASTER’. Refer to the code snippet below.
C# Code snippet (PDF reading mode and PDFRasterDPI):
// 'reader' is an already initialized Dynamsoft barcode reader instance.
Dynamsoft.DBR.PublicRuntimeSettings settings = reader.GetRuntimeSettings();
settings.PDFRasterDPI = 200; // resolution used to render each page
settings.PDFReadingMode = EnumPDFReadingMode.PDFRM_RASTER; // render each page as an image
reader.UpdateRuntimeSettings(settings);
The tricky part is to choose the appropriate value of ‘PDFRasterDPI’ for rendering the image. A large ‘PDFRasterDPI’ value renders a large image and slows down the processing speed. Conversely, a small ‘PDFRasterDPI’ renders a small image where the barcode region may be distorted.
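To make the trade-off concrete, here is a minimal sketch of the arithmetic behind a DPI choice. It relies only on the fact that PDF page sizes are expressed in points (1 point = 1/72 inch). The class and method names, the US Letter page size, and the 0.5 mm module width are all illustrative assumptions, not part of the Dynamsoft API.

```csharp
using System;

// Hypothetical helper (not part of any SDK): estimates what a given
// PDFRasterDPI value means in pixels.
public static class RasterDpiEstimator
{
    private const double PointsPerInch = 72.0; // PDF default unit: 1 pt = 1/72 inch
    private const double MmPerInch = 25.4;

    // Pixel length of a page side (given in points) when rendered at 'dpi'.
    public static int RenderedPixels(double lengthInPoints, int dpi)
    {
        return (int)Math.Round(lengthInPoints / PointsPerInch * dpi);
    }

    // Pixels available for one barcode module (narrowest bar or QR cell).
    public static double PixelsPerModule(double moduleWidthInMm, int dpi)
    {
        return moduleWidthInMm / MmPerInch * dpi;
    }

    public static void Main()
    {
        // Assumed page size: US Letter, 612 x 792 points (8.5 x 11 inches).
        foreach (int dpi in new[] { 100, 200, 300 })
        {
            int width = RenderedPixels(612, dpi);
            int height = RenderedPixels(792, dpi);
            // Assumed module width of 0.5 mm, purely as an example.
            double perModule = PixelsPerModule(0.5, dpi);
            Console.WriteLine(dpi + " DPI: " + width + " x " + height +
                              " px, about " + Math.Round(perModule, 1) + " px per module");
        }
    }
}
```

Under these assumptions, a US Letter page renders to 1700 x 2200 pixels at 200 DPI and 2550 x 3300 pixels at 300 DPI, which is 2.25 times as many pixels to process. At the low end, fewer pixels are left for each barcode module, which is where decoding starts to become unreliable.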
Note: The ‘PDFRM_RASTER’ mode works for all barcode formats, including linear barcodes, PDF417, QR codes, and other 2D codes.
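Because vector mode is fast but limited to linear barcodes, while raster mode handles every format, one possible strategy is to try vector mode first and fall back to raster mode when nothing is found. The sketch below shows only that control flow. The helper name and the decode delegate are hypothetical; the delegate stands in for the SDK calls and is replaced by a stub here so the example runs without the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Dynamsoft API).
public static class PdfModeFallback
{
    // Tries each reading mode in order and returns the first non-empty result.
    // 'decode' takes (pdfPath, modeName) and returns the decoded barcode texts.
    public static string[] DecodeWithFallback(
        string pdfPath,
        IEnumerable<string> modes,
        Func<string, string, string[]> decode,
        out string modeUsed)
    {
        foreach (string mode in modes)
        {
            string[] texts = decode(pdfPath, mode) ?? new string[0];
            if (texts.Length > 0)
            {
                modeUsed = mode;
                return texts;
            }
        }
        modeUsed = string.Empty; // no mode produced a result
        return new string[0];
    }

    public static void Main()
    {
        // Stub decoder: pretends the PDF holds only a QR code, so vector
        // mode finds nothing and raster mode succeeds.
        Func<string, string, string[]> stub = (path, mode) =>
            mode == "PDFRM_RASTER" ? new[] { "form-data" } : new string[0];

        string used;
        string[] found = DecodeWithFallback(
            "form.pdf", new[] { "PDFRM_VECTOR", "PDFRM_RASTER" }, stub, out used);
        Console.WriteLine(found.Length + " barcode(s) found via " + used);
    }
}
```

In a real application, the delegate would set ‘PDFReadingMode’ through GetRuntimeSettings and UpdateRuntimeSettings, as in the snippets above, and then call the SDK's decode method on the file.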
Scan QR Code from an Image-only PDF
Image-only PDFs are mostly generated by MFPs (multifunction printers), scanners, and camera captures, which is why they are sometimes referred to as scanned PDFs. Scanning barcodes and QR codes from scanned PDFs is not very different from scanning them from images, except that the PDF library first needs to separate the pages into individual images.
Scan QR Code from an Image-over-Text PDF
When an image-only PDF file is run through an OCR engine, the recognized text is embedded behind the scanned image of the document. Each word of the OCR text is mapped to the zone of the scanned image where it was found. Image-over-text PDFs are also referred to as searchable PDFs.
Since the image and the recognized text are stored separately, a PDF library only needs to extract the page image layer and pass it to a barcode engine for processing.
About Dynamsoft Barcode Reader
Dynamsoft Barcode Reader enables developers to quickly implement 1D and 2D barcode scanning into their applications running on different platforms. On top of scanning linear barcodes, it can function as a powerful QR Code reader or a 2D imager.