Scan Barcode and QR Code From PDF Files

Oct 27, 2021

Accurately capturing data from paper forms can be extremely challenging. Re-keying data from paper forms is not only labor-intensive but also error-prone. Optical Character Recognition (OCR) can reduce the amount of manual work, but it still requires some human supervision to ensure the data is captured correctly.

Streamline workflow using QR codes

QR codes can streamline manual processes and improve efficiency. A QR code can be embedded in a paper-based form to encode all the information a user enters when filling it in. The recipient simply captures the document with a traditional scanner or a camera and saves it as a PDF file. The QR code within the PDF can then be decoded to reveal all of the user-supplied data on the form. Because QR codes are generated with error correction, near 100% accuracy in form data capture is achievable as long as the codes are captured clearly.
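To make the idea concrete, here is a minimal C# sketch of one way form fields could be packed into a QR payload and read back after the code is decoded. The `FormPayload` class and the `key=value&key=value` layout are hypothetical choices for illustration; the article does not prescribe a payload format, and generating the QR image itself is left to whichever QR encoder you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: packs form fields into a text payload for a QR code and parses
// the decoded QR text back into fields. The "key=value&key=value" layout is an
// illustrative assumption, not a format defined by any barcode SDK.
public static class FormPayload
{
    public static string Encode(IDictionary<string, string> fields)
    {
        // Percent-escape keys and values so '=' or '&' typed by the user stays unambiguous.
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    public static Dictionary<string, string> Decode(string payload)
    {
        var fields = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(payload)) return fields;

        foreach (string pair in payload.Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0) continue; // skip malformed entries
            string key = Uri.UnescapeDataString(pair.Substring(0, eq));
            fields[key] = Uri.UnescapeDataString(pair.Substring(eq + 1));
        }
        return fields;
    }
}
```

Keeping the payload compact is worthwhile, because the amount of data a QR code can hold decreases as its error-correction level increases.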

Types of PDF formats

Based on how PDF files are created, PDFs can be roughly divided into 3 categories:

Digitally created PDF, or “true” PDF. These are created with software such as Microsoft® Word® or Excel®, or via the “print” function of an application (a virtual printer).
Image-only PDF. These mostly come from scanned documents or camera snapshots. The text is not searchable.
Image-over-text PDF. These are created by running OCR on image-only PDFs, so the recognized text becomes searchable.

In this article, we will explore how Dynamsoft Barcode Reader decodes barcodes and QR codes in each type of PDF.

Dynamsoft Barcode Reader includes an embedded PDF library. To get the best results with all three types of files, it provides a configurable parameter, ‘PDFReadingMode’, that controls how PDF files are processed.

Scan Barcodes from a Digitally Created PDF

Digitally created PDFs are sometimes referred to as ‘true’ PDFs. They consist of text and images. The images can be either vector images or raster images.

1. Barcodes Stored as Vector Images

When it comes to scanning barcodes from a vector PDF, Dynamsoft’s barcode SDK has a unique advantage. Unlike other barcode SDKs, which convert pages into full-page images and then try to localize the barcodes, Dynamsoft’s SDK extracts the PDF vector data directly for barcode region localization and decoding. Skipping the rendering step not only improves speed but also helps achieve higher accuracy.

To use this technique, set the parameter ‘PDFReadingMode’ to ‘PDFRM_VECTOR’.

C# Code snippet:

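using Dynamsoft.DBR;
// 'reader' is an initialized barcode reader instance.
PublicRuntimeSettings settings = reader.GetRuntimeSettings();
// Decode barcodes directly from the PDF's vector data instead of rendering pages.
settings.PDFReadingMode = EnumPDFReadingMode.PDFRM_VECTOR;
reader.UpdateRuntimeSettings(settings);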

Now let’s examine how well it performs. For comparison, we picked two top commercial SDKs on the market.

A summary of the performance data:

SDK	Barcodes found in File1	Time (ms)
Commercial SDK 1	3	1501
Commercial SDK 2	3	605
Dynamsoft (Vector Mode)	3	4

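On this file, Dynamsoft’s SDK was more than 100 times faster than both commercial SDKs: 605 ms / 4 ms ≈ 151 times faster than Commercial SDK 2, and 1501 ms / 4 ms ≈ 375 times faster than Commercial SDK 1. We tested several other PDF files, and the results were consistent.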

Note: The ‘PDFRM_VECTOR’ mode only works for linear barcodes at this point.

2. Barcodes Stored as Raster Images

More commonly, the images in digitally created PDFs are in a raster format, such as PNG, JPG, or BMP.

There are two options to decode the barcodes in such PDFs:

Extract images without rendering the whole page
Render the whole page as an image

2.1 Extract images without rendering the whole page

A time-efficient approach is to extract the embedded images that contain the barcodes and pass them directly to a barcode engine for processing. The page text is ignored in this approach.
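As a rough illustration of extracting images without rendering, the C# sketch below pulls JPEG data out of raw PDF bytes by searching for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. This is a swapped-in, deliberately naive technique, not how Dynamsoft's embedded PDF library works: it only finds images whose stream uses the DCTDecode filter alone in an unencrypted file, and a production tool should parse the PDF object structure with a real PDF library instead. The `NaivePdfJpegExtractor` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, naive extractor: finds JPEG byte ranges in raw PDF bytes without rendering pages.
// Assumptions: the file is unencrypted and each image stream uses only the DCTDecode filter,
// so its bytes are a plain JPEG file. Flate-compressed, JBIG2, or CCITT images are not found.
public static class NaivePdfJpegExtractor
{
    public static List<byte[]> ExtractJpegs(byte[] pdf)
    {
        var images = new List<byte[]>();
        int i = 0;
        while (i + 2 < pdf.Length)
        {
            // A JPEG starts with the start-of-image marker FF D8, immediately followed by another marker (FF ..).
            if (pdf[i] == 0xFF && pdf[i + 1] == 0xD8 && pdf[i + 2] == 0xFF)
            {
                int end = FindEndOfImage(pdf, i + 2);
                if (end < 0) break; // no end marker: truncated data, stop scanning
                var jpeg = new byte[end - i];
                Array.Copy(pdf, i, jpeg, 0, jpeg.Length);
                images.Add(jpeg);
                i = end; // resume scanning after this image
            }
            else
            {
                i++;
            }
        }
        return images;
    }

    // Returns the index just past the first end-of-image marker (FF D9) at or after start, or -1.
    // Caveat: a JPEG carrying an embedded thumbnail has a nested FF D9 and would be cut short.
    private static int FindEndOfImage(byte[] data, int start)
    {
        for (int j = start; j + 1 < data.Length; j++)
        {
            if (data[j] == 0xFF && data[j + 1] == 0xD9) return j + 2;
        }
        return -1;
    }
}
```

Each returned byte array is a complete JPEG that can be handed to a barcode engine as an in-memory image; the page text is never touched, which is where the time savings of this approach come from.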

2.2 Render each page of the PDF as an image

Another option is to render each page as a full-page image. To do this, set the parameter ‘PDFReadingMode’ to ‘PDFRM_RASTER’. Refer to the code snippet below.

C# Code snippet (PDF reading mode and PDFRasterDPI):

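using Dynamsoft.DBR;
// 'reader' is an initialized barcode reader instance.
PublicRuntimeSettings settings = reader.GetRuntimeSettings();
// Resolution (dots per inch) used when rendering each PDF page to an image.
settings.PDFRasterDPI = 200;
// Render pages to images before decoding; this works for all barcode formats.
settings.PDFReadingMode = EnumPDFReadingMode.PDFRM_RASTER;
reader.UpdateRuntimeSettings(settings);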

The tricky part is choosing an appropriate value for ‘PDFRasterDPI’. A high value renders a large image and slows down processing, since the pixel count grows with the square of the DPI. Conversely, a low value renders a small image in which the barcode region may be too small or distorted to decode.
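The trade-off can be quantified with a little arithmetic. The C# sketch below uses a hypothetical `RasterMath` helper that does not call the SDK; it computes the rendered page size in pixels for a given DPI and how many pixels each module of a QR code receives. It relies on two standard facts: a page rendered at a given DPI is (width in inches × DPI) by (height in inches × DPI) pixels, and a QR code of version v is 17 + 4v modules wide. The target pixels-per-module value passed to `MinimumDpi` is your own assumption, not a Dynamsoft recommendation.

```csharp
using System;

// Hypothetical helper for reasoning about PDFRasterDPI; it does not call the SDK.
public static class RasterMath
{
    // Pixel dimensions of a page rendered at the given DPI.
    public static (int Width, int Height) PageSizeInPixels(double widthInches, double heightInches, int dpi)
    {
        return ((int)Math.Round(widthInches * dpi), (int)Math.Round(heightInches * dpi));
    }

    // Number of modules along one side of a QR code (versions 1 to 40).
    public static int QrModulesPerSide(int version)
    {
        if (version < 1 || version > 40) throw new ArgumentOutOfRangeException(nameof(version));
        return 17 + 4 * version;
    }

    // Pixels available per module when a QR code of the given printed width is rendered at dpi.
    public static double PixelsPerModule(double qrWidthInches, int version, int dpi)
    {
        return qrWidthInches * dpi / QrModulesPerSide(version);
    }

    // Smallest whole DPI that gives at least targetPixelsPerModule for that QR code.
    public static int MinimumDpi(double qrWidthInches, int version, double targetPixelsPerModule)
    {
        return (int)Math.Ceiling(targetPixelsPerModule * QrModulesPerSide(version) / qrWidthInches);
    }
}
```

For a US Letter page (8.5 × 11 in), 200 DPI yields a 1700 × 2200 image (about 3.7 million pixels), while 300 DPI yields 2550 × 3300 (about 8.4 million), 2.25 times as many pixels to process. A 1-inch, version 3 QR code (29 × 29 modules) gets about 6.9 pixels per module at 200 DPI.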

Note: The ‘PDFRM_RASTER’ mode works for all barcode formats, including linear barcodes, PDF417, QR codes, and other 2D codes.
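The two notes on ‘PDFRM_VECTOR’ and ‘PDFRM_RASTER’ can be summarized as a small decision rule. The C# sketch below is hypothetical: the `PdfKind` enum and `ReadingModeAdvisor.ChooseReadingMode` are illustrative names, and the function returns the mode name as a string rather than touching the SDK's `EnumPDFReadingMode`. Vector mode is chosen only for barcodes drawn as vector graphics when all of them are linear; everything else falls back to raster mode. Using raster mode for image-only and searchable PDFs is an inference from the notes, not a documented rule.

```csharp
// Hypothetical classification of a PDF, following the categories described in this article.
public enum PdfKind
{
    DigitalWithVectorBarcodes,  // "true" PDF, barcodes drawn as vector graphics
    DigitalWithRasterBarcodes,  // "true" PDF, barcodes embedded as PNG/JPG/BMP images
    ImageOnly,                  // scanned or photographed pages
    ImageOverText               // searchable PDF: page image plus hidden OCR text
}

public static class ReadingModeAdvisor
{
    // Returns the name of the PDFReadingMode value suggested by the notes in this article.
    public static string ChooseReadingMode(PdfKind kind, bool onlyLinearBarcodes)
    {
        // Vector mode skips rendering but currently decodes linear barcodes only.
        if (kind == PdfKind.DigitalWithVectorBarcodes && onlyLinearBarcodes)
        {
            return "PDFRM_VECTOR";
        }
        // Raster mode renders pages and works for all barcode formats, including QR codes.
        return "PDFRM_RASTER";
    }
}
```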

Scan QR Code from an Image-only PDF

Image-only PDFs are mostly generated by multifunction printers (MFPs), scanners, and cameras, which is why they are sometimes referred to as scanned PDFs. Scanning barcodes and QR codes from scanned PDFs is not much different from scanning them from images, except that the PDF library first needs to separate the pages into individual images.

Scan QR Code from an Image-over-Text PDF

When an image-only PDF file is run through an OCR engine, the recognized text is embedded as an invisible layer behind the scanned image of the document. Each recognized word is mapped to the zone of the scanned image where it was found. Image-over-text PDFs are also referred to as searchable PDFs.

Because the page image and the recognized text are stored in separate layers, a PDF library only needs to extract the page image layer and pass it to a barcode engine for processing.
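The layered structure of a searchable PDF can be modeled with a small data structure. In the hypothetical C# sketch below, `OcrWord` records each recognized word together with the zone (a bounding box in page-image pixels) where it was found, and `SearchablePage` keeps the image layer and the text layer separate; the barcode path reads only the image layer. These types are illustrative and are not part of any PDF or barcode SDK.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of one page of a searchable (image-over-text) PDF.
// Each OCR word is mapped to the zone (bounding box, in page-image pixels) where it was found.
public sealed record OcrWord(string Text, int X, int Y, int Width, int Height)
{
    public bool Contains(int px, int py) =>
        px >= X && px < X + Width && py >= Y && py < Y + Height;
}

public sealed class SearchablePage
{
    public byte[] PageImage { get; }              // the scanned image layer
    public IReadOnlyList<OcrWord> Words { get; }  // the invisible OCR text layer

    public SearchablePage(byte[] pageImage, IReadOnlyList<OcrWord> words)
    {
        PageImage = pageImage;
        Words = words;
    }

    // Text search and selection use the word-to-zone mapping.
    public OcrWord? WordAt(int px, int py) => Words.FirstOrDefault(w => w.Contains(px, py));

    // Barcode decoding needs only the image layer; the text layer is ignored.
    public byte[] ImageLayerForBarcodeEngine() => PageImage;
}
```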

About Dynamsoft Barcode Reader

Dynamsoft Barcode Reader enables developers to quickly implement 1D and 2D barcode scanning into their applications running on different platforms. On top of scanning linear barcodes, it can function as a powerful QR Code reader or a 2D imager.

Supported programming languages: C#, VB.NET, Java, C++, Python, C, Android Java, Swift, Kotlin, and JavaScript.