# PDF417

PDF417 is a stacked, variable-length, bidirectional 2D barcode. It is one of the most widely used 2D barcodes, most commonly found in logistics, transportation (boarding passes), government identification (driver's licenses and identification cards), inventory, and document management (postal packages).

Some of the main features of PDF417 codes are:

- Encodes all 128 ASCII characters and extended characters
- High data capacity: holds up to 1850 alphanumeric characters, over 2700 digits, or roughly 1100 bytes of data
- Public domain format that requires no license to use
- Error correction levels from 0 to 8

PDF417 barcodes can be understood by breaking them down into sections, rows, columns, and data words. In this post, we deconstruct the PDF417 symbol and identify all its elements, as well as show you how to decode PDF417 barcodes.

Sections

A PDF417 barcode is made up of a number of modules, which fall into three distinct kinds of section: start and stop patterns, left and right indicators, and data codewords. Each module starts with a solid black column and ends with a solid white column, so you can see where each module begins and ends. There is also a blank margin on either side of the barcode, known as the quiet zone. It helps the barcode scanner recognize where the barcode starts and stops, and prevents other information that may surround the barcode from being scanned. The format of a PDF417 symbol is as follows:

- Quiet zone
- Start Pattern
- Left Indicator
- Data Codewords
- Right Indicator
- Stop pattern

Start and Stop Patterns

Comprised of a set of black and white vertical bars and spaces, the start and stop patterns indicate the beginning and end regions of the barcode. They help the barcode scanner locate the barcode, but don't contain any data.

Left and Right Indicators

The left and right indicators do not contain any text data. Instead, they carry information about the barcode itself, such as how many rows it has, the error correction level, and so on.

Data Codewords

The data codewords section is where numbers, letters, or other symbols are encoded in a cluster pattern of bars and spaces, each cluster separated by a solid white column. In the figure above, two sections are shown. There can be as few as one or as many as 30 data codeword clusters per row. The size of the PDF417 barcode depends on how much data is encoded.

Rows

PDF417 barcodes are made up of rows. A barcode must have at least 3 rows and at most 90, each acting like a small linear barcode. In the figure above, eight rows are shown. Every row is the same width and has the same number of codewords.
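To make the row and column limits concrete, here is a minimal sketch of a layout check. The function name `is_valid_layout` is hypothetical, and the cap of 928 codewords per symbol is an assumption taken from the PDF417 specification rather than from this post.

```python
MIN_ROWS, MAX_ROWS = 3, 90
MIN_COLS, MAX_COLS = 1, 30
MAX_CODEWORDS = 928  # assumption: specification limit for one symbol


def is_valid_layout(rows: int, cols: int) -> bool:
    """Return True if a rows x cols grid of codewords fits the limits above."""
    return (
        MIN_ROWS <= rows <= MAX_ROWS
        and MIN_COLS <= cols <= MAX_COLS
        and rows * cols <= MAX_CODEWORDS
    )


print(is_valid_layout(8, 2))    # True
print(is_valid_layout(2, 10))   # False: fewer than 3 rows
print(is_valid_layout(90, 30))  # False: 2700 codewords exceed the cap
```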

Columns

Each module is made up of 17 columns. As mentioned previously, each module starts with a solid black column and ends with a solid white column.

Data Words

Each data word is 17 cells long and is made up of 4 black bars and 4 white spaces. This is where the name PDF417 comes from: Portable Data File (PDF), plus a data word pattern of 4 bars and 4 spaces that is 17 cells long. Data words are read left to right, top to bottom.

Encoding

PDF417 uses a base 929 encoding, where each data word represents a value from 0 to 928. The value is determined by the sequence of black and white cells. In the figure above, we see a sequence in one data word consisting of: 1 black, 4 white, 2 black, 3 white, 2 black, 2 white, 1 black, and 2 white. Written as widths, this gives the sequence 14232212, which adds up to 17 cells.
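As a quick sanity check on a sequence read off a barcode, the sketch below expands a width sequence into individual cells and confirms it has 4 bars, 4 spaces, and 17 cells in total. The helper name `expand_pattern` and the "1" for black, "0" for white notation are illustrative choices, not part of any library.

```python
def expand_pattern(widths: str) -> str:
    """Expand a bar/space width sequence such as "14232212" into 17 cells.

    "1" marks a black cell and "0" marks a white cell.
    """
    if len(widths) != 8:
        raise ValueError("expected 4 bars and 4 spaces (8 widths)")
    cells = []
    for i, ch in enumerate(widths):
        colour = "1" if i % 2 == 0 else "0"  # even positions are bars
        cells.append(colour * int(ch))
    result = "".join(cells)
    if len(result) != 17:
        raise ValueError("widths must add up to 17 cells")
    return result


print(expand_pattern("14232212"))  # 10000110001100100
```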

Note that the leading black cell and the trailing white cells are included. Next, we can look up our sequence on a PDF417 Codeword Combo site, which lists all 929 encoding values. We can use Ctrl+F to search for our sequence.

Now that we have found our sequence, we see that it equates to a value of 900, which means we’re doing text encoding. So what does this mean?

Of the 929 available code words, 0 – 899 are used for data, and the other 29 (900 – 928) are reserved for special functions that define the barcode, such as switching the encoding mode. Typically, PDF417 barcodes are used to encode text.
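The split between data and function code words can be sketched as a small classifier. This assumes the ranges used by the PDF417 specification, where values from 900 upward are function code words and 900, 901, and 902 switch to text, byte, and numeric encoding respectively. The function name `classify_codeword` is hypothetical.

```python
MODE_LATCHES = {
    900: "text mode latch",
    901: "byte mode latch",
    902: "numeric mode latch",
}


def classify_codeword(value: int) -> str:
    """Describe a code word value as data or as a function code word."""
    if not 0 <= value <= 928:
        raise ValueError("code word values range from 0 to 928")
    if value <= 899:
        return "data"
    # Remaining function code words are not broken down in this sketch.
    return MODE_LATCHES.get(value, "other function code word")


print(classify_codeword(733))  # data
print(classify_codeword(900))  # text mode latch
```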

Whatever the value of the data word, you apply the following formula to split it into two characters.

Note: F stands for the first character, S stands for the second character, and # is the data word value.

- S = # MOD 30
- F = (# - S) / 30

Note: In computing, MOD is the remainder after dividing one number by another.

In our example, one of our data words is 733. So we take 733 MOD 30, which gives us 13. This is the second character.

Next, we take 733, subtract 13, and divide by 30, which gives us 24. This is the first character.
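The same arithmetic can be done in one step with Python's built-in `divmod`, which returns the quotient and the remainder together. This is a minimal sketch; the helper name `split_codeword` is made up for illustration.

```python
def split_codeword(codeword: int) -> tuple[int, int]:
    """Split a text data word into (quotient, remainder) with respect to 30."""
    if not 0 <= codeword <= 899:
        raise ValueError("text data words range from 0 to 899")
    return divmod(codeword, 30)


quotient, remainder = split_codeword(733)
print(quotient, remainder)  # 24 13
```

Both results always fall between 0 and 29, which is why each one can be looked up as a single row of a 30-row table.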

Now we can take these numbers and look them up in a PDF417 Text Decoder Table to decipher the encoded data.

In the figure above, notice there are five columns:

- Number
- Alpha
- Lower
- Mixed
- Punctuation

By default, PDF417 starts in the Alpha column. So when we apply the formula to our data codewords, the first character we get is 3, which according to our table is capital D (the table counts from 0, so A = 0, B = 1, C = 2, D = 3).

In the Alpha column, the values 27 – 29 translate to special functions rather than characters (26 is a space). In our example, our next character is 27, which equates to ll. Using the table, we see that ll = latch to lower. This means that everything after the first letter is read from the next column (i.e. the Lower-case column) until another latch or shift appears.

When we apply the formula to all our data codewords, we are able to decipher the text within our PDF417. In our example, our PDF417 barcode says “Dynamsoft”.
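The whole walk-through can be sketched as a small decoder. This is a simplified illustration, not a full implementation: it only handles the Alpha and Lower columns, and it assumes the list contains text data codewords only, with start/stop patterns, indicators, mode latches, and error correction already stripped. Following the PDF417 specification, the quotient of each codeword is taken as the earlier of its two characters. Apart from 733, the codeword values in the usage line were computed for this example rather than read from the figure.

```python
import string

ALPHA = string.ascii_uppercase + " "  # values 0..26 in the Alpha column
LOWER = string.ascii_lowercase + " "  # values 0..26 in the Lower column


def decode_text(codewords: list[int]) -> str:
    """Decode text codewords using only the Alpha and Lower columns."""
    values = []
    for cw in codewords:
        values.extend(divmod(cw, 30))  # quotient first, then remainder

    latched = ALPHA   # decoding starts in the Alpha column
    shifted = None    # column to use for the next character only
    out = []
    for v in values:
        if shifted is not None:
            table, shifted = shifted, None
            if v > 26:
                raise NotImplementedError("function value after a shift")
            out.append(table[v])
        elif v <= 26:
            out.append(latched[v])
        elif v == 27 and latched is ALPHA:
            latched = LOWER   # ll: latch to lower
        elif v == 27 and latched is LOWER:
            shifted = ALPHA   # as: shift to Alpha for one character
        else:
            raise NotImplementedError("Mixed and Punctuation are not handled")
    return "".join(out)


print(decode_text([117, 733, 12, 554, 169]))  # Dynamsoft
```

Here 117 splits into 3 (D) and 27 (ll), and the remaining codewords are read from the Lower column two letters at a time.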

Error Correction

PDF417 uses Reed–Solomon error correction, which adds redundant codewords so that the barcode remains readable even if part of it has been damaged. The error correction levels range from 0 to 8. The higher the error correction level, the more redundancy the barcode has. However, the more codewords are used for error correction, the less data can be encoded into the barcode. As per AIM standards, a minimum error correction level of 2 is recommended.

The chart below shows the number of error correction codewords that are added to the PDF417 barcode as well as AIM error correction recommendations.

| EC Level | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| EC Codewords Generated | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 |
| Data Codewords (AIM recommendation) | | | 1-40 | 41-160 | 161-320 | 321-863 | | | |
| Data Bytes Encoded (AIM recommendation) | | | 1-56 | 57-192 | 193-384 | 385-1035 | | | |
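The chart can also be expressed in code. The sketch below assumes that the number of error correction codewords doubles with each level, i.e. 2^(level + 1), and that the four data codeword ranges correspond to AIM's recommended minimum levels 2 through 5. Both function names are hypothetical.

```python
def ec_codeword_count(level: int) -> int:
    """Number of error correction codewords added at a given level."""
    if not 0 <= level <= 8:
        raise ValueError("error correction level must be between 0 and 8")
    return 2 ** (level + 1)


def recommended_level(data_codewords: int) -> int:
    """Minimum recommended level for a given number of data codewords."""
    # (upper bound of range, recommended level), per the chart above
    for upper, level in ((40, 2), (160, 3), (320, 4), (863, 5)):
        if 1 <= data_codewords <= upper:
            return level
    raise ValueError("data codeword count must be between 1 and 863")


print(ec_codeword_count(2))    # 8
print(recommended_level(100))  # 3
```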