How to Decode, Decrypt and Parse South African Driving License in Python

If you are looking for the specification of the South African driving license, you may be disappointed. There is no official reference; the only available sources are a Stack Overflow Q&A, an incomplete document (ZA Drivers License format) and an open-source C# project. The Stack Overflow Q&A provides the RSA public keys for decrypting the data encoded in the PDF417 barcode, and the incomplete document helps to parse the decrypted data. In this article, I will show you how to decode, decrypt and parse a South African driving license in Python.

Decode South African Driving License from PDF417 Barcode

  1. Install the Dynamsoft Barcode Reader SDK:
     pip install dbr
    

    The SDK can be installed on any desktop or server operating system that supports Python 3.6 or later.

  2. Get a license key and then initialize the barcode reader as follows:
     from dbr import *
     BarcodeReader.init_license("DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ==")
     reader = BarcodeReader()
    
  3. Decode PDF417 to get the raw bytes of the driving license data:
     results = reader.decode_file(image_file)
     if results is not None and len(results) > 0:
         return results[0].barcode_bytes
    

Dynamsoft Barcode Reader SDK handles the barcode decoding and returns the raw bytes. The next step is to decrypt the data with an RSA public key.

Decrypt Driving License Data with RSA Public Key

The valid data decoded from PDF417 contains 720 bytes. According to the Stack Overflow Q&A, there are two versions of the South African driving license, identified by the first four bytes.

v1 = [0x01, 0xe1, 0x02, 0x45]
v2 = [0x01, 0x9b, 0x09, 0x45]

The next two bytes are zeros, so the payload contains 714 bytes. The 714 bytes form 6 blocks: 5 blocks of 128 bytes and 1 block of 74 bytes. The first 5 blocks are decrypted with one RSA public key, and the last block is decrypted with a different RSA public key. The RSA public keys for both versions are provided in the Stack Overflow Q&A.

pk_v1_128 = '''
-----BEGIN RSA PUBLIC KEY-----
MIGXAoGBAP7S4cJ+M2MxbncxenpSxUmBOVGGvkl0dgxyUY1j4FRKSNCIszLFsMNwx2XWXZg8H53gpCsxDMwHrncL0rYdak3M6sdXaJvcv2CEePrzEvYIfMSWw3Ys9cRlHK7No0mfrn7bfrQOPhjrMEFw6R7VsVaqzm9DLW7KbMNYUd6MZ49nAhEAu3l//ex/nkLJ1vebE3BZ2w==
-----END RSA PUBLIC KEY-----
'''

pk_v1_74 = '''
-----BEGIN RSA PUBLIC KEY-----
MGACSwD/POxrX0Djw2YUUbn8+u866wbcIynA5vTczJJ5cmcWzhW74F7tLFcRvPj1tsj3J221xDv6owQNwBqxS5xNFvccDOXqlT8MdUxrFwIRANsFuoItmswz+rfY9Cf5zmU=
-----END RSA PUBLIC KEY-----
'''

pk_v2_128 = '''
-----BEGIN RSA PUBLIC KEY-----
MIGWAoGBAMqfGO9sPz+kxaRh/qVKsZQGul7NdG1gonSS3KPXTjtcHTFfexA4MkGAmwKeu9XeTRFgMMxX99WmyaFvNzuxSlCFI/foCkx0TZCFZjpKFHLXryxWrkG1Bl9++gKTvTJ4rWk1RvnxYhm3n/Rxo2NoJM/822Oo7YBZ5rmk8NuJU4HLAhAYcJLaZFTOsYU+aRX4RmoF
-----END RSA PUBLIC KEY-----
'''

pk_v2_74 = '''
-----BEGIN RSA PUBLIC KEY-----
MF8CSwC0BKDfEdHKz/GhoEjU1XP5U6YsWD10klknVhpteh4rFAQlJq9wtVBUc5DqbsdI0w/bga20kODDahmGtASy9fae9dobZj5ZUJEw5wIQMJz+2XGf4qXiDJu0R2U4Kw==
-----END RSA PUBLIC KEY-----
'''
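
The byte layout described above can be sketched as follows. This is only an illustration under stated assumptions: `split_license_bytes` is a hypothetical helper that does not appear in the original code, and the input is synthetic filler rather than a real license.

```python
# Version headers as listed above
V1 = bytes([0x01, 0xe1, 0x02, 0x45])
V2 = bytes([0x01, 0x9b, 0x09, 0x45])


def split_license_bytes(data):
    """Hypothetical helper: return (version, blocks) for a 720-byte barcode payload."""
    if len(data) != 720:
        raise ValueError('expected 720 bytes, got %d' % len(data))

    header = bytes(data[0:4])
    if header == V1:
        version = 1
    elif header == V2:
        version = 2
    else:
        raise ValueError('unknown version header')

    if bytes(data[4:6]) != b'\x00\x00':
        raise ValueError('expected two zero bytes after the version header')

    # Five 128-byte blocks followed by one 74-byte block (5 * 128 + 74 = 714)
    blocks = [bytes(data[6 + i * 128: 6 + (i + 1) * 128]) for i in range(5)]
    blocks.append(bytes(data[646:720]))
    return version, blocks


# Synthetic input: a v2 header, two zero bytes and 714 filler bytes
sample = V2 + b'\x00\x00' + bytes(714)
version, blocks = split_license_bytes(sample)
print(version, [len(b) for b in blocks])  # 2 [128, 128, 128, 128, 128, 74]
```

The detected version tells you which key pair to use: pk_v1_128 and pk_v1_74, or pk_v2_128 and pk_v2_74.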

The following steps demonstrate how to use the RSA public key to decrypt the data:

  1. Load the RSA public key from the PEM format.

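     import rsa
     def decrypt_data(data):
         # Pick the PEM key pair that matches the version header (first 4 bytes)
         if list(data[0:4]) == v1:
             pk128, pk74 = pk_v1_128, pk_v1_74
         else:
             pk128, pk74 = pk_v2_128, pk_v2_74
         pubKey = rsa.PublicKey.load_pkcs1(pk128)  # parses the PEM text into n and e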
    
  2. Convert each block byte array to a big integer, and use exponent e and modulus n to calculate the decrypted value.

         result = bytearray()
         pubKey = rsa.PublicKey.load_pkcs1(pk128)
         start = 6  # skip the 4-byte version header and the 2 zero bytes
         for i in range(5):
             block = data[start: start + 128]
             value = int.from_bytes(block, byteorder='big', signed=False)
             # Positional modulus keeps pow() compatible with Python 3.6 and 3.7
             output = pow(value, pubKey.e, pubKey.n)

             decrypted_bytes = output.to_bytes(128, byteorder='big', signed=False)
             result += decrypted_bytes

             start = start + 128

         # The last block is 74 bytes long and uses the second key
         pubKey = rsa.PublicKey.load_pkcs1(pk74)
         block = data[start: start + 74]
         value = int.from_bytes(block, byteorder='big', signed=False)
         output = pow(value, pubKey.e, pubKey.n)

         decrypted_bytes = output.to_bytes(74, byteorder='big', signed=False)
         result += decrypted_bytes
         return result
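
The arithmetic in step 2 is plain modular exponentiation. The toy example below shows the same bytes-to-integer-to-bytes round trip with a deliberately tiny textbook key pair (n = 3233, e = 17, d = 2753). That key pair is unrelated to the real license keys. The example also assumes the block was produced with the matching private exponent.

```python
# Toy key pair for illustration only: n = 61 * 53 and e * d = 1 (mod 3120)
n, e, d = 3233, 17, 2753

message = b'\x00\x41'  # a 2-byte block holding the value 65
m = int.from_bytes(message, byteorder='big', signed=False)

# Assumption: the issuer transformed the block with the private exponent d
c = pow(m, d, n)
cipher_block = c.to_bytes(2, byteorder='big', signed=False)

# A reader reverses the transformation with the public exponent e, as in step 2
value = int.from_bytes(cipher_block, byteorder='big', signed=False)
recovered = pow(value, e, n).to_bytes(2, byteorder='big', signed=False)

print(recovered == message)  # True
```

Passing the block length to to_bytes matters because it restores leading zero bytes that the integer conversion drops.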
    

After getting the decrypted bytes, we can start parsing the information on the driving license.

Parse South African Driving License

The decrypted data consists of 4 sections: header, strings, binary data, and image data. We can skip to the strings section by finding the first 0x82 byte.

index = 0
for i in range(0, len(data)):
    if data[i] == 0x82:
        index = i
        break

The byte after 0x82 needs to be ignored, so the payload starts at index + 2. The strings are delimited by 0xe0 and 0xe1. 0xe1 not only acts as a delimiter but also stands for an empty string. For example, when four strings are expected, 41 e0 42 e1 e1 decodes to A, B and two empty strings.

We create two functions: one reads a list of strings, and the other reads a single string.

def readStrings(data, index, length):
    strings = []
    
    i = 0
    while i < length:
        value = ''
        while True:
            currentByte = data[index]
            index += 1
            
            if currentByte == 0xe0:
                break
            elif currentByte == 0xe1:
                if value != '':
                    i += 1
                break
            
            value += chr(currentByte)
            
        i += 1
        
        if value != '':
            strings.append(value)
            
    return strings, index

def readString(data, index):
    value = ''
    delimiter = 0xe0

    while True:
        currentByte = data[index]
        index += 1
        
        if currentByte == 0xe0 or currentByte == 0xe1:
            delimiter = currentByte
            break

        value += chr(currentByte)
        
    return value, index, delimiter
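
To see the delimiter rules in action, the sketch below applies the list-reading logic to the sample bytes 41 e0 42 e1 e1. `read_string_list` is a condensed restatement of readStrings, repeated only so the example runs on its own. The sample bytes are synthetic.

```python
def read_string_list(data, index, length):
    """Condensed restatement of readStrings, repeated so the example is standalone."""
    strings = []
    i = 0
    while i < length:
        value = ''
        while True:
            current = data[index]
            index += 1
            if current == 0xe0:
                break
            if current == 0xe1:
                # 0xe1 after text ends that field and also counts as one empty field
                if value != '':
                    i += 1
                break
            value += chr(current)
        i += 1
        if value != '':
            strings.append(value)
    return strings, index


sample = bytes([0x41, 0xe0, 0x42, 0xe1, 0xe1])
strings, next_index = read_string_list(sample, 0, 4)
print(strings, next_index)  # ['A', 'B'] 5
```

All five bytes are consumed for four expected fields: the first 0xe1 closes B and accounts for the third field, and the second 0xe1 accounts for the fourth.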

Then read all strings one by one.

def parse_data(data):
    # Locate the 0x82 marker that precedes the strings section
    index = data.index(0x82)
    vehicleCodes, index = readStrings(data, index + 2, 4)

    surname, index, delimiter = readString(data, index)

    initials, index, delimiter = readString(data, index)

    PrDPCode = ''
    if delimiter == 0xe0:
        PrDPCode, index, delimiter = readString(data, index)

    idCountryOfIssue, index, delimiter = readString(data, index)

    licenseCountryOfIssue, index, delimiter = readString(data, index)

    vehicleRestrictions, index = readStrings(data, index, 4)

    licenseNumber, index, delimiter = readString(data, index)

    idNumber = ''
    for i in range(13):
        idNumber += chr(data[index])
        index += 1

From the binary data section, we can get the date of birth, date of license issue, date of license expiry and gender.

idNumberType = f'{data[index]:02d}'
index += 1

nibbleQueue = []
while True:
    currentByte = data[index]
    index += 1
    if currentByte == 0x57:
        break

    nibbles = [currentByte >> 4, currentByte & 0x0f]
    
    nibbleQueue += nibbles
    
licenseCodeIssueDates = readNibbleDateList(nibbleQueue, 4)

driverRestrictionCodes = f'{nibbleQueue.pop(0)}{nibbleQueue.pop(0)}'

PrDPermitExpiryDate = readNibbleDateString(nibbleQueue)

licenseIssueNumber = f'{nibbleQueue.pop(0)}{nibbleQueue.pop(0)}'

birthdate = readNibbleDateString(nibbleQueue)

licenseIssueDate = readNibbleDateString(nibbleQueue)

licenseExpiryDate = readNibbleDateString(nibbleQueue)

gender = f'{nibbleQueue.pop(0)}{nibbleQueue.pop(0)}'
if gender == '01':
    gender = 'male'
else:
    gender = 'female'

The helper functions for reading nibble-encoded dates are as follows:

def readNibbleDateString(nibbleQueue):
    # First year digit; a value of 10 (0xA) marks an empty date
    m = nibbleQueue.pop(0)
    if m == 10:
        return ''
    
    # Remaining three year digits
    c = nibbleQueue.pop(0)
    d = nibbleQueue.pop(0)
    y = nibbleQueue.pop(0)

    m1 = nibbleQueue.pop(0)
    m2 = nibbleQueue.pop(0)

    d1 = nibbleQueue.pop(0)
    d2 = nibbleQueue.pop(0)
    
    return f'{m}{c}{d}{y}/{m1}{m2}/{d1}{d2}'
    
def readNibbleDateList(nibbleQueue, length):
    dateList = []
    
    for i in range(length):
        dateString = readNibbleDateString(nibbleQueue)
        if dateString != '':
            dateList.append(dateString)
            
    return dateList
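
As a worked example of the nibble encoding, the sketch below splits a few synthetic bytes into 4-bit values and reads three dates from them. `to_nibbles` and `read_date` are illustrative names. `read_date` follows the same rule as readNibbleDateString and assumes every digit nibble is in the range 0 to 9.

```python
def to_nibbles(raw):
    """Split each byte into its high and low 4-bit halves."""
    nibbles = []
    for byte in raw:
        nibbles += [byte >> 4, byte & 0x0f]
    return nibbles


def read_date(nibbles):
    """Read one date: a leading 0xA nibble means 'no date', else 8 decimal digits."""
    first = nibbles.pop(0)
    if first == 10:
        return ''
    digits = [first] + [nibbles.pop(0) for _ in range(7)]
    text = ''.join(str(n) for n in digits)
    return f'{text[0:4]}/{text[4:6]}/{text[6:8]}'


# Synthetic bytes: 1999-01-31, an empty date (single 0xA nibble), then 2023-05-17
raw = bytes([0x19, 0x99, 0x01, 0x31, 0xA2, 0x02, 0x30, 0x51, 0x70])
queue = to_nibbles(raw)
dates = [read_date(queue) for _ in range(3)]
print(dates)  # ['1999/01/31', '', '2023/05/17']
```

Because an empty date occupies a single nibble, later fields are not byte-aligned. This is why the parser works on a nibble queue instead of on whole bytes.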

The format of the final image data section is still unknown.

Read South African Driving License from Image File, Byte Array or Base64 String

We can read a South African driving license from an image file, a byte array or a base64 string. The byte array and base64 string may contain either encrypted or already decrypted data.

  • Image file:
      BarcodeReader.init_license(key)
      reader = BarcodeReader()
      results = reader.decode_file(source)
      if results is not None and len(results) > 0:
          data = results[0].barcode_bytes
          if data is None or len(data) != 720:
              return None
            
          if encrypted:
              data = decrypt_data(data)
          return parse_data(data)
    
  • Byte array:
      data = Path(source).read_bytes()
      if len(data) != 720 and encrypted == True:
          return None
        
      if encrypted:
          data = decrypt_data(data)
      return parse_data(data)
    
  • Base64 string:
      with open(source, 'r') as f:
          source = f.read()
          data = base64.b64decode(source)
          if len(data) != 720 and encrypted == True:
              return None
            
          if encrypted:
              data = decrypt_data(data)
          return parse_data(data)
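
For reference, the round trip between raw bytes and a Base64 text file can be checked with the standard library alone. The file name and the 720-byte payload below are placeholders made up for the example, not real license data.

```python
import base64
import tempfile
from pathlib import Path

# Synthetic payload: v2 header, two zero bytes and 714 filler bytes
payload = bytes([0x01, 0x9b, 0x09, 0x45, 0x00, 0x00]) + bytes(714)

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'license.txt'
    # Store the bytes as Base64 text, then read them back
    path.write_text(base64.b64encode(payload).decode('ascii'))
    decoded = base64.b64decode(path.read_text())

print(len(decoded), decoded == payload)  # 720 True
```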
    

Use ArgumentParser to parse command line arguments for different input types.

parser = argparse.ArgumentParser(description='Decode, decrypt and parse South Africa driving license.')
parser.add_argument('source', help='A source file containing information of driving license.')
parser.add_argument('-t', '--types', default=1, type=int, help='Specify the source type. 1: PDF417 image 2: Base64 string 3: Raw bytes')
parser.add_argument('-e', '--encrypted', default=1, type=int, help='Is the source encrypted? 0: No 1: Yes')
parser.add_argument('-l', '--license', default='', type=str, help='The license key is required for decoding PDF417')
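
A minimal sketch of how the parsed arguments might be consumed is shown below. `build_parser` and `describe` are hypothetical helpers. `describe` only reports which branch would run and stands in for the real decode, decrypt and parse calls.

```python
import argparse

SOURCE_TYPES = {1: 'PDF417 image', 2: 'Base64 string', 3: 'Raw bytes'}


def build_parser():
    """Recreate the parser shown above so the example is standalone."""
    parser = argparse.ArgumentParser(description='Decode, decrypt and parse South Africa driving license.')
    parser.add_argument('source', help='A source file containing information of driving license.')
    parser.add_argument('-t', '--types', default=1, type=int, help='1: PDF417 image 2: Base64 string 3: Raw bytes')
    parser.add_argument('-e', '--encrypted', default=1, type=int, help='0: No 1: Yes')
    parser.add_argument('-l', '--license', default='', type=str, help='License key for decoding PDF417')
    return parser


def describe(args):
    """Hypothetical stand-in for the dispatch: report which input path would be used."""
    kind = SOURCE_TYPES.get(args.types, 'unknown')
    state = 'encrypted' if args.encrypted == 1 else 'decrypted'
    return f'{args.source}: {kind}, {state}'


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(['license.txt', '-t', '2', '-e', '0'])
print(describe(args))  # license.txt: Base64 string, decrypted
```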

Decode South African Driving License in Python

Source Code

https://github.com/yushulx/South-Africa-driving-license