Combining Deep Learning and Computer Vision for Barcode Recognition

Last week, I trained a YOLOv3 model and a YOLOv3-tiny model for barcode localization via deep learning. After comparing their performance, I dropped YOLOv3 because YOLOv3-tiny is much faster. I am satisfied with the QR code detection speed of the YOLOv3-tiny model running on my GeForce RTX 2060 graphics card. In this article, I will empower Darknet to decode QR codes by integrating the Dynamsoft C/C++ Barcode SDK. My goal is to explore whether deep learning can be utilized to boost barcode recognition performance.

Barcode SDK Download

Dynamsoft C/C++ Barcode SDK for Windows

Darknet Download

git clone https://github.com/AlexeyAB/darknet --depth 1

Barcode Recognition with Deep Learning and Computer Vision

Let’s find the function run_detector(), which parses the input arguments, in detector.c. Duplicate the line that calls the function test_detector() and rename the call for barcode:

if (0 == strcmp(argv[2], "barcode")) barcode_detector(datacfg, cfg, weights, filename, thresh, hier_thresh, dont_show, ext_output, save_labels, outfile, letter_box, benchmark_layers);
    else if (0 == strcmp(argv[2], "test")) test_detector(datacfg, cfg, weights, filename, thresh, hier_thresh, dont_show, ext_output, save_labels, outfile, letter_box, benchmark_layers);

Based on the modification, the barcode recognition command is:

darknet detector barcode …

The function barcode_detector() is, so far, identical to test_detector(). The change I’m going to make is to take the QR code bounding box once network prediction is done and then call the function DBR_DecodeBuffer() to decode the QR code within that region.

Here is the code for getting the object bounding box and class name:

        int selected_detections_num;
        detection_with_class* selected_detections = get_actual_detections(dets, nboxes, thresh, &selected_detections_num, names);
        int i;
        for (i = 0; i < selected_detections_num; ++i) {
            const int best_class = selected_detections[i].best_class;
            if (selected_detections[i].det.prob[best_class] < 0.5) continue;
            
            printf("%s: %.0f%%\n\n", names[best_class], selected_detections[i].det.prob[best_class] * 100);
            box b = selected_detections[i].det.bbox;
            int left = (b.x - b.w / 2.)*im.w;
            int right = (b.x + b.w / 2.)*im.w;
            int top = (b.y - b.h / 2.)*im.h;
            int bot = (b.y + b.h / 2.)*im.h;

            decode_barcode_buffer(barcodeReader, image_buffer, im.w, im.h, im.c, TRUE, left, right, top, bot);
        }

We can filter results by confidence value. I set 0.5 as the threshold.

The image buffer type used by DBR_DecodeBuffer() is unsigned char*, whereas the data used for network prediction is float*. To figure out where to obtain the data in the right type, we can go to this line:

image im = load_image(input, 0, 0, net.c);

By tracing the call stack load_image() → load_image_cv() → mat_to_image(), we can see how the float* data is converted from unsigned char*:

extern "C" image mat_to_image(cv::Mat mat)
{
    int w = mat.cols;
    int h = mat.rows;
    int c = mat.channels();
    image im = make_image(w, h, c);
    unsigned char *data = (unsigned char *)mat.data;
    int step = mat.step;
    for (int y = 0; y < h; ++y) {
        for (int k = 0; k < c; ++k) {
            for (int x = 0; x < w; ++x) {
                im.data[k*w*h + y*w + x] = data[y*step + x*c + k] / 255.0f;
            }
        }
    }
    return im;
}

To keep the unsigned char array decoded from the image file, I create two new functions, load_image_buffer() and buffer_to_image(), to substitute for mat_to_image():

extern "C" void load_image_buffer(char *filename, int channels, unsigned char** buffer, int *width, int *height, int *channel)
{
    cv::Mat mat = load_image_mat(filename, channels);

    int w = mat.cols;
    int h = mat.rows;
    int c = mat.channels();
    unsigned char *data = (unsigned char *)mat.data;
    int size = sizeof(unsigned char) * w * h * c;
    unsigned char* image_buffer = (unsigned char*)malloc(size);
    memcpy(image_buffer, data, size); 
    *buffer = image_buffer;
    *width = w;
    *height = h;
    *channel = c;
}
extern "C" image buffer_to_image(unsigned char* buffer, int w, int h, int c)
{
    cv::Mat mat = cv::Mat(h, w, CV_8UC(c));
    int step = mat.step;
    image im = make_image(w, h, c);
    unsigned char *data = buffer;
    for (int y = 0; y < h; ++y) {
        for (int k = 0; k < c; ++k) {
            for (int x = 0; x < w; ++x) {
                im.data[k*w*h + y*w + x] = data[y*step + x*c + k] / 255.0f;
            }
        }
    }
    return im;
}

Finally, I implement a function decode_barcode_buffer() to decode barcodes either from the full image or from the specific region detected by the YOLO model.

void decode_barcode_buffer(void* barcodeReader, const unsigned char *data, int width, int height, int channel, boolean has_region, int left, int right, int top, int bottom) 
{
    if (has_region)
    {
        PublicRuntimeSettings settings;
        char errorMessage[256];
        int errorCode = DBR_GetRuntimeSettings(barcodeReader, &settings);
        settings.region.regionLeft = left;
        settings.region.regionRight = right;
        settings.region.regionTop = top;
        settings.region.regionBottom = bottom;
        settings.region.regionMeasuredByPercentage = 0;
        settings.barcodeFormatIds = BF_QR_CODE;
        settings.expectedBarcodesCount = 1;
        settings.localizationModes[2] = LM_SKIP;
        settings.localizationModes[3] = LM_SKIP;
        DBR_UpdateRuntimeSettings(barcodeReader, &settings, errorMessage, 256);
    }

    double time = get_time_point();
    ImagePixelFormat format = IPF_RGB_888;
    if (channel == 1)
    {
        format = IPF_GRAYSCALED;
    }
    else if (channel == 4)
    {
        format = IPF_ARGB_8888;
    }

    int errorCode = DBR_DecodeBuffer(barcodeReader, data, width, height, width * channel, format, "");

    printf(" Barcode buffer decoding in %lf milli-seconds.\n", ((double)get_time_point() - time) / 1000);

    TextResultArray *resultArray = NULL;
    DBR_GetAllTextResults(barcodeReader, &resultArray);
    if (resultArray->resultsCount == 0)
    {
        printf("No barcode found.\n");
    }
    else
    {
        int index = 0;
        for (; index < resultArray->resultsCount; index++)
        {
            printf(" Type: %s, Value: %s \n\n", resultArray->results[index]->barcodeFormatString, resultArray->results[index]->barcodeText);
        }   
    }
    
    DBR_FreeTextResults(&resultArray);
}

The difference is that the former relies entirely on computer vision algorithms, while the latter uses deep learning to detect the QR code region and a computer vision algorithm to decode the QR code from that specific region, which may be much smaller than the full image.

I can’t wait to build and test the program.

> build.ps1
> darknet.exe  detector barcode qrcode.data qrcode-yolov3-tiny.cfg qrcode-yolov3-tiny_last.weights 20201105151910.jpg

Performance Evaluation

Computer vision only

Barcode buffer decoding in 175.577000 milli-seconds.

Deep learning + computer vision

20201105151910.jpg: Predicted in 4.021 milli-seconds.
Barcode buffer decoding in 84.98 milli-seconds.

It seems deep learning brings a huge performance leap. However, don’t get excited too early; let’s account for the parts missing from the whole barcode recognition process:

  • Read the image into an unsigned char array via the OpenCV API. This part is shared by both cases, so we can ignore it.
  • Convert the unsigned char array to the image object. This part takes 120.461 milliseconds.
  • Resize the image to 416×416 – the dimension used for training my model. This part takes 49.898 milliseconds.

The time cost of computer vision only = 175.577 ms.
The time cost of deep learning + computer vision = 120.461 + 49.898 + 4.021 + 84.98 = 259.36 ms.
[Figure: deep learning vs. computer vision for QR code detection and recognition]
[Figure: Darknet CUDA YOLOv3-tiny for QR detection]

According to the total elapsed time calculation, the pure computer vision algorithm is the winner for QR code recognition on my PC.

The Pros and Cons of Deep Learning for Barcode Scanning

Pros

  • As an image preprocessing step, cropping barcode regions out of an image can dramatically boost recognition performance.

Cons

  • The deep learning model training is expensive.
  • Image loading and preprocessing for inference take much more time than the computer vision algorithm itself.
  • Inference speed relies heavily on GPU performance.

All my points above are based on a comparison between deep learning and the Dynamsoft Barcode Reader SDK. Nevertheless, not all barcode SDKs are as powerful as Dynamsoft Barcode Reader. I have not yet tested other barcode SDKs implemented with computer vision algorithms. Deep learning could probably be a good complement for free barcode SDKs like ZXing.

Source Code

https://github.com/yushulx/darknet