Scan Documents in Headless Chrome with Selenium

Dynamic Web TWAIN is an SDK for scanning documents in web browsers. Selenium is a suite of tools for automating web browsers such as Firefox, Edge and Chrome. It is mainly used for testing, but it can serve other purposes as well.

In this article, we are going to combine the two to create a web app that can be used on any device to scan documents. We will use Python and Selenium to host and control a document scanning web page in headless Chrome, and then expose HTTP APIs for a web app to use. We run Chrome in headless mode so that the whole app can be started and run from the command line, without a visible browser window.

Let’s do this in steps.

Getting Started With Dynamic Web TWAIN

Write a Document Scanning Web Page

  1. Download Dynamic Web TWAIN and then install it.

    PS: On Linux, you need to install the Dynamsoft Service manually. For example, run the following on Ubuntu/Debian:

    sudo dpkg -i Resources/dist/DynamsoftServiceSetup.deb
    
  2. Create a new project folder and copy the Resources folder of Dynamic Web TWAIN into it.
  3. Edit Resources/dynamsoft.webtwain.config.js to set your own license key and to disable AutoLoad, since we will initialize Dynamic Web TWAIN manually. You can apply for a trial license from Dynamsoft.

    - Dynamsoft.DWT.AutoLoad = true;
    + Dynamsoft.DWT.AutoLoad = false;
    Dynamsoft.DWT.ProductKey = 'LICENSE-KEY';
    
  4. Create an HTML file named DWT.html that loads the Dynamic Web TWAIN library.

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
      <title>Dynamic Web TWAIN</title>
      <script src="Resources/dynamsoft.webtwain.initiate.js"></script>
      <script src="Resources/dynamsoft.webtwain.config.js"></script>
      <script src="DWT.js"></script>
    </head>
    <body>
      <p>DWT</p>
    </body>
    </html>
    
  5. Create a DWT.js file to load Dynamic Web TWAIN manually and scan documents. When the page is loaded, it will scan documents and then display the scanned image on the page.

    var DWObject = null;
    
    window.onload = function(){
      CreateDWT();
    }
    
    function CreateDWT() {
      var success = function (obj) {
        DWObject = obj;
        Scan();
      };
    
      var error = function (err) {
        console.log(err)
      };
    
      Dynamsoft.DWT.CreateDWTObjectEx({
        WebTwainId: 'dwtcontrol'
      },
        success,
        error
      );
    }
    
    function Scan() {
      if (DWObject) {
        DWObject.SelectSourceByIndex(0);
        DWObject.CloseSource();
        DWObject.OpenSource();
        DWObject.IfShowUI = true;
           
        var OnAcquireImageSuccess = function () {
          var success = function (result, indices, type) {
            var scannedImage = document.createElement("img");
            // The result is raw base64, so prefix it with a JPEG data URL header.
            scannedImage.src = "data:image/jpeg;base64," + result.getData(0, result.getLength());
            document.body.appendChild(scannedImage);
          };
    
          var error = function (errorCode, errorString) {
            console.log(errorString);
          };
    
          // Bit depth: 1 is B&W, 8 is gray, 24 is RGB.
          // JPEG cannot store 1-bit images, so convert B&W scans to grayscale first.
          if (DWObject.GetImageBitDepth(DWObject.CurrentImageIndexInBuffer) == 1) {
            DWObject.ConvertToGrayScale(DWObject.CurrentImageIndexInBuffer);
          }
    
          DWObject.ConvertToBase64(
            [DWObject.CurrentImageIndexInBuffer],
            Dynamsoft.DWT.EnumDWT_ImageType.IT_JPG,
            success,
            error
          );
        };
        var OnAcquireImageError = function () {
          console.log("error");
        };
        DWObject.AcquireImage(OnAcquireImageSuccess, OnAcquireImageError);
      }
    }
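Step 3 above edits the config file by hand. If you set up the project on several machines, the same edit could be scripted. The minimal sketch below uses regular expressions. It assumes the file contains one-line, single-quoted assignments like `Dynamsoft.DWT.AutoLoad = true;` and `Dynamsoft.DWT.ProductKey = '...';` as shown in step 3. The function name `patch_dwt_config` is hypothetical.

```python
import re


def patch_dwt_config(text: str, license_key: str) -> str:
    """Return config text with AutoLoad disabled and ProductKey replaced.

    Assumes one-line assignments like those in dynamsoft.webtwain.config.js.
    """
    # Disable automatic initialization; DWT.js calls CreateDWTObjectEx itself.
    text = re.sub(r"Dynamsoft\.DWT\.AutoLoad\s*=\s*\w+\s*;",
                  "Dynamsoft.DWT.AutoLoad = false;", text)
    # Replace whatever single-quoted key (or placeholder) is currently set.
    # A function replacement avoids treating backslashes in the key specially.
    text = re.sub(r"Dynamsoft\.DWT\.ProductKey\s*=\s*'[^']*'\s*;",
                  lambda m: f"Dynamsoft.DWT.ProductKey = '{license_key}';",
                  text)
    return text
```

To apply it, read `Resources/dynamsoft.webtwain.config.js`, pass its contents through `patch_dwt_config(text, "YOUR-KEY")` and write the result back.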
    

Use Selenium to Control the Document Scanning Web Page in Headless Chrome

Next, we are going to use Python and Selenium to control the web page we just wrote.

Setup Environment

  1. Install Selenium:

    pip install selenium
    
  2. Install the Flask web framework for hosting the web page:

    pip install flask
    
  3. Install Chrome and download the ChromeDriver version that matches your Chrome version. Put chromedriver.exe in the project folder.

    PS: On Linux, you can simply install Chromium with snap, which bundles ChromeDriver.

Start an HTTP Server

First, we need to start an HTTP server to host the document scanning web page.

#coding=utf-8
from flask import Flask, request
app = Flask(__name__, static_url_path='/', static_folder='static')

if __name__ == '__main__':
    app.run(host='0.0.0.0')

Move all the static files (DWT.html, DWT.js and the Resources folder) into a folder named static.

Save the code as server.py and run it from the command line:

python server.py

We can visit the app here: http://127.0.0.1:5000/DWT.html
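Flask serves everything under `static/` at the site root, so `DWT.html` must sit directly in that folder next to `DWT.js` and `Resources/`. A small check like the one below can catch a misplaced file before you start the server. This is a sketch: the function `missing_static_files` is hypothetical, and the list of required files is an assumption based on the files referenced in this article.

```python
from pathlib import Path
from typing import List

# Files the page references, relative to the static folder (assumed layout).
REQUIRED_FILES = [
    "DWT.html",
    "DWT.js",
    "Resources/dynamsoft.webtwain.initiate.js",
    "Resources/dynamsoft.webtwain.config.js",
]


def missing_static_files(static_dir: str = "static") -> List[str]:
    """Return the required files that do not exist under static_dir."""
    root = Path(static_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Running `print(missing_static_files())` from the project folder should print an empty list when the layout is correct.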

Start Chrome in Headless Mode

Next, start Chrome in headless mode.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import threading

browser = None
def start_chrome():
    chromedriver_path = 'chromedriver.exe' # You may have to change it for macOS/Linux.
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    global browser
    # Selenium 4 takes the driver path through a Service object.
    browser = webdriver.Chrome(service=Service(chromedriver_path), options=chrome_options)
    
if __name__ == '__main__':
    # Start Chrome in a background thread so that it does not block the Flask server.
    threading.Thread(target=start_chrome).start()
    app.run(host='0.0.0.0')
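Because Chrome starts in a background thread, an API request could arrive while `browser` is still `None`. One way to guard against this is to have the starter thread set a `threading.Event` once the resource exists, and have request handlers wait on it with a timeout. The sketch below uses only the standard library, and the `BackgroundResource` class is hypothetical. In the real server, the factory would be a function that creates and returns the `webdriver.Chrome` instance, and the Flask handlers would call `get()` instead of reading the global `browser`.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class BackgroundResource(Generic[T]):
    """Create a resource in a background thread and let callers wait for it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._error: Optional[Exception] = None
        self._ready = threading.Event()

    def start(self) -> None:
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        try:
            self._value = self._factory()
        except Exception as exc:  # keep the error so get() can report it
            self._error = exc
        finally:
            self._ready.set()  # wake waiters whether or not creation succeeded

    def get(self, timeout: float = 30.0) -> T:
        """Block until the resource exists; raise if it failed or timed out."""
        if not self._ready.wait(timeout):
            raise TimeoutError("resource is not ready yet")
        if self._error is not None:
            raise RuntimeError("resource creation failed") from self._error
        return self._value  # type: ignore[return-value]
```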

Execute JavaScript from Python

Now, we can execute JavaScript from Python to control the web page.

Selenium's Python bindings provide two methods for this: execute_script and execute_async_script.

The first one executes JavaScript synchronously and returns its result directly. Its usage is like this:

browser.execute_script('return document.title;')

The second one executes JavaScript asynchronously. Selenium passes a callback as the last argument to the script, and the script calls that callback to return its result. The following script returns 'timeout' after three seconds:

script = "var callback = arguments[arguments.length - 1]; " \
         "window.setTimeout(function(){ callback('timeout') }, 3000);"
result = browser.execute_async_script(script)
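To make the callback convention concrete, here is a small pure-Python model of how `execute_async_script` behaves. The caller appends a callback as the last argument, then blocks until the script calls it or a timeout elapses. This is only an analogy written for illustration, and `run_async_script` is a hypothetical name. The real Selenium call runs JavaScript in the browser and raises a timeout error when the WebDriver script timeout (30 seconds by default, adjustable with `set_script_timeout`) expires. A scan at high resolution may take longer than that, so raising the timeout may be worthwhile.

```python
import threading
from typing import Any, Callable, List


def run_async_script(script: Callable[..., None], *args: Any,
                     timeout: float = 30.0) -> Any:
    """Model of execute_async_script: pass a callback last, wait for it."""
    done = threading.Event()
    result: List[Any] = []

    def callback(value: Any = None) -> None:
        # In this model, only the first call is recorded.
        if not done.is_set():
            result.append(value)
            done.set()

    # The "script" receives the caller's arguments followed by the callback.
    script(*args, callback)
    if not done.wait(timeout):
        raise TimeoutError("script did not call the callback in time")
    return result[0]
```

For example, a function that starts a `threading.Timer` calling the callback with `'timeout'` plays the role of the `window.setTimeout` script above.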

We are going to use the two methods to interact with Dynamic Web TWAIN.

  1. Initialize Dynamic Web TWAIN.

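    In Python, load the web page and then run the CreateDWT function. Because the page is now driven from Python, remove the window.onload handler from DWT.js. CreateDWT now takes a callback, which Selenium supplies as the last argument of the async script.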

    DWT_created = False
    def create_DWT():
        browser.get('http://127.0.0.1:5000/DWT.html')
        global DWT_created
        DWT_created = browser.execute_async_script('''
                                                const cb = arguments[arguments.length - 1];
                                                CreateDWT(cb);
                                                ''')
    

    JavaScript:

    function CreateDWT(callback) {
      var success = function (obj) {
        DWObject = obj;
        callback(true);
      };
    
      var error = function (err) {
        callback(false);
      };
    
      Dynamsoft.DWT.CreateDWTObjectEx({
        WebTwainId: 'dwtcontrol'
      },
        success,
        error
      );
    }
    
  2. Get the list of connected scanners.

    Python:

    scanners = browser.execute_script('return GetScannersList();')
    

    JavaScript:

    function GetScannersList() {
      var scanners = [];
      var count = DWObject.SourceCount;
      for (var i = 0; i < count; i++) {
        scanners.push(DWObject.GetSourceNameItems(i));
      }
      return scanners;
    }
    
  3. Scan a document.

    Python:

    resolution = 300
    selected_index = 0
    pixel_type = 0 # 0: black and white, 1: gray, 2: color
    # Pass the values as script arguments instead of concatenating them into the source.
    js = '''
        const cb = arguments[arguments.length - 1];
        const options = {
          showUI: false,
          resolution: arguments[0],
          selectedIndex: arguments[1],
          pixelType: arguments[2]
        };
        Scan(options, cb);
        '''
    result = browser.execute_async_script(js, resolution, selected_index, pixel_type)
    

    JavaScript:

    function Scan(options,callback) {
      if (DWObject) {
        DWObject.SelectSourceByIndex(options.selectedIndex);
        DWObject.CloseSource();
        DWObject.OpenSource();
        DWObject.IfShowUI = options.showUI;
        DWObject.PixelType = options.pixelType;
        DWObject.Resolution = options.resolution;
    
        var OnAcquireImageSuccess = function () {
          var success = function (result, indices, type) {
            DWObject.RemoveAllImages();
            callback(result.getData(0, result.getLength()));
          };
    
          var error = function (errorCode, errorString) {
            console.log(errorString);
            DWObject.RemoveAllImages();
            callback(false);
          };
          //1 is B&W, 8 is Gray, 24 is RGB
          if (DWObject.GetImageBitDepth(DWObject.CurrentImageIndexInBuffer) == 1) {
            DWObject.ConvertToGrayScale(DWObject.CurrentImageIndexInBuffer);
          }
               
          DWObject.ConvertToBase64(
            [DWObject.CurrentImageIndexInBuffer],
            Dynamsoft.DWT.EnumDWT_ImageType.IT_JPG,
            success,
            error
          );
             
        }
        var OnAcquireImageError = function () {
          callback(false);
        }
        DWObject.AcquireImage(OnAcquireImageSuccess, OnAcquireImageError);
      } else {
        callback(false);
      }
    }
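The Python snippet in step 3 sends whatever values it is given to the page. A small validation step could catch bad input before it reaches the browser. The sketch below maps readable pixel-type names to the numeric values noted in that snippet's comment (0: black and white, 1: gray, 2: color) and rejects invalid input. The function `build_scan_options` and the names `"bw"`, `"gray"` and `"color"` are hypothetical and are not part of Dynamic Web TWAIN.

```python
from typing import Dict, Union

# Numeric pixel types as used in the Python snippet in step 3.
PIXEL_TYPES: Dict[str, int] = {"bw": 0, "gray": 1, "color": 2}


def build_scan_options(resolution: Union[int, str] = 300,
                       selected_index: Union[int, str] = 0,
                       pixel_type: Union[int, str] = "bw") -> Dict[str, object]:
    """Return a dict shaped like the options object that Scan() reads."""
    if isinstance(pixel_type, str) and pixel_type.lower() in PIXEL_TYPES:
        pixel = PIXEL_TYPES[pixel_type.lower()]
    else:
        pixel = int(pixel_type)  # also accepts "0", "1", "2"; raises ValueError otherwise
    if pixel not in PIXEL_TYPES.values():
        raise ValueError(f"unsupported pixel type: {pixel_type!r}")
    dpi = int(resolution)
    if dpi <= 0:
        raise ValueError(f"resolution must be positive: {dpi}")
    index = int(selected_index)
    if index < 0:
        raise ValueError("selectedIndex must be non-negative")
    return {"showUI": False, "resolution": dpi,
            "selectedIndex": index, "pixelType": pixel}
```

Selenium serializes a Python dict argument into a JavaScript object, so the result could presumably be passed straight through, as in `browser.execute_async_script('Scan(arguments[0], arguments[arguments.length - 1]);', build_scan_options(200, 0, "gray"))`.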
    

Wrap the Functions into HTTP Interfaces

Next, let’s wrap the functions into HTTP interfaces.

  1. api/dwtpage/load

    This API checks whether Chrome has loaded the web page of the document scanning app and loads the page if it hasn’t.

    @app.route('/api/dwtpage/load')
    def load():
        if not DWT_created:
            print("dwt loading")
            create_DWT()
        return {"loaded": DWT_created}
    
  2. api/scanner/getlist

    This API returns the list of scanners.

    @app.route('/api/scanner/getlist')
    def get_scanner_list():
        scanners = browser.execute_script('return GetScannersList();')
        return {"scanners":scanners}
    
  3. api/scan

    This API scans a document and returns the base64 result.

    @app.route('/api/scan')
    def scan():
        # Parse the query parameters as integers so that only numbers reach the page.
        resolution = request.args.get('resolution', 300, type=int)
        selected_index = request.args.get('selectedIndex', 0, type=int)
        pixel_type = request.args.get('pixelType', 0, type=int)
        # Pass the values as script arguments instead of concatenating them into the source.
        js = '''
            const cb = arguments[arguments.length - 1];
            const options = {
              showUI: false,
              resolution: arguments[0],
              selectedIndex: arguments[1],
              pixelType: arguments[2]
            };
            Scan(options, cb);
            '''
        result = browser.execute_async_script(js, resolution, selected_index, pixel_type)
        if result:
            return {"success": True, "base64": result}
        else:
            return {"success": False}
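Any HTTP client can consume these interfaces. As a sketch of what a client does, the Python code below builds a request URL for `/api/scan` and decodes the JSON reply into JPEG bytes, using only the standard library. The base URL, the 120-second request timeout, and the helper names `build_scan_url`, `decode_scan_response` and `scan_to_file` are assumptions. The JPEG check relies on the fact that every JPEG file begins with the bytes `FF D8`.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_scan_url(base: str, resolution: int = 300,
                   selected_index: int = 0, pixel_type: int = 0) -> str:
    """Build the /api/scan URL with the query parameters the server reads."""
    query = urlencode({"resolution": resolution,
                       "selectedIndex": selected_index,
                       "pixelType": pixel_type})
    return f"{base.rstrip('/')}/api/scan?{query}"


def decode_scan_response(body: str) -> bytes:
    """Turn the server's JSON reply into raw JPEG bytes."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError("scan failed on the server")
    image = base64.b64decode(data["base64"])
    if not image.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        raise ValueError("response is not a JPEG image")
    return image


def scan_to_file(base: str, path: str, **params: int) -> None:
    """Request a scan and save the result (needs the server to be running)."""
    with urlopen(build_scan_url(base, **params), timeout=120) as resp:
        image = decode_scan_response(resp.read().decode("utf-8"))
    with open(path, "wb") as f:
        f.write(image)
```

With the server running, `scan_to_file("http://127.0.0.1:5000", "scan.jpg", resolution=200)` would save a scan from the first scanner.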
    

Write a Document Scanning Web App to Use the HTTP Interfaces

We can write a web app to use the HTTP interfaces. The final result looks like this:

On PC:

[Screenshot: Scanner]

On iPhone:

[Screenshot: Scanner on iPhone]

Source Code

Check out the source code to try it yourself:

https://github.com/xulihang/Document-Scanning-Server