Scan Documents in Headless Chrome with Selenium
Dynamic Web TWAIN is an SDK which makes it possible to scan documents in browsers. Selenium is a suite of tools for automating web browsers, like Firefox, Edge and Chrome. It is primarily meant for testing, but it can also be used for other tasks.
In this article, we are going to combine the two to create a web app which can be used on any device to scan documents. We are going to use Python and Selenium to host and control a document scanning web page in headless Chrome and then provide HTTP APIs for a web app to call. Running Chrome in headless mode lets us start everything from the command line without opening a browser window.
Let’s do this in steps.
Getting Started With Dynamic Web TWAIN
Write a Document Scanning Web Page
- Download Dynamic Web TWAIN and then install it.
  PS: For Linux, you need to manually install the Dynamsoft Service (more about Dynamsoft Service). For example, run the following to install it on Ubuntu/Debian:

    sudo dpkg -i Resources/dist/DynamsoftServiceSetup.deb
- Create a new project folder and copy the Resources folder of Dynamic Web TWAIN into it.
- Edit Resources/dynamsoft.webtwain.config.js to configure your own license and disable AutoLoad, since we will load the library manually. You can apply for a trial license here.

    - Dynamsoft.DWT.AutoLoad = true;
    + Dynamsoft.DWT.AutoLoad = false;
      Dynamsoft.DWT.ProductKey = 'LICENSE-KEY';
- Create an HTML file named DWT.html to load the library of Dynamic Web TWAIN.

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
      <title>Dynamic Web TWAIN</title>
      <script src="Resources/dynamsoft.webtwain.initiate.js"></script>
      <script src="Resources/dynamsoft.webtwain.config.js"></script>
      <script src="DWT.js"></script>
    </head>
    <body>
      <p>DWT</p>
    </body>
    </html>
- Create a DWT.js file to load Dynamic Web TWAIN manually and scan documents. When the page is loaded, it scans a document and then displays the scanned image on the page.

    var DWObject = null;

    window.onload = function () {
      CreateDWT();
    };

    function CreateDWT() {
      var success = function (obj) {
        DWObject = obj;
        Scan();
      };
      var error = function (err) {
        console.log(err);
      };
      Dynamsoft.DWT.CreateDWTObjectEx(
        { WebTwainId: 'dwtcontrol' },
        success,
        error
      );
    }

    function Scan() {
      if (DWObject) {
        DWObject.SelectSourceByIndex(0);
        DWObject.CloseSource();
        DWObject.OpenSource();
        DWObject.IfShowUI = true;
        var OnAcquireImageSuccess = function () {
          var success = function (result, indices, type) {
            // Show the scanned page as an inline base64 JPEG image
            var scannedImage = document.createElement("img");
            scannedImage.src = "data:image/jpeg;base64," + result.getData(0, result.getLength());
            document.body.appendChild(scannedImage);
          };
          var error = function (errorCode, errorString) {
            console.log(errorString);
          };
          // Bit depth: 1 is B&W, 8 is Gray, 24 is RGB
          if (DWObject.GetImageBitDepth(DWObject.CurrentImageIndexInBuffer) == 1) {
            DWObject.ConvertToGrayScale(DWObject.CurrentImageIndexInBuffer);
          }
          DWObject.ConvertToBase64(
            [DWObject.CurrentImageIndexInBuffer],
            Dynamsoft.DWT.EnumDWT_ImageType.IT_JPG,
            success,
            error
          );
        };
        var OnAcquireImageError = function () {
          console.log("error");
        };
        DWObject.AcquireImage(OnAcquireImageSuccess, OnAcquireImageError);
      }
    }
Use Selenium to Control the Document Scanning Web Page in Headless Chrome
Next, we are going to use Python and Selenium to control the web page we just wrote.
Setup Environment
- Install Selenium:

    pip install selenium

- Install the Flask web framework for hosting the web page:

    pip install flask

- Install Chrome and download ChromeDriver. Put chromedriver.exe in the project folder.
  PS: If you are using Linux, you can install Chromium with snap, which comes with ChromeDriver.
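Since the driver's file name and location differ between Windows, macOS and Linux, it can help to look the binary up instead of hard-coding it. The following is a minimal sketch using only the standard library. The helper name `find_chromedriver` is hypothetical, and the `chromium.chromedriver` command name for snap installs is an assumption that may not hold on every system.

```python
import shutil
import sys
from pathlib import Path


def find_chromedriver(project_dir="."):
    """Return a path to a ChromeDriver binary, or None if none is found."""
    # Windows builds are named chromedriver.exe; macOS/Linux builds have no extension.
    if sys.platform == "win32":
        local_names = ["chromedriver.exe", "chromedriver"]
    else:
        local_names = ["chromedriver"]

    # 1. Prefer a copy placed in the project folder, as the article suggests.
    for name in local_names:
        candidate = Path(project_dir) / name
        if candidate.is_file():
            return str(candidate)

    # 2. Otherwise fall back to PATH. "chromium.chromedriver" is an assumed
    #    command name for a snap-installed Chromium.
    for name in ["chromedriver", "chromium.chromedriver"]:
        found = shutil.which(name)
        if found:
            return found
    return None
```

The returned path can then be used in place of a fixed `'chromedriver.exe'` string when the browser is started.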
Start an HTTP Server
We have to start an HTTP server to host the document scanning web page first.
#coding=utf-8
from flask import Flask, request

app = Flask(__name__, static_url_path='/', static_folder='static')

if __name__ == '__main__':
    app.run(host='0.0.0.0')
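Move all static files (DWT.html, DWT.js and the Resources folder) into the static folder, which Flask serves from the site root.
Save the script as server.py and run it from the command line: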
python server.py
We can visit the app here: http://127.0.0.1:5000/DWT.html
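If the page does not load, a common cause is a file missing from the static folder. The sketch below lists which of the files referenced by DWT.html are absent. The function name `missing_static_files` and the `REQUIRED` list are illustrative assumptions based on the file names used in this article, so adjust them to your own layout.

```python
from pathlib import Path

# Files the page in this article refers to (assumed to live under static/).
REQUIRED = [
    "DWT.html",
    "DWT.js",
    "Resources/dynamsoft.webtwain.initiate.js",
    "Resources/dynamsoft.webtwain.config.js",
]


def missing_static_files(static_dir="static", required=REQUIRED):
    """Return the required files that are absent from the static folder."""
    root = Path(static_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_static_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All static files are in place.")
```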
Start Chrome in Headless Mode
Next, start Chrome in headless mode.
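from selenium import webdriver
import threading

browser = None

def start_chrome():
    global browser
    chromedriver_path = 'chromedriver.exe' # You may have to change it for mac/Linux.
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('headless')
    # Note: recent Selenium 4 releases no longer accept executable_path. There,
    # pass service=Service(chromedriver_path) instead, with Service imported
    # from selenium.webdriver.chrome.service.
    browser = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_options)

if __name__ == '__main__':
    # Start Chrome on a background thread so that the HTTP server can start right away
    threading.Thread(target=start_chrome, args=()).start()
    app.run(host='0.0.0.0')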
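Because Chrome starts on a background thread, a request that arrives early could find `browser` still set to `None`. One possible way to guard against this is to signal readiness with a `threading.Event`, as in the sketch below. `BrowserHolder` and its methods are hypothetical names, not part of Selenium or of the original code. The `factory` argument stands in for a function that creates and returns the WebDriver.

```python
import threading


class BrowserHolder:
    """Holds a driver created on a background thread and signals when it is ready."""

    def __init__(self):
        self._ready = threading.Event()
        self._browser = None

    def start(self, factory):
        """Run factory() on a daemon thread; factory should return the driver."""
        def worker():
            self._browser = factory()
            self._ready.set()  # only reached if factory() did not raise

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread

    def get(self, timeout=30.0):
        """Block until the driver exists, or raise if it takes longer than timeout."""
        if not self._ready.wait(timeout):
            raise RuntimeError("browser did not start in time")
        return self._browser
```

A request handler would then call `holder.get()` instead of reading a global variable directly. If the factory fails, `get` raises after the timeout rather than returning `None`.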
Execute JavaScript from Python
Now, we can execute JavaScript from Python to control the web page.
There are two methods in Selenium Python to do this: execute_script and execute_async_script.
The first one synchronously executes JavaScript and returns the result directly. Its usage is like this:
driver.execute_script('return document.title;')
The second one executes JavaScript asynchronously. Selenium passes a callback as the last argument to the script, and the value we hand to that callback becomes the return value in Python.
script = "var callback = arguments[arguments.length - 1]; " \
"window.setTimeout(function(){ callback('timeout') }, 3000);"
driver.execute_async_script(script)
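An asynchronous script fails if its callback is not invoked before the driver's script timeout expires, and a scan can easily take longer than a short default. The helper below is a small sketch that raises the timeout before running the script. `run_async` is a hypothetical name and the 60-second value is an arbitrary assumption. `set_script_timeout` and `execute_async_script` are the WebDriver methods it relies on.

```python
def run_async(driver, script, *args, timeout=60):
    """Run an asynchronous script after setting the script timeout (in seconds)."""
    driver.set_script_timeout(timeout)
    # Extra positional arguments appear in JavaScript as arguments[0], arguments[1], ...
    # Selenium appends the callback as the final entry of `arguments`.
    return driver.execute_async_script(script, *args)


# Example (needs a running browser):
# title = run_async(driver, "arguments[arguments.length - 1](document.title);")
```

Passing values as extra arguments also avoids building the script text by string concatenation.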
We are going to use the two methods to interact with Dynamic Web TWAIN.
- Initialize Dynamic Web TWAIN.
  In Python, load the web page and then run the CreateDWT function.

    DWT_created = False

    def create_DWT():
        global DWT_created
        browser.get('http://127.0.0.1:5000/DWT.html')
        DWT_created = browser.execute_async_script('''
            const cb = arguments[arguments.length - 1];
            CreateDWT(cb);
        ''')

  JavaScript:

    function CreateDWT(callback) {
      var success = function (obj) {
        DWObject = obj;
        callback(true);
      };
      var error = function (err) {
        callback(false);
      };
      Dynamsoft.DWT.CreateDWTObjectEx(
        { WebTwainId: 'dwtcontrol' },
        success,
        error
      );
    }
- Get the list of connected scanners.
  Python:

    scanners = browser.execute_script('''
        scanners = GetScannersList();
        return scanners;
    ''')

  JavaScript:

    function GetScannersList() {
      var scanners = [];
      var count = DWObject.SourceCount;
      for (var i = 0; i < count; i++) {
        scanners.push(DWObject.GetSourceNameItems(i));
      }
      return scanners;
    }
- Scan a document.
  Python:

    resolution = '300'
    selected_index = '0'
    pixelType = '0'  # 0: black and white, 1: gray, 2: color
    js = '''
        const cb = arguments[arguments.length - 1];
        var options = {};
        options.showUI = false;
        options.resolution = ''' + resolution + ''';
        options.selectedIndex = ''' + selected_index + ''';
        options.pixelType = ''' + pixelType + ''';
        Scan(options, cb);
    '''
    result = browser.execute_async_script(js)

  JavaScript:

    function Scan(options, callback) {
      if (DWObject) {
        DWObject.SelectSourceByIndex(options.selectedIndex);
        DWObject.CloseSource();
        DWObject.OpenSource();
        DWObject.IfShowUI = options.showUI;
        DWObject.PixelType = options.pixelType;
        DWObject.Resolution = options.resolution;
        var OnAcquireImageSuccess = function () {
          var success = function (result, indices, type) {
            DWObject.RemoveAllImages();
            callback(result.getData(0, result.getLength()));
          };
          var error = function (errorCode, errorString) {
            console.log(errorString);
            DWObject.RemoveAllImages();
            callback(false);
          };
          // Bit depth: 1 is B&W, 8 is Gray, 24 is RGB
          if (DWObject.GetImageBitDepth(DWObject.CurrentImageIndexInBuffer) == 1) {
            DWObject.ConvertToGrayScale(DWObject.CurrentImageIndexInBuffer);
          }
          DWObject.ConvertToBase64(
            [DWObject.CurrentImageIndexInBuffer],
            Dynamsoft.DWT.EnumDWT_ImageType.IT_JPG,
            success,
            error
          );
        };
        var OnAcquireImageError = function () {
          callback(false);
        };
        DWObject.AcquireImage(OnAcquireImageSuccess, OnAcquireImageError);
      } else {
        callback(false);
      }
    }
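Splicing strings into JavaScript source works for fixed values but becomes fragile once the values come from outside. As a hedged alternative, the sketch below converts each setting to an integer and lets `json.dumps` produce the options object. `build_scan_script` is a hypothetical helper name. It assumes the page defines `Scan(options, callback)` as shown above.

```python
import json


def build_scan_script(resolution=300, selected_index=0, pixel_type=0, show_ui=False):
    """Return JavaScript that calls the page's Scan(options, callback)."""
    # int() rejects anything that is not a whole number (raises ValueError).
    options = {
        "showUI": bool(show_ui),
        "resolution": int(resolution),
        "selectedIndex": int(selected_index),
        "pixelType": int(pixel_type),  # 0: black and white, 1: gray, 2: color
    }
    # A dict of ints and booleans serializes to a valid JavaScript object literal.
    return (
        "const cb = arguments[arguments.length - 1];\n"
        "Scan(" + json.dumps(options) + ", cb);"
    )


# Example (needs a running browser with the page loaded):
# result = browser.execute_async_script(build_scan_script(resolution=300))
```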
Wrap the Functions into HTTP Interfaces
Next, let’s wrap the functions into HTTP interfaces.
- api/dwtpage/load
  This API checks whether Chrome has loaded the web page of the document scanning app and loads the page if it hasn’t.

    @app.route('/api/dwtpage/load')
    def load():
        if not DWT_created:
            print("dwt loading")
            create_DWT()
        # DWT_created is True only if CreateDWT reported success
        return {"loaded": bool(DWT_created)}
- api/scanner/getlist
  This API returns the list of scanners.

    @app.route('/api/scanner/getlist')
    def get_scanner_list():
        scanners = browser.execute_script('''
            scanners = GetScannersList();
            return scanners;
        ''')
        return {"scanners": scanners}
- api/scan
  This API scans a document and returns the base64 result.

    @app.route('/api/scan')
    def scan():
        # type=int falls back to the default when a value is not a valid integer
        resolution = request.args.get('resolution', 300, type=int)
        selected_index = request.args.get('selectedIndex', 0, type=int)
        pixelType = request.args.get('pixelType', 0, type=int)
        # Pass the values as script arguments instead of concatenating request
        # input into the JavaScript source. Selenium appends the callback last.
        js = '''
            const cb = arguments[arguments.length - 1];
            var options = {};
            options.showUI = false;
            options.resolution = arguments[0];
            options.selectedIndex = arguments[1];
            options.pixelType = arguments[2];
            Scan(options, cb);
        '''
        result = browser.execute_async_script(js, resolution, selected_index, pixelType)
        if result is not False:
            return {"success": True, "base64": result}
        else:
            return {"success": False}
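All three endpoints share one browser instance, and Flask's development server handles requests on multiple threads by default, so two requests could try to drive the browser at the same time. A WebDriver instance is generally not safe to use from several threads at once. One simple option, sketched here with hypothetical names (`browser_lock`, `serialized`), is to wrap each handler in a lock.

```python
import functools
import threading

# One lock for the single shared browser instance.
browser_lock = threading.Lock()


def serialized(func):
    """Decorator: run func while holding browser_lock, one caller at a time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with browser_lock:
            return func(*args, **kwargs)
    return wrapper


# Usage sketch: put the decorator below the route decorator, for example
#
# @app.route('/api/scan')
# @serialized
# def scan():
#     ...
```

`functools.wraps` keeps the original function name, which Flask uses as the endpoint name, so decorating several handlers does not make them collide.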
Write a Document Scanning Web App to Use the HTTP Interfaces
We can write a web app to use the HTTP interfaces. The final result looks like this:
[Screenshot: the web app on PC]
[Screenshot: the web app on iPhone]
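The web app itself runs in the browser, but the same HTTP interfaces can be exercised from any client. As a stand-in, here is a minimal Python sketch that builds the `/api/scan` URL and saves the returned base64 data as a JPEG file. The function names and `BASE_URL` are assumptions for illustration. The query parameter names and the `success` and `base64` fields follow the endpoint described above.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # assumption: Flask's default port on this machine


def scan_url(resolution=300, selected_index=0, pixel_type=0, base_url=BASE_URL):
    """Build the /api/scan URL with the query parameters the endpoint reads."""
    query = urlencode({
        "resolution": resolution,
        "selectedIndex": selected_index,
        "pixelType": pixel_type,
    })
    return f"{base_url}/api/scan?{query}"


def save_scan(response, path):
    """Write the base64 JPEG from an /api/scan response to path; return success."""
    if not response.get("success"):
        return False
    with open(path, "wb") as f:
        f.write(base64.b64decode(response["base64"]))
    return True


def scan_to_file(path, **params):
    """Call the API and save the result (needs the server from this article running)."""
    with urlopen(scan_url(**params)) as reply:
        return save_scan(json.load(reply), path)
```

With the server running and `/api/dwtpage/load` already called, `scan_to_file("scan.jpg", resolution=300)` would request a scan and write the image to disk.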
Source Code
Check out the source code to have a try: