How to Scan Documents from the Command Line
This article shows how to scan documents from the command line (CLI), which makes it possible to automate and script scanning and saving documents.
Several APIs exist for accessing document scanners. The table below compares them.
| Feature | TWAIN | WIA (Windows Image Acquisition) | SANE (Scanner Access Now Easy) | ICA (Image Capture Architecture) | eSCL |
|---|---|---|---|---|---|
| Developer | TWAIN Working Group | Microsoft | SANE Open-Source Community | Apple | Mopria |
| Operating Systems | Windows, macOS, Linux (partial) | Windows | Linux, macOS, Unix-like | macOS, iOS | Cross-platform (Windows/macOS/Linux/mobile) |
| Supported Scanners | Broad | Less broad | Less broad (community support) | Less broad | Only modern network scanners/MFPs |
| Functionality | Advanced controls (ADF, barcode detection, etc) | Basic controls like color mode | Medium-to-advanced controls | Basic controls like color mode | Basic controls like color mode |
We are going to use each of these APIs to scan documents from the command line. Only SANE ships with a command-line tool, so we need to write our own tools for the other APIs.
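As a rough illustration of the table, a tool could decide which APIs to try based on the current platform. The sketch below is a hypothetical helper and not part of the tools in this article; the `candidate_apis` name and the ordering inside each list are assumptions.

```python
import sys

# Availability per the comparison table; the ordering within each list is an
# assumption, and TWAIN support on Linux is only partial.
APIS_BY_PLATFORM = {
    "win32": ["TWAIN", "WIA", "eSCL"],
    "darwin": ["TWAIN", "SANE", "ICA", "eSCL"],
    "linux": ["SANE", "eSCL", "TWAIN"],
}


def candidate_apis(platform=None):
    """Return the scanning APIs worth trying on the given platform."""
    platform = platform or sys.platform
    for prefix, apis in APIS_BY_PLATFORM.items():
        if platform.startswith(prefix):
            return list(apis)
    # Other Unix-like systems: SANE plus the cross-platform eSCL.
    return ["SANE", "eSCL"]


print(candidate_apis())
```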
SANE
SANE has a command-line tool, `scanimage`. Here is its basic usage:

- List connected scanners.

      scanimage -L

- Acquire an image with a specified scanner.

      scanimage -d "scanner name" -o out.png
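Because `scanimage` is an ordinary command, it is easy to drive from a script. The following is a minimal sketch of scanning several pages in a row with Python's `subprocess` module; the function names and the `page-001.png` naming scheme are assumptions made for illustration.

```python
import subprocess


def build_scan_command(output_path, device=None):
    """Build the scanimage argument list for a single scan."""
    cmd = ["scanimage"]
    if device:
        cmd += ["-d", device]
    cmd += ["-o", output_path]
    return cmd


def scan_batch(count, device=None, prefix="page"):
    """Scan `count` pages one after another and return the file names."""
    paths = []
    for i in range(1, count + 1):
        path = f"{prefix}-{i:03d}.png"
        # check=True raises CalledProcessError if scanimage exits with an error.
        subprocess.run(build_scan_command(path, device), check=True)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Only prints the command, so this runs without a scanner attached.
    print(build_scan_command("out.png", "scanner name"))
```

Calling `scan_batch(3, "scanner name")` would invoke `scanimage` three times, so it needs SANE installed and a scanner connected.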
The command-line tools we write for the other APIs will follow the same usage.
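One way to mirror that usage in Python is with `argparse`. This is a sketch under assumptions: the `scan-cli` program name and the `run` helper are hypothetical, and the two callables stand in for whichever backend (TWAIN, WIA, eSCL, and so on) does the real work.

```python
import argparse


def build_parser(prog="scan-cli"):
    """Create a parser that mirrors the scanimage options used above."""
    parser = argparse.ArgumentParser(prog=prog, description="Scan a document.")
    parser.add_argument("-L", "--list-devices", action="store_true",
                        help="list connected scanners")
    parser.add_argument("-d", "--device-name", help="name of the scanner to use")
    parser.add_argument("-o", "--output-file", default="out.png",
                        help="path of the image to write")
    return parser


def run(argv, list_scanners, scan):
    """Parse argv and dispatch to the backend callables passed in."""
    args = build_parser().parse_args(argv)
    if args.list_devices:
        for name in list_scanners():
            print(name)
        return 0
    scan(args.device_name, args.output_file)
    return 0


if __name__ == "__main__":
    # Stand-in backend so the sketch runs without any hardware.
    run(["-L"], lambda: ["Demo Scanner"], lambda name, path: None)
```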
TWAIN
TWAIN exposes a native C/C++ interface, and a Python library wraps it. We are going to use Python to write the command-line tool.
Here are the key parts:
- Import the library.

      import twain

- List scanners.

      with twain.SourceManager() as sm:
          for source in sm.source_list:
              print(source)

- Scan with a scanner.

      from io import BytesIO

      from PIL import Image

      with twain.SourceManager() as sm:
          src = sm.open_source("scanner_name")
          # Start the acquisition without showing the driver's UI.
          src.request_acquire(show_ui=False, modal_ui=False)
          # Transfer one image as a native DIB handle.
          (handle, remaining_count) = src.xfer_image_natively()
          # Convert the DIB handle into the bytes of a BMP file.
          bmp_bytes = twain.dib_to_bm_file(handle)
          img = Image.open(BytesIO(bmp_bytes), formats=["bmp"])
          # Pillow infers the output format from the extension, so the path needs one.
          img.save("out.png")
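The second value unpacked from `xfer_image_natively()` can be used to keep transferring pages, for example from a document feeder. The sketch below assumes that this value counts the transfers still pending and reaches zero after the last page; `transfer_all` is a hypothetical helper, and it takes the transfer function as a parameter so that it can be tried without a scanner.

```python
def transfer_all(transfer_one):
    """Call transfer_one() until it reports no pending images.

    transfer_one must return (item, remaining_count), the same shape as the
    tuple unpacked from xfer_image_natively() above.
    """
    items = []
    while True:
        item, remaining = transfer_one()
        items.append(item)
        # Assumption: zero means the source has nothing left to transfer.
        if remaining == 0:
            break
    return items


if __name__ == "__main__":
    # Fake three-page transfer so the sketch runs without hardware.
    fake_pages = iter([("page1", 2), ("page2", 1), ("page3", 0)])
    print(transfer_all(lambda: next(fake_pages)))
```

With a real source this would be called as `transfer_all(src.xfer_image_natively)`, and each returned handle would then be converted with `twain.dib_to_bm_file` as shown above.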
WIA
WIA provides native APIs as well as a COM automation layer. We are going to drive WIA from Python through COM.
Here are the key parts:
- Import libraries.

      from io import BytesIO

      from PIL import Image
      import pythoncom
      from win32com.client import Dispatch

- List scanners.

      manager = Dispatch("WIA.DeviceManager")
      devices = manager.DeviceInfos
      print("Available scanners:")
      # WIA collections are 1-based.
      for i in range(1, devices.Count + 1):
          device = devices.Item(i)
          # Check if the device is a scanner (Type = 1)
          if device.Type == 1:
              print(f" Name: {device.Properties['Name'].Value}")
              print(f" ID: {device.DeviceID}")
              print(f" Description: {device.Properties['Description'].Value}")
              print(" ----------------")

- Scan with a scanner.

      wia = Dispatch("WIA.CommonDialog")
      manager = Dispatch("WIA.DeviceManager")
      devices = manager.DeviceInfos
      scanner_name = "target scanner name"
      output_path = "out.png"
      selected_device = None
      for i in range(1, devices.Count + 1):
          device = devices.Item(i)
          if device.Type == 1 and device.Properties['Name'].Value == scanner_name:
              selected_device = device.Connect()  # Select the scanner by name
              break
      if selected_device is None:
          # Show the scanning dialog with scanner selection
          img = wia.ShowAcquireImage()
      else:
          # Transfer the scanned image using the first item of the selected scanner
          img = wia.ShowTransfer(selected_device.Items.Item(1))
      # Save the image. The dialog returns a WIA ImageFile COM object rather than
      # an array, so load its raw bytes into Pillow (this assumes pywin32 exposes
      # FileData.BinaryData as a bytes-like buffer).
      pil_img = Image.open(BytesIO(bytes(img.FileData.BinaryData)))
      pil_img.save(output_path)
eSCL
eSCL is a RESTful interface. Network scanners advertise themselves via Bonjour, so a client can discover them and send HTTP requests to scan documents. We are going to use Python again to write the scanning tool.
- Import the libraries.

      import time

      from requests import get as requests_get, post as requests_post
      from zeroconf import ServiceBrowser, Zeroconf

- List scanners by detecting Bonjour services whose type is `_uscan._tcp.local.`.

      class ESCLScannerListener:
          def __init__(self):
              self.scanners = []

          def add_service(self, zeroconf, type, name):
              info = zeroconf.get_service_info(type, name)
              if info:
                  # info.addresses holds packed bytes; parsed_addresses() returns readable IPs.
                  addresses = ["%s:%d" % (addr, info.port) for addr in info.parsed_addresses()]
                  scanner_info = {
                      'name': name,
                      'type': type,
                      'addresses': addresses,
                      'port': info.port,
                      'properties': info.properties
                  }
                  self.scanners.append(scanner_info)

          def remove_service(self, zeroconf, type, name):
              print(f"Scanner removed: {name}")

          def update_service(self, zeroconf, type, name):
              pass  # Expected by recent zeroconf versions; nothing to do here.

      def discover_escl_scanners(timeout=2):
          zeroconf = Zeroconf()
          listener = ESCLScannerListener()
          browser = ServiceBrowser(zeroconf, "_uscan._tcp.local.", listener)
          print(f"Discovering ESCL scanners for {timeout} seconds...")
          time.sleep(timeout)
          zeroconf.close()
          return listener.scanners

- Scan with a scanner. The scanning configuration is expressed in XML.

      def scan(scanner_address, output_path="scanned.jpg"):
          # Adjacent string literals are joined into one XML document.
          xml = (
              '<scan:ScanSettings'
              ' xmlns:scan="http://schemas.hp.com/imaging/escl/2011/05/03"'
              ' xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/"'
              ' xmlns:dd3="http://www.hp.com/schemas/imaging/con/dictionaries/2009/04/06"'
              ' xmlns:fw="http://www.hp.com/schemas/imaging/con/firewall/2011/01/05"'
              ' xmlns:scc="http://schemas.hp.com/imaging/escl/2011/05/03"'
              ' xmlns:pwg="http://www.pwg.org/schemas/2010/12/sm">'
              '<pwg:Version>2.1</pwg:Version>'
              '<scan:Intent>Photo</scan:Intent>'
              '<pwg:ScanRegions><pwg:ScanRegion>'
              '<pwg:Height>3300</pwg:Height><pwg:Width>2550</pwg:Width>'
              '<pwg:XOffset>0</pwg:XOffset><pwg:YOffset>0</pwg:YOffset>'
              '</pwg:ScanRegion></pwg:ScanRegions>'
              '<pwg:InputSource>Platen</pwg:InputSource>'
              '<scan:DocumentFormatExt>image/jpeg</scan:DocumentFormatExt>'
              '<scan:XResolution>300</scan:XResolution>'
              '<scan:YResolution>300</scan:YResolution>'
              '<scan:ColorMode>Grayscale8</scan:ColorMode>'
              '<scan:CompressionFactor>25</scan:CompressionFactor>'
              '<scan:Brightness>1000</scan:Brightness>'
              '<scan:Contrast>1000</scan:Contrast>'
              '</scan:ScanSettings>'
          )
          resp = requests_post('http://{0}/eSCL/ScanJobs'.format(scanner_address), data=xml, headers={'Content-Type': 'text/xml'})
          if resp.status_code == 201:
              # The Location header points at the created job; fetch its next page.
              url = '{0}/NextDocument'.format(resp.headers['Location'])
              r = requests_get(url)
              with open(output_path, 'wb') as f:
                  f.write(r.content)
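The hard-coded XML can also be generated from parameters. The sketch below builds a reduced version of the settings document with the standard library; `build_scan_settings` and `mm_to_units` are hypothetical helpers, and the region is assumed to be measured in 1/300 inch, which is consistent with the 2550 x 3300 region above being Letter size (8.5 x 11 inches).

```python
import xml.etree.ElementTree as ET

SCAN_NS = "http://schemas.hp.com/imaging/escl/2011/05/03"
PWG_NS = "http://www.pwg.org/schemas/2010/12/sm"


def mm_to_units(mm, units_per_inch=300):
    """Convert millimetres to region units (assumed to be 1/300 inch)."""
    return round(mm / 25.4 * units_per_inch)


def build_scan_settings(width_mm=210.0, height_mm=297.0, resolution=300,
                        color_mode="Grayscale8", document_format="image/jpeg"):
    """Return a ScanSettings XML string for a platen scan (A4 by default)."""
    ET.register_namespace("scan", SCAN_NS)
    ET.register_namespace("pwg", PWG_NS)
    root = ET.Element(f"{{{SCAN_NS}}}ScanSettings")
    ET.SubElement(root, f"{{{PWG_NS}}}Version").text = "2.1"
    regions = ET.SubElement(root, f"{{{PWG_NS}}}ScanRegions")
    region = ET.SubElement(regions, f"{{{PWG_NS}}}ScanRegion")
    region_values = (
        ("Height", mm_to_units(height_mm)),
        ("Width", mm_to_units(width_mm)),
        ("XOffset", 0),
        ("YOffset", 0),
    )
    for tag, value in region_values:
        ET.SubElement(region, f"{{{PWG_NS}}}{tag}").text = str(value)
    ET.SubElement(root, f"{{{PWG_NS}}}InputSource").text = "Platen"
    ET.SubElement(root, f"{{{SCAN_NS}}}DocumentFormatExt").text = document_format
    ET.SubElement(root, f"{{{SCAN_NS}}}XResolution").text = str(resolution)
    ET.SubElement(root, f"{{{SCAN_NS}}}YResolution").text = str(resolution)
    ET.SubElement(root, f"{{{SCAN_NS}}}ColorMode").text = color_mode
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_scan_settings(resolution=200))
```

The returned string could be passed as the `data` argument of the `requests_post` call in place of the fixed `xml` variable. Whether a given scanner accepts a particular region, resolution or color mode depends on the device.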
ICA
Using the Image Capture API is a bit more involved, so we are going to create a Swift command-line project to implement the tool.
Here are the key parts:
- Create a scanner manager class to list the scanners.

      import Foundation
      import ImageCaptureCore

      class ScannerManager: NSObject, ICDeviceBrowserDelegate {
          private var deviceBrowser: ICDeviceBrowser!
          private var scanners: [ICScannerDevice] = []
          private var currentScanner: ICScannerDevice?
          private var scanCompletionHandler: ((Result<URL, Error>) -> Void)?
          private var targetURL: URL?

          override init() {
              super.init()
              setupDeviceBrowser()
          }

          private func setupDeviceBrowser() {
              deviceBrowser = ICDeviceBrowser()
              deviceBrowser.delegate = self
              // Browse for scanners that are local, on Bonjour, or shared.
              let mask = ICDeviceTypeMask(rawValue:
                  ICDeviceTypeMask.scanner.rawValue |
                  ICDeviceLocationTypeMask.local.rawValue |
                  ICDeviceLocationTypeMask.bonjour.rawValue |
                  ICDeviceLocationTypeMask.shared.rawValue)
              deviceBrowser.browsedDeviceTypeMask = mask!
              deviceBrowser.start()
          }

          func listScanners(completion: @escaping ([ICScannerDevice]) -> Void) {
              // Give the browser a second to discover devices before reporting.
              DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
                  completion(self.scanners)
              }
          }

          // MARK: - ICDeviceBrowserDelegate
          func deviceBrowser(_ browser: ICDeviceBrowser, didAdd device: ICDevice, moreComing: Bool) {
              guard let scanner = device as? ICScannerDevice else { return }
              scanners.append(scanner)
          }

          func deviceBrowser(_ browser: ICDeviceBrowser, didRemove device: ICDevice, moreGoing: Bool) {
              if let index = scanners.firstIndex(where: { $0 == device }) {
                  scanners.remove(at: index)
              }
          }
      }

- Make the manager class conform to `ICScannerDeviceDelegate` and add the scanning-related functions.

      // These methods go inside ScannerManager, whose declaration becomes:
      // class ScannerManager: NSObject, ICDeviceBrowserDelegate, ICScannerDeviceDelegate
      func device(_ device: ICDevice, didCloseSessionWithError error: (any Error)?) {
          print("did close")
      }

      func didRemove(_ device: ICDevice) {
          print("did remove")
      }

      func device(_ device: ICDevice, didOpenSessionWithError error: (any Error)?) {
          print("did open")
          DispatchQueue.main.asyncAfter(deadline: .now() + 1) { [weak self] in
              guard let self = self else { return }
              guard let scanner = currentScanner else { return }
              scanner.transferMode = .fileBased
              scanner.downloadsDirectory = URL(fileURLWithPath: NSTemporaryDirectory())
              scanner.documentName = "scan"
              scanner.documentUTI = kUTTypeJPEG as String
              if let functionalUnit = scanner.selectedFunctionalUnit as? ICScannerFunctionalUnit {
                  // Use the first supported resolution >= 300, else the highest available.
                  let resolutionIndex = functionalUnit.supportedResolutions.integerGreaterThanOrEqualTo(300) ?? functionalUnit.supportedResolutions.last
                  if let resolutionIndex = resolutionIndex {
                      functionalUnit.resolution = resolutionIndex
                  }
                  let a4Width: CGFloat = 210.0 // mm
                  let a4Height: CGFloat = 297.0 // mm
                  let widthInPoints = a4Width * 72.0 / 25.4 // convert to point
                  let heightInPoints = a4Height * 72.0 / 25.4
                  functionalUnit.scanArea = NSMakeRect(0, 0, widthInPoints, heightInPoints)
                  functionalUnit.pixelDataType = .RGB
                  functionalUnit.bitDepth = .depth8Bits
                  scanner.requestScan()
              }
          }
      }

      // MARK: - ICScannerDeviceDelegate
      func scannerDevice(_ scanner: ICScannerDevice, didScanTo url: URL) {
          print("did scan to")
          print(url.absoluteString)
          guard let targetURL = targetURL else {
              scanCompletionHandler?(.failure(NSError(domain: "ScannerError", code: -2, userInfo: [NSLocalizedDescriptionKey: "No target URL set"])))
              return
          }
          do {
              // Move the scanned file from the temporary directory to the requested path.
              try FileManager.default.moveItem(at: url, to: targetURL)
              scanCompletionHandler?(.success(targetURL))
          } catch {
              scanCompletionHandler?(.failure(error))
          }
      }

      // MARK: - Scan Operations
      func startScan(scanner: ICScannerDevice, outputPath: String, completion: @escaping (Result<URL, Error>) -> Void) {
          currentScanner = scanner
          scanCompletionHandler = completion
          targetURL = URL(fileURLWithPath: outputPath)
          scanner.delegate = self
          scanner.requestOpenSession()
      }
Dynamic Web TWAIN RESTful API
Dynamic Web TWAIN provides a RESTful API feature for scanning documents using TWAIN, WIA, SANE, ICA or eSCL. You can find its details on this page.
Here are the benefits of using Dynamic Web TWAIN’s RESTful API:
- One unified interface to use all the mainstream document scanning APIs with complete scanner controls on different platforms.
- Share scanners via the network so that mobile devices can also access document scanners.
- Any programming language that can send HTTP requests can use the document scanning APIs.
Here are the key parts using the Python wrapper:
- Import the library and declare several variables. You can apply for a license here.

      from dynamsoftservice import ScannerController, ScannerType

      license_key = "LICENSE-KEY"
      host = "http://127.0.0.1:18622"
      scannerController = ScannerController()

- List scanners.

      def list_scanners():
          """List all available scanners"""
          scanners = scannerController.getDevices(host)
          return scanners

- Scan with a scanner.

      def scan_document(output_path="scan.png", scanner_name=None):
          """
          Scan a document using Web TWAIN service and save as image file

          Parameters:
              output_path: Path to save scanned image
              scanner_name: Name of specific scanner to use (None shows dialog)
          """
          scanners = list_scanners()
          selectedScanner = None
          if scanner_name is not None:
              for scanner in scanners:
                  if scanner['name'] == scanner_name:
                      selectedScanner = scanner
                      break
          parameters = {
              "license": license_key
          }
          if selectedScanner is not None:
              parameters["device"] = selectedScanner["device"]
          parameters["config"] = {
              "IfShowUI": False,
              "PixelType": 2,
              "Resolution": 200,
              "IfFeederEnabled": False,
              "IfDuplexEnabled": False,
          }
          job = scannerController.createJob(host, parameters)
          print(job)
          if "jobuid" in job:
              job_id = job["jobuid"]
              # Take the first image of the job and write it to disk.
              stream = scannerController.getImageStreams(host, job_id)[0]
              with open(output_path, "wb") as f:
                  f.write(stream)
              return output_path
Apart from the RESTful API, Dynamic Web TWAIN also provides a JavaScript library with a dedicated viewer, complete wrapping of the document scanning APIs, local cache and various supplementary APIs to provide a browser-based document scanning solution. Visit its online demo to have a try.
Source Code
Get the source code on GitHub and learn how to use the command-line tools:
https://github.com/tony-xlh/document-scanner-cli/