How to Scan Documents from the Command Line

This article shows how to scan documents from the command line (CLI), which makes it possible to automate or script scanning and saving documents.

Several APIs are available for accessing document scanners. The following table compares them.

| Feature | TWAIN | WIA (Windows Image Acquisition) | SANE (Scanner Access Now Easy) | ICA (Image Capture Architecture) | eSCL |
| --- | --- | --- | --- | --- | --- |
| Developer | TWAIN Working Group | Microsoft | SANE Open-Source Community | Apple | Mopria |
| Operating Systems | Windows, macOS, Linux (partial) | Windows | Linux, macOS, Unix-like | macOS, iOS | Cross-platform (Windows/macOS/Linux/mobile) |
| Supported Scanners | Broad | Less broad | Less broad (community support) | Less broad | Only modern network scanners/MFPs |
| Functionality | Advanced controls (ADF, barcode detection, etc.) | Basic controls like color mode | Medium-to-advanced controls | Basic controls like color mode | Basic controls like color mode |

We are going to use all of these APIs to scan documents from the command line. Only SANE ships with a command-line tool, so we need to write our own command-line tools for the other APIs.

SANE

SANE has a command-line tool scanimage. Here is its basic usage:

  1. List connected scanners.

    scanimage -L
    
  2. Acquire an image with a specified scanner.

    scanimage -d "scanner name" -o out.png
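The two commands above can also be driven from a script. The following is a minimal sketch, assuming scanimage is on the PATH and prints one line per scanner in its usual "device `name' is a description" form. The function names are hypothetical and are not part of SANE.

```python
import re
import subprocess

# One line of `scanimage -L` output is assumed to look like:
# device `epson2:libusb:001:004' is a Epson GT-1500 flatbed scanner
DEVICE_LINE = re.compile(r"^device `(?P<name>[^']+)' is an? (?P<description>.+)$")


def parse_device_list(text):
    """Return (device_name, description) pairs found in `scanimage -L` output."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            devices.append((match.group("name"), match.group("description")))
    return devices


def list_scanners():
    """Run `scanimage -L` and parse what it prints."""
    result = subprocess.run(
        ["scanimage", "-L"], capture_output=True, text=True, check=True
    )
    return parse_device_list(result.stdout)


def scan(device_name, output_path):
    """Scan one page from the given device into output_path."""
    subprocess.run(["scanimage", "-d", device_name, "-o", output_path], check=True)
```

Keeping the parsing separate from the subprocess call means the parser can be checked against captured output without a scanner attached.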
    

The command-line tools we write for the other APIs will follow the same usage.
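One way to get the same usage is to share a single argument parser between the tools. The sketch below mirrors scanimage's -L, -d and -o options with argparse. The program name, the default output file and the two callables passed to run are assumptions made for illustration; each tool would supply its own listing and scanning functions.

```python
import argparse


def build_parser(prog="scan-cli"):
    """Create a parser whose flags mirror scanimage's -L, -d and -o."""
    parser = argparse.ArgumentParser(
        prog=prog, description="Scan documents from the command line."
    )
    parser.add_argument("-L", "--list-devices", action="store_true",
                        help="list connected scanners and exit")
    parser.add_argument("-d", "--device-name",
                        help="name of the scanner to use")
    parser.add_argument("-o", "--output-file", default="out.png",
                        help="where to save the scanned image")
    return parser


def run(argv, list_scanners, scan):
    """Dispatch parsed arguments to backend callables supplied by the caller."""
    args = build_parser().parse_args(argv)
    if args.list_devices:
        # Print one scanner per line, as returned by the backend
        for name in list_scanners():
            print(name)
        return 0
    scan(args.device_name, args.output_file)
    return 0
```

A tool would then end with something like `sys.exit(run(sys.argv[1:], list_scanners, scan))`, passing in its own two functions.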

TWAIN

The TWAIN interface is implemented in C++ and has a Python binding, so we are going to write the command-line tool in Python.

Here are the key parts:

  1. Import the library.

    import twain
    
  2. List scanners.

    with twain.SourceManager() as sm:
        for source in sm.source_list:
            print(source)
    
  3. Scan with a scanner.

    from PIL import Image
    from io import BytesIO
    with twain.SourceManager() as sm:
        src = sm.open_source("scanner_name")
        # Acquire without showing the driver's own UI
        src.request_acquire(show_ui=False, modal_ui=False)
        # The native transfer returns a handle to a DIB held in memory
        (handle, remaining_count) = src.xfer_image_natively()
        bmp_bytes = twain.dib_to_bm_file(handle)
        img = Image.open(BytesIO(bmp_bytes), formats=["bmp"])
        # Pillow picks the output format from the file extension
        img.save("out.png")
    

WIA

WIA provides a native API as well as a COM automation layer. We are going to use the COM layer from Python.

Here are the key parts:

  1. Import libraries.

    from PIL import Image
    import pythoncom
    from win32com.client import Dispatch
    
  2. List scanners.

    manager = Dispatch("WIA.DeviceManager")
    devices = manager.DeviceInfos
    print("Available scanners:")
    for i in range(1, devices.Count + 1):
        device = devices.Item(i)
        # Check if the device is a scanner (Type = 1)
        if device.Type == 1:
            print(f"  Name: {device.Properties['Name'].Value}")
            print(f"  ID: {device.DeviceID}")
            print(f"  Description: {device.Properties['Description'].Value}")
            print("  ----------------")
    
  3. Scan with a scanner.

    wia = Dispatch("WIA.CommonDialog")
    manager = Dispatch("WIA.DeviceManager")
       
    devices = manager.DeviceInfos
    selected_device = None
    scanner_name = "target scanner name"
    for i in range(1, devices.Count + 1):
        device = devices.Item(i)
        if device.Type == 1 and device.Properties['Name'].Value == scanner_name:
            selected_device = device.Connect() # Select the scanner by name
            break
               
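    img = None
    if selected_device is None:
        img = wia.ShowAcquireImage()  # Show scanning dialog with scanner selection
    else:
        # WIA collections are 1-based, so Item(1) is the first scan item
        img = wia.ShowTransfer(selected_device.Items.Item(1))
       
    # The result is a WIA ImageFile COM object, so read its raw bytes into Pillow
    from io import BytesIO
    output_path = "out.png"
    pil_img = Image.open(BytesIO(bytes(img.FileData.BinaryData)))
    pil_img.save(output_path)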
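WIA can also be asked for a specific image format at transfer time, because ShowTransfer accepts an optional format ID as its second argument. The sketch below maps an output file extension to the corresponding WIA format ID string. The helper name is hypothetical, and whether a given driver honors the requested format is an assumption that should be checked per device.

```python
import os

# WIA format identifiers (the wiaFormat* constants), keyed by file extension
WIA_FORMAT_IDS = {
    ".bmp": "{B96B3CAB-0728-11D3-9D7B-0000F81EF32E}",
    ".jpg": "{B96B3CAE-0728-11D3-9D7B-0000F81EF32E}",
    ".jpeg": "{B96B3CAE-0728-11D3-9D7B-0000F81EF32E}",
    ".png": "{B96B3CAF-0728-11D3-9D7B-0000F81EF32E}",
    ".gif": "{B96B3CB0-0728-11D3-9D7B-0000F81EF32E}",
    ".tif": "{B96B3CB1-0728-11D3-9D7B-0000F81EF32E}",
    ".tiff": "{B96B3CB1-0728-11D3-9D7B-0000F81EF32E}",
}


def wia_format_for(path):
    """Pick a WIA format ID from the output file extension, defaulting to BMP."""
    ext = os.path.splitext(path)[1].lower()
    return WIA_FORMAT_IDS.get(ext, WIA_FORMAT_IDS[".bmp"])
```

The returned string could be passed as the second argument of ShowTransfer, and the resulting ImageFile object written out with its SaveFile method.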
    

eSCL

eSCL is a RESTful interface. The network scanners broadcast themselves via Bonjour and the client can find them and send HTTP requests to scan documents. We are going to use Python as well to write the scanning tool.

  1. Import the libraries.

    import time
    from zeroconf import ServiceBrowser, Zeroconf
    from requests import get as requests_get, post as requests_post
    
  2. List scanners by detecting Bonjour services whose type is _uscan._tcp.local..

    class ESCLScannerListener:
        def __init__(self):
            self.scanners = []
    
        def add_service(self, zeroconf, type, name):
            info = zeroconf.get_service_info(type, name)
            if info:
                scanner_info = {
                    'name': name,
                    'type': type,
                    # parsed_addresses() gives printable IPs instead of packed bytes
                    'addresses': info.parsed_addresses(),
                    'port': info.port,
                    'properties': info.properties
                }
                self.scanners.append(scanner_info)
    
        def update_service(self, zeroconf, type, name):
            # Expected by recent zeroconf releases; nothing to refresh here
            pass
    
        def remove_service(self, zeroconf, type, name):
            print(f"Scanner removed: {name}")
               
    def discover_escl_scanners(timeout=2):
        zeroconf = Zeroconf()
        listener = ESCLScannerListener()
        browser = ServiceBrowser(zeroconf, "_uscan._tcp.local.", listener)
        print(f"Discovering ESCL scanners for {timeout} seconds...")
        time.sleep(timeout)
        zeroconf.close()
        return listener.scanners
    
  3. Scan with a scanner. The scanning configuration is expressed in XML.

    def scan(scanner_address, output_path="scanned.jpg"):
        xml = '''<scan:ScanSettings xmlns:scan="http://schemas.hp.com/imaging/escl/2011/05/03" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/" xmlns:dd3="http://www.hp.com/schemas/imaging/con/dictionaries/2009/04/06" xmlns:fw="http://www.hp.com/schemas/imaging/con/firewall/2011/01/05" xmlns:scc="http://schemas.hp.com/imaging/escl/2011/05/03" xmlns:pwg="http://www.pwg.org/schemas/2010/12/sm"><pwg:Version>2.1</pwg:Version><scan:Intent>Photo</scan:Intent><pwg:ScanRegions><pwg:ScanRegion><pwg:Height>3300</pwg:Height><pwg:Width>2550</pwg:Width><pwg:XOffset>0</pwg:XOffset><pwg:YOffset>0</pwg:YOffset></pwg:ScanRegion></pwg:ScanRegions><pwg:InputSource>Platen</pwg:InputSource><scan:DocumentFormatExt>image/jpeg</scan:DocumentFormatExt><scan:XResolution>300</scan:XResolution><scan:YResolution>300</scan:YResolution><scan:ColorMode>Grayscale8</scan:ColorMode><scan:CompressionFactor>25</scan:CompressionFactor><scan:Brightness>1000</scan:Brightness><scan:Contrast>1000</scan:Contrast></scan:ScanSettings>'''
    
        resp = requests_post('http://{0}/eSCL/ScanJobs'.format(scanner_address), data=xml, headers={'Content-Type': 'text/xml'})
        if resp.status_code == 201:
            url = '{0}/NextDocument'.format(resp.headers['Location'])
            r = requests_get(url) 
            with open(output_path,'wb') as f:
                f.write(r.content)
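The hard-coded XML string above is difficult to adjust from command-line options. The following is a minimal sketch of building the same kind of settings document with the standard library, using the two namespace URIs from the original string. The function names and defaults are hypothetical, only a subset of the original elements is emitted, and treating the scan region as 1/300 inch units is an assumption (it is consistent with 2550 x 3300 matching a US Letter page).

```python
import xml.etree.ElementTree as ET

SCAN_NS = "http://schemas.hp.com/imaging/escl/2011/05/03"
PWG_NS = "http://www.pwg.org/schemas/2010/12/sm"


def mm_to_escl_units(mm):
    """Convert millimetres to 1/300 inch units (assumed unit of the scan region)."""
    return round(mm / 25.4 * 300)


def build_scan_settings(width_mm=215.9, height_mm=279.4, resolution=300,
                        color_mode="Grayscale8", document_format="image/jpeg",
                        input_source="Platen"):
    """Return a ScanSettings XML string for the given page size and options."""
    ET.register_namespace("scan", SCAN_NS)
    ET.register_namespace("pwg", PWG_NS)
    root = ET.Element("{%s}ScanSettings" % SCAN_NS)
    ET.SubElement(root, "{%s}Version" % PWG_NS).text = "2.1"
    regions = ET.SubElement(root, "{%s}ScanRegions" % PWG_NS)
    region = ET.SubElement(regions, "{%s}ScanRegion" % PWG_NS)
    for tag, value in (("Height", mm_to_escl_units(height_mm)),
                       ("Width", mm_to_escl_units(width_mm)),
                       ("XOffset", 0), ("YOffset", 0)):
        ET.SubElement(region, "{%s}%s" % (PWG_NS, tag)).text = str(value)
    ET.SubElement(root, "{%s}InputSource" % PWG_NS).text = input_source
    ET.SubElement(root, "{%s}DocumentFormatExt" % SCAN_NS).text = document_format
    # The same resolution is used on both axes in this sketch
    ET.SubElement(root, "{%s}XResolution" % SCAN_NS).text = str(resolution)
    ET.SubElement(root, "{%s}YResolution" % SCAN_NS).text = str(resolution)
    ET.SubElement(root, "{%s}ColorMode" % SCAN_NS).text = color_mode
    return ET.tostring(root, encoding="unicode")
```

The returned string could be posted in place of the literal, for example `build_scan_settings(width_mm=210, height_mm=297, color_mode="RGB24")` for a color A4 page, provided the scanner's capabilities list that color mode.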
    

ICA

Using the Image Capture API is a bit more involved, so we are going to create a Swift command-line project to implement the tool.

Here are the key parts:

  1. Create a scanner manager class to list the scanners.

    class ScannerManager: NSObject, ICDeviceBrowserDelegate {
        private var deviceBrowser: ICDeviceBrowser!
        private var scanners: [ICScannerDevice] = []
        private var currentScanner: ICScannerDevice?
        private var scanCompletionHandler: ((Result<URL, Error>) -> Void)?
        private var targetURL: URL?
           
        override init() {
            super.init()
            setupDeviceBrowser()
        }
           
        private func setupDeviceBrowser() {
            deviceBrowser = ICDeviceBrowser()
            deviceBrowser.delegate = self
            let mask = ICDeviceTypeMask(rawValue:
                        ICDeviceTypeMask.scanner.rawValue |
                        ICDeviceLocationTypeMask.local.rawValue |
                        ICDeviceLocationTypeMask.bonjour.rawValue |
                        ICDeviceLocationTypeMask.shared.rawValue)
            deviceBrowser.browsedDeviceTypeMask = mask!
            deviceBrowser.start()
        }
           
        func listScanners(completion: @escaping ([ICScannerDevice]) -> Void) {
            DispatchQueue.main.asyncAfter(deadline: .now() + 1) {
                completion(self.scanners)
            }
        }
           
        // MARK: - ICDeviceBrowserDelegate
           
        func deviceBrowser(_ browser: ICDeviceBrowser, didAdd device: ICDevice, moreComing: Bool) {
            guard let scanner = device as? ICScannerDevice else { return }
            scanners.append(scanner)
        }
           
        func deviceBrowser(_ browser: ICDeviceBrowser, didRemove device: ICDevice, moreGoing: Bool) {
            if let index = scanners.firstIndex(where: { $0 == device }) {
                scanners.remove(at: index)
            }
        }
    }
    
  2. Make the manager class conform to ICScannerDeviceDelegate and add the scanning-related functions.

    func device(_ device: ICDevice, didCloseSessionWithError error: (any Error)?) {
        print("did close")
    }
    
    func didRemove(_ device: ICDevice) {
        print("did remove")
    }
    
    func device(_ device: ICDevice, didOpenSessionWithError error: (any Error)?) {
        print("did open")
        DispatchQueue.main.asyncAfter(deadline: .now() + 1) { [weak self] in
            guard let self = self else { return }
            guard let scanner = currentScanner else { return }
            scanner.transferMode = .fileBased
            scanner.downloadsDirectory = URL(fileURLWithPath: NSTemporaryDirectory())
            scanner.documentName = "scan"
            scanner.documentUTI = kUTTypeJPEG as String
            if let functionalUnit = scanner.selectedFunctionalUnit as? ICScannerFunctionalUnit {
                // Use the first supported resolution at or above 300 DPI, else the highest available
                let resolution = functionalUnit.supportedResolutions.integerGreaterThanOrEqualTo(300) ?? functionalUnit.supportedResolutions.last
                if let resolution = resolution {
                    functionalUnit.resolution = resolution
                }
                   
                let a4Width: CGFloat = 210.0 // mm
                let a4Height: CGFloat = 297.0 // mm
                let widthInPoints = a4Width * 72.0 / 25.4 // convert to point
                let heightInPoints = a4Height * 72.0 / 25.4
                   
                functionalUnit.scanArea = NSMakeRect(0, 0, widthInPoints, heightInPoints)
                functionalUnit.pixelDataType = .RGB
                functionalUnit.bitDepth = .depth8Bits
    
                scanner.requestScan()
            }
        }
    }
    
    // MARK: - ICScannerDeviceDelegate
    
    func scannerDevice(_ scanner: ICScannerDevice, didScanTo url: URL) {
        print("did scan to")
        print(url.absoluteString)
        guard let targetURL = targetURL else {
            scanCompletionHandler?(.failure(NSError(domain: "ScannerError", code: -2, userInfo: [NSLocalizedDescriptionKey: "No target URL set"])))
            return
        }
        do {
            try FileManager.default.moveItem(at: url, to: targetURL)
            scanCompletionHandler?(.success(targetURL))
        } catch {
            scanCompletionHandler?(.failure(error))
        }
    }
    
    // MARK: - Scan Operations
    
    func startScan(scanner: ICScannerDevice, outputPath: String, completion: @escaping (Result<URL, Error>) -> Void) {
        currentScanner = scanner
        scanCompletionHandler = completion
        targetURL = URL(fileURLWithPath: outputPath)
           
        scanner.delegate = self
        scanner.requestOpenSession()
    }
    

Dynamic Web TWAIN RESTful API

Dynamic Web TWAIN provides a RESTful API feature for scanning documents using TWAIN, WIA, SANE, ICA or eSCL. You can find its details on this page.

Here are the benefits of using Dynamic Web TWAIN’s RESTful API:

  1. One unified interface to use all the mainstream document scanning APIs with complete scanner controls on different platforms.
  2. Share scanners via the network so that mobile devices can also access document scanners.
  3. Call the document scanning APIs from any programming language that can send HTTP requests.

Here are the key parts using the Python wrapper:

  1. Import the library and declare several variables. You can apply for a license here.

    from dynamsoftservice import ScannerController, ScannerType
    license_key = "LICENSE-KEY"
    host = "http://127.0.0.1:18622"
    scannerController = ScannerController()
    
  2. List scanners.

    def list_scanners():
        """List all available scanners"""
        scanners = scannerController.getDevices(host)
        return scanners
    
  3. Scan with a scanner.

    def scan_document(output_path="scan.png", scanner_name=None):
        """
        Scan a document using Web TWAIN service and save as image file
           
        Parameters:
            output_path: Path to save scanned image
            scanner_name: Name of specific scanner to use (None shows dialog)
        """
        scanners = list_scanners()
        selectedScanner = None
        if scanner_name is not None:
            for scanner in scanners:
                if scanner['name'] == scanner_name:
                    selectedScanner = scanner
                    break
           
        parameters = {
            "license": license_key
        }
    
        if selectedScanner is not None:
            parameters["device"] = selectedScanner["device"]
               
        parameters["config"] = {
            "IfShowUI": False,
            "PixelType": 2,
            "Resolution": 200,
            "IfFeederEnabled": False,
            "IfDuplexEnabled": False,
        }
           
        job = scannerController.createJob(host, parameters)
        print(job)
        if "jobuid" in job:
            job_id = job["jobuid"]
            stream = scannerController.getImageStreams(host,job_id)[0]
            # The with block closes the file, so no explicit close() is needed
            with open(output_path, "wb") as f:
                f.write(stream)
        return output_path
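The parameters dictionary in the function above can be assembled by a small helper so that command-line options map onto it directly. This is a sketch only: the helper and its argument names are hypothetical, and the pixel type codes are assumed to follow the TWAIN convention (0 for black and white, 1 for grayscale, 2 for RGB), which matches the `"PixelType": 2` used above for color.

```python
# TWAIN pixel type codes, assumed to be what the "PixelType" setting expects
PIXEL_TYPES = {"bw": 0, "gray": 1, "color": 2}


def build_job_parameters(license_key, device=None, pixel_type="color",
                         resolution=200, use_feeder=False, duplex=False):
    """Assemble the dictionary passed to createJob in the snippet above."""
    if pixel_type not in PIXEL_TYPES:
        raise ValueError("pixel_type must be one of: " + ", ".join(sorted(PIXEL_TYPES)))
    parameters = {
        "license": license_key,
        "config": {
            "IfShowUI": False,
            "PixelType": PIXEL_TYPES[pixel_type],
            "Resolution": resolution,
            "IfFeederEnabled": use_feeder,
            "IfDuplexEnabled": duplex,
        },
    }
    # Only name a device when one was selected; otherwise leave the key out
    if device is not None:
        parameters["device"] = device
    return parameters
```

Rejecting unknown pixel types up front gives a clearer error than sending an unexpected value to the service.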
    

Apart from the RESTful API, Dynamic Web TWAIN also provides a JavaScript library with a dedicated viewer, complete wrapping of the document scanning APIs, local cache and various supplementary APIs to provide a browser-based document scanning solution. Visit its online demo to have a try.

Source Code

Get the source code on GitHub and learn how to use the command-line tools:

https://github.com/tony-xlh/document-scanner-cli/