How OCR Helps Organize and Search Bulk Scanned Documents: A Developer's Guide

When dealing with bulk document scanning, one of the biggest challenges is making scanned documents searchable and organized. Unlike digital-born documents, scanned images are essentially pictures—you can’t search them, copy text from them, or organize them by content. This is where Optical Character Recognition (OCR) becomes invaluable.

In this comprehensive tutorial, we’ll build a web-based document scanner that:

  • Scans documents directly from TWAIN/WIA scanners
  • Automatically extracts text using OCR
  • Stores documents locally with searchable text
  • Provides full-text search with visual highlights
  • Enables text selection from scanned images

Prerequisites

Part 1: Understanding the Architecture

The Challenge with Traditional Approaches

Most document scanning solutions bind the scanner SDK directly to a DOM element, giving you limited control over the UI. For advanced features like search highlighting and text selection overlays, we need a different approach.

Our Solution: Headless DWT + Custom Viewer

We’ll use Dynamic Web TWAIN SDK in “headless” mode:

  • No UI binding to the SDK’s built-in viewer
  • Custom dual-canvas architecture for complete control
  • Separation of concerns: scanning, OCR, display, and storage

Architecture Overview:

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Scanner    │───▶│  DWT Object  │───▶│   OCR Kit    │
└──────────────┘    └──────────────┘    └──────────────┘
                            │                    │
                            ▼                    ▼
                    ┌──────────────────────────────┐
                    │       IndexedDB              │
                    │  {image, text, coordinates}  │
                    └──────────────────────────────┘
                            │
                            ▼
                    ┌──────────────────────────────┐
                    │    Custom Canvas Viewer      │
                    │  • Image Layer               │
                    │  • Highlight Layer           │
                    │  • Text Selection Layer      │
                    └──────────────────────────────┘

Part 2: Setting Up the HTML Structure

Create index.html with three key sections:

1. Scanner Controls

<div class="controls">
    <h2>Scanner</h2>
    <select id="source"></select>
    <button onclick="acquireImage()">Scan</button>
    <button onclick="loadImage()">Load</button>
</div>

2. Custom Viewer with Layered Canvases

<div id="custom-viewer">
    <!-- Image rendering layer -->
    <canvas id="image-canvas" width="800" height="600"></canvas>
    
    <!-- Search highlight overlay -->
    <canvas id="highlight-canvas" width="800" height="600"></canvas>
    
    <!-- Selectable text layer -->
    <div id="text-layer"></div>
    
    <!-- Navigation controls -->
    <div class="viewer-controls">
        <button onclick="previousImage()">◀ Prev</button>
        <span id="image-counter">0 / 0</span>
        <button onclick="nextImage()">Next ▶</button>
    </div>
</div>

3. Search Interface

<div class="search-controls">
    <input type="text" id="search-input" placeholder="Search documents...">
    <button onclick="searchText()">Search</button>
    <button onclick="clearSearch()">Clear</button>
    <button onclick="previousMatch()"></button>
    <button onclick="nextMatch()"></button>
</div>

Part 3: CSS - Aligning the Canvas Layers

The key to our dual-canvas approach is perfect alignment using absolute positioning:

#custom-viewer {
    position: relative;
    width: 95%;
    max-height: 85vh;
    margin: 20px auto;
    display: flex;
    justify-content: center;
    align-items: center;
    background: #2a2a2a;
}

#image-canvas {
    position: relative;
    display: block;
    z-index: 1;
}

#highlight-canvas {
    position: absolute;
    top: 0;
    left: 0;
    z-index: 2;
    pointer-events: none; /* Allow clicks to pass through */
}

#text-layer {
    position: absolute;
    top: 0;
    left: 0;
    z-index: 3;
    cursor: text;
}

.text-word {
    position: absolute;
    color: transparent; /* Invisible but selectable */
    user-select: text;
}

Why This Works:

  • #image-canvas is relative - establishes positioning context
  • #highlight-canvas and #text-layer are absolute - overlay on top
  • Z-index layering: image(1) → highlights(2) → text(3)
  • pointer-events: none on highlights allows text selection underneath

Part 4: Initializing Dynamic Web TWAIN (Headless Mode)

// Configuration
Dynamsoft.DWT.ProductKey = "YOUR-LICENSE-KEY";
Dynamsoft.DWT.ResourcesPath = "https://unpkg.com/dwt@19.3.0/dist/";
Dynamsoft.DWT.AutoLoad = false;

let DWTObject = null;

function initDWT() {
    return new Promise((resolve, reject) => {
        Dynamsoft.DWT.CreateDWTObjectEx(
            { WebTwainId: "dwtControl" },
            function (obj) {
                DWTObject = obj;
                
                DWTObject.IfShowUI = false;
                
                // No viewer binding - we use custom canvas
                populateScanners();
                resolve();
            },
            function (err) {
                reject(err);
            }
        );
    });
}

Key Points:

  • No .Bind() call - Complete UI independence
  • DWT only handles scanning and OCR

Part 5: IndexedDB Schema for Document Storage

const DB_NAME = "DocuScanOCR";
const STORE_NAME = "documents";

const dbPromise = new Promise((resolve, reject) => {
    const request = indexedDB.open(DB_NAME, 2);
    
    request.onupgradeneeded = (event) => {
        const db = event.target.result;
        
        if (!db.objectStoreNames.contains(STORE_NAME)) {
            const store = db.createObjectStore(STORE_NAME, { 
                keyPath: "id", 
                autoIncrement: true 
            });
            store.createIndex("timestamp", "timestamp", { unique: false });
        }
    };
    
    request.onsuccess = (event) => resolve(event.target.result);
    request.onerror = (event) => reject(event.target.errorCode);
});

Document Schema:

{
    id: 1,                    // Auto-increment
    imageData: "data:image/png;base64,...",  // Base64 image
    ocrText: "Full extracted text...",       // Searchable text
    words: [                  // Word coordinates for highlighting
        {text: "Hello", x: 120, y: 45, width: 80, height: 25},
        {text: "World", x: 210, y: 45, width: 85, height: 25}
    ],
    timestamp: 1706025600000
}

Part 6: Scanning and OCR Processing

Step 1: Acquire Image from Scanner

async function acquireImage() {
    const devices = await DWTObject.GetDevicesAsync();
    const device = devices[selectedIndex];
    
    await DWTObject.SelectDeviceAsync(device);
    
    const startCount = DWTObject.HowManyImagesInBuffer;
    
    await DWTObject.AcquireImageAsync({
        IfCloseSourceAfterAcquire: true
    });
    
    const endCount = DWTObject.HowManyImagesInBuffer;
    
    // Process newly scanned images
    for (let i = startCount; i < endCount; i++) {
        await processAndSaveImage(i);
    }
    
    // Clear buffer after saving
    DWTObject.RemoveAllImages();
}

Step 2: Convert to Base64

const imageData = await new Promise((resolve, reject) => {
    DWTObject.ConvertToBase64(
        [dwtIndex],
        Dynamsoft.DWT.EnumDWT_ImageType.IT_PNG,
        function(result, indices, type) {
            const dataURL = `data:image/png;base64,${result.getData(0, result.getLength())}`;
            resolve(dataURL);
        },
        function(errorCode, errorString) {
            reject(new Error(errorString));
        }
    );
});

Step 3: Perform OCR and Extract Coordinates

async function processAndSaveImage(dwtIndex) {
    // Convert to base64
    const imageData = await convertToBase64(dwtIndex);
    
    // Perform OCR
    let ocrText = "";
    let words = [];
    
    if (DWTObject.Addon && DWTObject.Addon.OCRKit) {
        const result = await DWTObject.Addon.OCRKit.Recognize(dwtIndex, {
            settings: { language: "eng" }
        });
        
        // Extract text and coordinates
        if (result && result.blocks) {
            result.blocks.forEach((block) => {
                if (block.lines) {
                    block.lines.forEach((line) => {
                        if (line.words) {
                            line.words.forEach((word) => {
                                ocrText += word.value + " ";
                                
                                // Extract geometry and convert to x, y, width, height
                                if (word.geometry) {
                                    const geo = word.geometry;
                                    words.push({
                                        text: word.value,
                                        x: geo.left,
                                        y: geo.top,
                                        width: geo.right - geo.left,
                                        height: geo.bottom - geo.top
                                    });
                                }
                            });
                            ocrText += "\n";
                        }
                    });
                }
            });
        }
    }
    
    // Save to IndexedDB
    await saveDocument(imageData, ocrText.trim(), words);
}

Understanding OCR Geometry:

  • OCRKit returns geometry: {left, top, right, bottom}
  • We convert to {x, y, width, height} for easier use
  • Coordinates are in original image dimensions

Part 7: Rendering Images on Canvas

function displayCurrentImage() {
    const doc = documents[currentImageIndex];
    const img = new Image();
    
    img.onload = function() {
        // Calculate display size (scale to fit viewport)
        const maxWidth = window.innerWidth * 0.9;
        const maxHeight = window.innerHeight * 0.75;
        
        let displayWidth = img.width;
        let displayHeight = img.height;
        
        if (displayWidth > maxWidth || displayHeight > maxHeight) {
            const scale = Math.min(maxWidth / displayWidth, maxHeight / displayHeight);
            displayWidth = Math.floor(displayWidth * scale);
            displayHeight = Math.floor(displayHeight * scale);
        }
        
        // Resize both canvases
        imageCanvas.width = displayWidth;
        imageCanvas.height = displayHeight;
        highlightCanvas.width = displayWidth;
        highlightCanvas.height = displayHeight;
        
        // Draw scaled image
        imageCtx.clearRect(0, 0, imageCanvas.width, imageCanvas.height);
        imageCtx.drawImage(img, 0, 0, displayWidth, displayHeight);
        
        // Calculate scale factors for coordinate transformation
        const scaleX = displayWidth / img.width;
        const scaleY = displayHeight / img.height;
        
        // Render text layer and highlights
        renderTextLayer(doc.words, scaleX, scaleY);
    };
    
    img.src = doc.imageData;
}

Critical Concept: Coordinate Transformation

OCR coordinates are in original image dimensions, but we display scaled images. We must transform coordinates:

const scaleX = displayWidth / originalWidth;
const scaleY = displayHeight / originalHeight;

const displayX = ocrX * scaleX;
const displayY = ocrY * scaleY;

Part 8: Implementing Text Selection

Create invisible but selectable text spans positioned over the image:

function renderTextLayer(words, scaleX = 1, scaleY = 1) {
    const textLayer = document.getElementById('text-layer');
    
    // Match canvas dimensions
    textLayer.style.width = imageCanvas.width + 'px';
    textLayer.style.height = imageCanvas.height + 'px';
    
    textLayer.innerHTML = '';
    
    words.forEach(word => {
        if (word.width > 0 && word.height > 0) {
            const span = document.createElement('span');
            span.textContent = word.text;
            span.className = 'text-word';
            
            // Transform coordinates
            span.style.left = (word.x * scaleX) + 'px';
            span.style.top = (word.y * scaleY) + 'px';
            span.style.width = (word.width * scaleX) + 'px';
            span.style.height = (word.height * scaleY) + 'px';
            
            // Scale font size
            span.style.fontSize = (word.height * scaleY * 0.8) + 'px';
            
            // Invisible but selectable
            span.style.color = 'transparent';
            span.style.userSelect = 'text';
            
            textLayer.appendChild(span);
        }
    });
}

How It Works:

  1. Each word becomes a <span> positioned absolutely
  2. Font size matches original text height
  3. color: transparent makes text invisible
  4. user-select: text enables selection
  5. User can click and drag to select text like a normal document!

Part 9: Full-Text Search with Highlighting

Search Implementation

async function searchText() {
    const query = document.getElementById("search-input").value.trim().toLowerCase();
    
    searchMatches = [];
    
    documents.forEach((doc, docIndex) => {
        if (doc.ocrText.toLowerCase().includes(query)) {
            const matchingWords = [];
            
            doc.words.forEach(word => {
                if (word.text.toLowerCase().includes(query)) {
                    matchingWords.push(word);
                }
            });
            
            if (matchingWords.length > 0) {
                searchMatches.push({ docIndex, words: matchingWords });
            }
        }
    });
    
    if (searchMatches.length > 0) {
        currentMatchIndex = 0;
        navigateToMatch(0);
    }
}

Drawing Highlights on Canvas

function drawHighlightsOnCanvas(words, scaleX = 1, scaleY = 1) {
    highlightCtx.clearRect(0, 0, highlightCanvas.width, highlightCanvas.height);
    
    words.forEach((word) => {
        if (word.width > 0 && word.height > 0) {
            // Transform coordinates
            const x = word.x * scaleX;
            const y = word.y * scaleY;
            const width = word.width * scaleX;
            const height = word.height * scaleY;
            
            // Draw yellow semi-transparent box
            highlightCtx.fillStyle = 'rgba(255, 255, 0, 0.4)';
            highlightCtx.fillRect(x, y, width, height);
            
            // Draw border for emphasis
            highlightCtx.strokeStyle = 'rgba(255, 180, 0, 1)';
            highlightCtx.lineWidth = 3;
            highlightCtx.strokeRect(x, y, width, height);
        }
    });
}

Search Navigation

function navigateToMatch(index) {
    currentMatchIndex = index;
    const match = searchMatches[index];
    
    // Navigate to the document containing the match
    currentImageIndex = match.docIndex;
    displayCurrentImage();

}

function nextMatch() {
    if (searchMatches.length === 0) return;
    const nextIndex = (currentMatchIndex + 1) % searchMatches.length;
    navigateToMatch(nextIndex);
}

function previousMatch() {
    if (searchMatches.length === 0) return;
    const prevIndex = (currentMatchIndex - 1 + searchMatches.length) % searchMatches.length;
    navigateToMatch(prevIndex);
}

Part 10: Document Management

async function removeSelected() {
    const doc = documents[currentImageIndex];
    await deleteDocument(doc.id);
    await loadDocumentsFromDB();
    
    // Adjust index if needed
    if (currentImageIndex >= documents.length && documents.length > 0) {
        currentImageIndex = documents.length - 1;
    }
    
    clearHighlights();
}

async function removeAll() {
    if (!confirm("Remove all documents?")) return;
    
    await clearAllDocuments();
    await loadDocumentsFromDB();
    
    clearHighlights();
    currentImageIndex = 0;
}

Running the Application Locally

To run the application locally, follow these steps:

  1. Serve via HTTP server (required for TWAIN SDK):
    # Python 3
    python -m http.server 8000
       
    # Node.js
    npx http-server -p 8000
    
  2. Open http://localhost:8000 in a modern web browser.

    web document scanner with OCR and text search

Source Code

https://github.com/yushulx/web-twain-document-scan-management/tree/main/examples/ocr_search