How OCR Helps Organize and Search Bulk Scanned Documents: A Developer's Guide
When dealing with bulk document scanning, one of the biggest challenges is making scanned documents searchable and organized. Unlike born-digital documents, scanned images are essentially pictures: you can’t search them, copy text from them, or organize them by content. This is where Optical Character Recognition (OCR) becomes invaluable.
In this comprehensive tutorial, we’ll build a web-based document scanner that:
- Scans documents directly from TWAIN/WIA scanners
- Automatically extracts text using OCR
- Stores documents locally with searchable text
- Provides full-text search with visual highlights
- Enables text selection from scanned images
Prerequisites
- A 30-day free trial license for Dynamic Web TWAIN
- The OCR add-on (Windows only): download DynamicWebTWAINOCRResources.zip from Dynamsoft’s website and run the installer as administrator.
Part 1: Understanding the Architecture
The Challenge with Traditional Approaches
Most document scanning solutions bind the scanner SDK directly to a DOM element, giving you limited control over the UI. For advanced features like search highlighting and text selection overlays, we need a different approach.
Our Solution: Headless DWT + Custom Viewer
We’ll use Dynamic Web TWAIN SDK in “headless” mode:
- No UI binding to the SDK’s built-in viewer
- Custom dual-canvas architecture for complete control
- Separation of concerns: scanning, OCR, display, and storage
Architecture Overview:
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Scanner    │───▶│  DWT Object  │───▶│   OCR Kit    │
└──────────────┘    └──────────────┘    └──────────────┘
                           │                   │
                           ▼                   ▼
                    ┌──────────────────────────────┐
                    │          IndexedDB           │
                    │  {image, text, coordinates}  │
                    └──────────────────────────────┘
                                   │
                                   ▼
                    ┌──────────────────────────────┐
                    │     Custom Canvas Viewer     │
                    │  • Image Layer               │
                    │  • Highlight Layer           │
                    │  • Text Selection Layer      │
                    └──────────────────────────────┘
Part 2: Setting Up the HTML Structure
Create index.html with three key sections:
1. Scanner Controls
<div class="controls">
<h2>Scanner</h2>
<select id="source"></select>
<button onclick="acquireImage()">Scan</button>
<button onclick="loadImage()">Load</button>
</div>
2. Custom Viewer with Layered Canvases
<div id="custom-viewer">
<!-- Image rendering layer -->
<canvas id="image-canvas" width="800" height="600"></canvas>
<!-- Search highlight overlay -->
<canvas id="highlight-canvas" width="800" height="600"></canvas>
<!-- Selectable text layer -->
<div id="text-layer"></div>
<!-- Navigation controls -->
<div class="viewer-controls">
<button onclick="previousImage()">◀ Prev</button>
<span id="image-counter">0 / 0</span>
<button onclick="nextImage()">Next ▶</button>
</div>
</div>
3. Search Interface
<div class="search-controls">
<input type="text" id="search-input" placeholder="Search documents...">
<button onclick="searchText()">Search</button>
<button onclick="clearSearch()">Clear</button>
<button onclick="previousMatch()">◀</button>
<button onclick="nextMatch()">▶</button>
</div>
Part 3: CSS - Aligning the Canvas Layers
The key to our dual-canvas approach is perfect alignment using absolute positioning:
#custom-viewer {
position: relative;
width: 95%;
max-height: 85vh;
margin: 20px auto;
display: flex;
justify-content: center;
align-items: center;
background: #2a2a2a;
}
#image-canvas {
position: relative;
display: block;
z-index: 1;
}
#highlight-canvas {
position: absolute;
top: 0;
left: 0;
z-index: 2;
pointer-events: none; /* Allow clicks to pass through */
}
#text-layer {
position: absolute;
top: 0;
left: 0;
z-index: 3;
cursor: text;
}
.text-word {
position: absolute;
color: transparent; /* Invisible but selectable */
user-select: text;
}
Why This Works:
- #custom-viewer is relative - establishes the positioning context for its absolutely positioned children
- #highlight-canvas and #text-layer are absolute - overlay on top
- Z-index layering: image(1) → highlights(2) → text(3)
- pointer-events: none on highlights allows text selection underneath
Part 4: Initializing Dynamic Web TWAIN (Headless Mode)
// Configuration
Dynamsoft.DWT.ProductKey = "YOUR-LICENSE-KEY";
Dynamsoft.DWT.ResourcesPath = "https://unpkg.com/dwt@19.3.0/dist/";
Dynamsoft.DWT.AutoLoad = false;
let DWTObject = null;
function initDWT() {
return new Promise((resolve, reject) => {
Dynamsoft.DWT.CreateDWTObjectEx(
{ WebTwainId: "dwtControl" },
function (obj) {
DWTObject = obj;
DWTObject.IfShowUI = false;
// No viewer binding - we use custom canvas
populateScanners();
resolve();
},
function (err) {
reject(err);
}
);
});
}
Key Points:
- No .Bind() call
- Complete UI independence
- DWT only handles scanning and OCR
Part 5: IndexedDB Schema for Document Storage
const DB_NAME = "DocuScanOCR";
const STORE_NAME = "documents";
const dbPromise = new Promise((resolve, reject) => {
const request = indexedDB.open(DB_NAME, 2);
request.onupgradeneeded = (event) => {
const db = event.target.result;
if (!db.objectStoreNames.contains(STORE_NAME)) {
const store = db.createObjectStore(STORE_NAME, {
keyPath: "id",
autoIncrement: true
});
store.createIndex("timestamp", "timestamp", { unique: false });
}
};
request.onsuccess = (event) => resolve(event.target.result);
request.onerror = (event) => reject(event.target.error);
});
Document Schema:
{
id: 1, // Auto-increment
imageData: "data:image/png;base64,...", // Base64 image
ocrText: "Full extracted text...", // Searchable text
words: [ // Word coordinates for highlighting
{text: "Hello", x: 120, y: 45, width: 80, height: 25},
{text: "World", x: 210, y: 45, width: 85, height: 25}
],
timestamp: 1706025600000
}
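The later steps call a saveDocument() helper that is not shown here. A minimal sketch of what such a helper might look like under the schema above follows. The names DOCS_STORE, requestToPromise, and saveDocumentTo are hypothetical, and the database handle is passed in explicitly rather than read from dbPromise.

```javascript
// Assumed store name, matching STORE_NAME in the schema code above.
const DOCS_STORE = "documents";

// Hypothetical helper: wrap an IDBRequest in a Promise.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Sketch of a save helper. `db` is passed in explicitly here; in the app
// it would be the IDBDatabase obtained with `await dbPromise`.
function saveDocumentTo(db, imageData, ocrText, words) {
  // No `id` field: the store's autoIncrement keyPath fills it in.
  const record = { imageData, ocrText, words, timestamp: Date.now() };
  const store = db.transaction(DOCS_STORE, "readwrite").objectStore(DOCS_STORE);
  // The request's result is the generated key for the new record.
  return requestToPromise(store.add(record));
}
```

Passing the database in as a parameter keeps the function easy to exercise with a stand-in object. A thin wrapper that awaits dbPromise first would give it the three-argument shape used in Part 6.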
Part 6: Scanning and OCR Processing
Step 1: Acquire Image from Scanner
async function acquireImage() {
const devices = await DWTObject.GetDevicesAsync();
const device = devices[document.getElementById("source").selectedIndex]; // entry chosen in the <select id="source"> dropdown
await DWTObject.SelectDeviceAsync(device);
const startCount = DWTObject.HowManyImagesInBuffer;
await DWTObject.AcquireImageAsync({
IfCloseSourceAfterAcquire: true
});
const endCount = DWTObject.HowManyImagesInBuffer;
// Process newly scanned images
for (let i = startCount; i < endCount; i++) {
await processAndSaveImage(i);
}
// Clear buffer after saving
DWTObject.RemoveAllImages();
}
Step 2: Convert to Base64
function convertToBase64(dwtIndex) {
  return new Promise((resolve, reject) => {
    DWTObject.ConvertToBase64(
      [dwtIndex],
      Dynamsoft.DWT.EnumDWT_ImageType.IT_PNG,
      function (result, indices, type) {
        // result holds the base64 payload; prepend the data-URL header
        const dataURL = `data:image/png;base64,${result.getData(0, result.getLength())}`;
        resolve(dataURL);
      },
      function (errorCode, errorString) {
        reject(new Error(errorString));
      }
    );
  });
}
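The same wrapping pattern appears in both initDWT() and the base64 conversion: a method that takes a success callback and a failure callback is turned into a Promise. As a sketch, the pattern can be factored into one small helper. callbackToPromise is a hypothetical name, and fakeConvert is only a stand-in used to demonstrate it, not part of any SDK.

```javascript
// Hypothetical helper: adapt a function of the form
// fn(...args, onSuccess, onFailure) so that it returns a Promise.
function callbackToPromise(fn, ...args) {
  return new Promise((resolve, reject) => {
    fn(
      ...args,
      // Success callbacks may receive several values; resolve with all of them.
      (...results) => resolve(results),
      (errorCode, errorString) => reject(new Error(`${errorCode}: ${errorString}`))
    );
  });
}

// Stand-in for a callback-style SDK method, used only for demonstration.
function fakeConvert(indices, type, onSuccess, onFailure) {
  if (indices.length === 0) {
    onFailure(-1, "no images selected");
  } else {
    onSuccess("BASE64DATA", indices, type);
  }
}

callbackToPromise(fakeConvert, [0], 1).then(([data]) => console.log(data));
callbackToPromise(fakeConvert, [], 1).catch((err) => console.log(err.message));
```

When wrapping a method of an object, the method would need to be bound to that object first (for example with Function.prototype.bind) so that it keeps its receiver.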
Step 3: Perform OCR and Extract Coordinates
async function processAndSaveImage(dwtIndex) {
  // Convert to base64
  const imageData = await convertToBase64(dwtIndex);
  // Perform OCR (skipped if the OCR add-on is unavailable)
  let ocrText = "";
  const words = [];
  if (DWTObject.Addon && DWTObject.Addon.OCRKit) {
    const result = await DWTObject.Addon.OCRKit.Recognize(dwtIndex, {
      settings: { language: "eng" }
    });
    // Extract text and coordinates: blocks → lines → words
    const blocks = (result && result.blocks) || [];
    for (const block of blocks) {
      for (const line of block.lines || []) {
        if (!line.words) continue;
        for (const word of line.words) {
          ocrText += word.value + " ";
          // Extract geometry and convert to x, y, width, height
          if (word.geometry) {
            const geo = word.geometry;
            words.push({
              text: word.value,
              x: geo.left,
              y: geo.top,
              width: geo.right - geo.left,
              height: geo.bottom - geo.top
            });
          }
        }
        ocrText += "\n";
      }
    }
  }
  // Save to IndexedDB
  await saveDocument(imageData, ocrText.trim(), words);
}
Understanding OCR Geometry:
- OCRKit returns geometry: {left, top, right, bottom}
- We convert to {x, y, width, height} for easier use
- Coordinates are in original image dimensions
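The conversion described above can be isolated into a small pure function. This is a sketch: geometryToRect is a hypothetical name, and the sample box is made up to line up with the "Hello" entry in the schema example.

```javascript
// Convert an edge-based box {left, top, right, bottom} into the
// {x, y, width, height} form stored with each word.
function geometryToRect(geo) {
  return {
    x: geo.left,
    y: geo.top,
    width: geo.right - geo.left,
    height: geo.bottom - geo.top
  };
}

// Hypothetical word box, chosen to match the schema example.
const rect = geometryToRect({ left: 120, top: 45, right: 200, bottom: 70 });
console.log(rect); // { x: 120, y: 45, width: 80, height: 25 }
```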
Part 7: Rendering Images on Canvas
function displayCurrentImage() {
const doc = documents[currentImageIndex];
const img = new Image();
img.onload = function() {
// Calculate display size (scale to fit viewport)
const maxWidth = window.innerWidth * 0.9;
const maxHeight = window.innerHeight * 0.75;
let displayWidth = img.width;
let displayHeight = img.height;
if (displayWidth > maxWidth || displayHeight > maxHeight) {
const scale = Math.min(maxWidth / displayWidth, maxHeight / displayHeight);
displayWidth = Math.floor(displayWidth * scale);
displayHeight = Math.floor(displayHeight * scale);
}
// Resize both canvases
imageCanvas.width = displayWidth;
imageCanvas.height = displayHeight;
highlightCanvas.width = displayWidth;
highlightCanvas.height = displayHeight;
// Draw scaled image
imageCtx.clearRect(0, 0, imageCanvas.width, imageCanvas.height);
imageCtx.drawImage(img, 0, 0, displayWidth, displayHeight);
// Calculate scale factors for coordinate transformation
const scaleX = displayWidth / img.width;
const scaleY = displayHeight / img.height;
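// Render text layer
renderTextLayer(doc.words, scaleX, scaleY);
// Redraw highlights when the active search match is on this document
// (assumes searchMatches and currentMatchIndex are the globals used in Part 9)
const match = searchMatches[currentMatchIndex];
if (match && match.docIndex === currentImageIndex) {
drawHighlightsOnCanvas(match.words, scaleX, scaleY);
}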
};
img.src = doc.imageData;
}
Critical Concept: Coordinate Transformation
OCR coordinates are in original image dimensions, but we display scaled images. We must transform coordinates:
const scaleX = displayWidth / originalWidth;
const scaleY = displayHeight / originalHeight;
const displayX = ocrX * scaleX;
const displayY = ocrY * scaleY;
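To make the arithmetic concrete, here is a worked sketch of the fit-and-scale step as a pure function. fitToBox is a hypothetical name, and the page and viewport sizes are illustrative numbers rather than values from the tutorial.

```javascript
// Scale an image to fit inside a bounding box without enlarging it,
// mirroring the sizing logic in displayCurrentImage().
function fitToBox(imgWidth, imgHeight, maxWidth, maxHeight) {
  const scale = Math.min(1, maxWidth / imgWidth, maxHeight / imgHeight);
  const displayWidth = Math.floor(imgWidth * scale);
  const displayHeight = Math.floor(imgHeight * scale);
  return {
    displayWidth,
    displayHeight,
    // Derived from the floored sizes, so the two factors can differ slightly.
    scaleX: displayWidth / imgWidth,
    scaleY: displayHeight / imgHeight
  };
}

// Illustrative numbers: a 2480 x 3508 scan shown in a 1240 x 877 area.
const fit = fitToBox(2480, 3508, 1240, 877);
console.log(fit.displayWidth, fit.displayHeight); // 620 877

// A word at OCR position (120, 45) lands at (30, 11.25) on screen.
console.log(120 * fit.scaleX, 45 * fit.scaleY);
```

Because the display size is floored to whole pixels, the horizontal and vertical factors are computed separately instead of reusing a single scale value.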
Part 8: Implementing Text Selection
Create invisible but selectable text spans positioned over the image:
function renderTextLayer(words, scaleX = 1, scaleY = 1) {
const textLayer = document.getElementById('text-layer');
// Match canvas dimensions
textLayer.style.width = imageCanvas.width + 'px';
textLayer.style.height = imageCanvas.height + 'px';
textLayer.innerHTML = '';
words.forEach(word => {
if (word.width > 0 && word.height > 0) {
const span = document.createElement('span');
span.textContent = word.text;
span.className = 'text-word';
// Transform coordinates
span.style.left = (word.x * scaleX) + 'px';
span.style.top = (word.y * scaleY) + 'px';
span.style.width = (word.width * scaleX) + 'px';
span.style.height = (word.height * scaleY) + 'px';
// Scale font size
span.style.fontSize = (word.height * scaleY * 0.8) + 'px';
// Invisible but selectable
span.style.color = 'transparent';
span.style.userSelect = 'text';
textLayer.appendChild(span);
}
});
}
How It Works:
- Each word becomes a <span> positioned absolutely
- Font size is set to 80% of the scaled word height, approximating the original text size
- color: transparent makes text invisible
- user-select: text enables selection
- User can click and drag to select text like a normal document!
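One possible refinement, sketched here under the same assumptions as renderTextLayer(), is to compute the span geometry in a pure function so it can be checked without a browser. wordToSpanStyle is a hypothetical name, and the 0.8 font factor is the one used in the code above.

```javascript
// Compute the inline style values for one word's overlay span.
function wordToSpanStyle(word, scaleX, scaleY) {
  const height = word.height * scaleY;
  return {
    left: `${word.x * scaleX}px`,
    top: `${word.y * scaleY}px`,
    width: `${word.width * scaleX}px`,
    height: `${height}px`,
    fontSize: `${height * 0.8}px` // same 0.8 factor as renderTextLayer()
  };
}

// Hypothetical word from the schema example, displayed at half size.
const style = wordToSpanStyle(
  { text: "Hello", x: 120, y: 45, width: 80, height: 25 },
  0.5,
  0.5
);
console.log(style.left, style.top, style.width, style.height);
// In the browser, the result could be applied with Object.assign(span.style, style).
```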
Part 9: Full-Text Search with Highlighting
Search Implementation
async function searchText() {
const query = document.getElementById("search-input").value.trim().toLowerCase();
searchMatches = [];
if (!query) return; // an empty query would otherwise match every word
documents.forEach((doc, docIndex) => {
if (doc.ocrText.toLowerCase().includes(query)) {
const matchingWords = [];
doc.words.forEach(word => {
if (word.text.toLowerCase().includes(query)) {
matchingWords.push(word);
}
});
if (matchingWords.length > 0) {
searchMatches.push({ docIndex, words: matchingWords });
}
}
});
if (searchMatches.length > 0) {
currentMatchIndex = 0;
navigateToMatch(0);
}
}
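One limitation of the search above is that a multi-word query such as "hello world" can pass the ocrText check, yet no single word contains the whole phrase, so nothing is highlighted. The sketch below is a variation, not the tutorial's behavior: it splits the query into terms and highlights any word that contains any term. findMatches is a hypothetical name.

```javascript
// Return [{ docIndex, words }] for documents with at least one word
// containing any whitespace-separated term of the query.
function findMatches(documents, query) {
  const terms = query.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return []; // empty query: nothing to highlight
  const matches = [];
  documents.forEach((doc, docIndex) => {
    const words = doc.words.filter((word) => {
      const text = word.text.toLowerCase();
      return terms.some((term) => text.includes(term));
    });
    if (words.length > 0) matches.push({ docIndex, words });
  });
  return matches;
}

// Made-up sample data in the document schema's shape.
const sampleDocs = [
  { ocrText: "Hello World", words: [{ text: "Hello" }, { text: "World" }] },
  { ocrText: "Invoice total", words: [{ text: "Invoice" }, { text: "total" }] }
];
console.log(findMatches(sampleDocs, "hello world").length); // 1
console.log(findMatches(sampleDocs, "   ").length);         // 0
```

Matching any term is the loosest choice. Requiring every term to appear in the document before highlighting would be a stricter alternative.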
Drawing Highlights on Canvas
function drawHighlightsOnCanvas(words, scaleX = 1, scaleY = 1) {
highlightCtx.clearRect(0, 0, highlightCanvas.width, highlightCanvas.height);
words.forEach((word) => {
if (word.width > 0 && word.height > 0) {
// Transform coordinates
const x = word.x * scaleX;
const y = word.y * scaleY;
const width = word.width * scaleX;
const height = word.height * scaleY;
// Draw yellow semi-transparent box
highlightCtx.fillStyle = 'rgba(255, 255, 0, 0.4)';
highlightCtx.fillRect(x, y, width, height);
// Draw border for emphasis
highlightCtx.strokeStyle = 'rgba(255, 180, 0, 1)';
highlightCtx.lineWidth = 3;
highlightCtx.strokeRect(x, y, width, height);
}
});
}
Search Navigation
function navigateToMatch(index) {
currentMatchIndex = index;
const match = searchMatches[index];
// Navigate to the document containing the match
currentImageIndex = match.docIndex;
displayCurrentImage();
}
function nextMatch() {
if (searchMatches.length === 0) return;
const nextIndex = (currentMatchIndex + 1) % searchMatches.length;
navigateToMatch(nextIndex);
}
function previousMatch() {
if (searchMatches.length === 0) return;
const prevIndex = (currentMatchIndex - 1 + searchMatches.length) % searchMatches.length;
navigateToMatch(prevIndex);
}
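The "+ searchMatches.length" in previousMatch() matters because JavaScript's % operator keeps the sign of the left operand. A small sketch of the wrap-around arithmetic, with wrapIndex as a hypothetical helper name:

```javascript
// Wrap an index into the range [0, length), for length > 0.
function wrapIndex(index, length) {
  return ((index % length) + length) % length;
}

console.log(-1 % 3);           // -1 (the sign follows the dividend)
console.log(wrapIndex(-1, 3)); // 2  (stepping back from the first match)
console.log(wrapIndex(3, 3));  // 0  (stepping forward from the last match)
```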
Part 10: Document Management
async function removeSelected() {
const doc = documents[currentImageIndex];
if (!doc) return; // nothing to remove when the list is empty
await deleteDocument(doc.id);
await loadDocumentsFromDB();
// Adjust index if needed
if (currentImageIndex >= documents.length && documents.length > 0) {
currentImageIndex = documents.length - 1;
}
clearHighlights();
}
async function removeAll() {
if (!confirm("Remove all documents?")) return;
await clearAllDocuments();
await loadDocumentsFromDB();
clearHighlights();
currentImageIndex = 0;
}
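The index adjustment after a deletion can also be expressed as a single clamp. The sketch below uses a hypothetical clampIndex helper and covers the empty-list case as well.

```javascript
// Keep an index inside [0, length - 1]; fall back to 0 for an empty list.
function clampIndex(index, length) {
  if (length === 0) return 0;
  return Math.min(Math.max(index, 0), length - 1);
}

console.log(clampIndex(3, 3)); // 2 (the last document was removed)
console.log(clampIndex(1, 3)); // 1 (index still valid)
console.log(clampIndex(0, 0)); // 0 (no documents left)
```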
Running the Application Locally
To run the application locally, follow these steps:
- Serve via HTTP server (required for TWAIN SDK):
# Python 3
python -m http.server 8000
# Node.js
npx http-server -p 8000
- Open http://localhost:8000 in a modern web browser.
Source Code
https://github.com/yushulx/web-twain-document-scan-management/tree/main/examples/ocr_search