Building an Auto-Scan Document Processing Solution: Automatic Image Cropping and Barcode Extraction

Feb 03, 2026

Modern enterprises process thousands of documents daily, from shipping labels and invoices to medical records and ID cards. Manual document processing is slow, error-prone, and doesn’t scale. This guide shows how to build an auto-scan web app that detects a document in the live camera feed, crops it with perspective correction, and extracts its barcode data, all in real time in the browser.

Demo Video: Document Detection and Barcode Extraction

Online Demo

https://yushulx.me/javascript-barcode-qr-code-scanner/examples/document_barcode/

Business Case

The Problem

Manual document processing creates bottlenecks:

Slow processing: Employees spend hours cropping, rotating, and extracting data
Human error: Incorrect data entry, missed barcodes, poor image quality
Poor scalability: Manual teams can’t absorb volume spikes (50,000+ documents/month)
High costs: Labor-intensive workflows require continuous staffing

The Solution

An auto-scan system that:

Detects documents automatically using AI edge detection
Crops & straightens documents with perspective correction
Extracts barcode data from 1D/2D barcodes instantly
Scales with volume, since each document is processed without manual cropping or data entry

Technical Architecture

System Components

┌─────────────────────────────────────────────────────┐
│              Web Application Interface              │
├─────────────────────────────────────────────────────┤
│                                                     │
│  ┌─────────────────┐      ┌────────────────────┐    │
│  │  Camera Input   │─────▶│  Document Detector │    │
│  │  (Live Stream)  │      │  (DDN Module)      │    │
│  └─────────────────┘      └──────┬─────────────┘    │
│                                  │                  │
│                                  │ Quad Detected    │
│                                  │                  │
│                         ┌────────▼─────────────┐    │
│                         │  Stability Tracker   │    │
│                         │  (Auto-Capture)      │    │
│                         └────────┬─────────────┘    │
│                                  │                  │
│                                  │ Stable Document  │
│                                  │                  │
│  ┌───────────────────────────────▼───────────────┐  │
│  │  Document Normalizer (DDN)                    │  │
│  │  • Perspective correction                     │  │
│  │  • Image cropping                             │  │
│  │  • Quality enhancement                        │  │
│  └───────────────────────────────┬───────────────┘  │
│                                  │                  │
│                                  │ Cropped Image    │
│                                  │                  │
│  ┌───────────────────────────────▼───────────────┐  │
│  │  Barcode Reader (DBR)                         │  │
│  │  • 1D/2D barcode detection                    │  │
│  │  • Multi-format support                       │  │
│  │  • Data extraction                            │  │
│  └───────────────────────────────┬───────────────┘  │
│                                  │                  │
│                           ┌──────▼────────┐         │
│                           │  Result Data  │         │
│                           │  • Image      │         │
│                           │  • Barcodes   │         │
│                           └───────────────┘         │
└─────────────────────────────────────────────────────┘
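The flow in the diagram can also be read as a small state machine that the app moves through for each document. The sketch below models it in plain JavaScript as an illustration only; the state and event names (`ScanState`, `quadDetected`, `scanNext`, and so on) are hypothetical and are not part of the Dynamsoft SDK.

```javascript
// Hypothetical states for the scan flow shown in the diagram.
const ScanState = Object.freeze({
  SEARCHING: 'searching',     // no document quad in view
  STABILIZING: 'stabilizing', // quad detected, waiting for it to hold still
  CAPTURING: 'capturing',     // normalizing the frame and reading barcodes
  RESULT: 'result',           // cropped image and barcodes shown to the user
});

// Allowed transitions: for each state, event name -> next state.
const transitions = {
  [ScanState.SEARCHING]: { quadDetected: ScanState.STABILIZING },
  [ScanState.STABILIZING]: {
    quadDetected: ScanState.STABILIZING, // another frame with a quad
    quadLost: ScanState.SEARCHING,
    stable: ScanState.CAPTURING,
  },
  [ScanState.CAPTURING]: {
    done: ScanState.RESULT,
    failed: ScanState.SEARCHING, // mirrors the retry path in performCapture()
  },
  [ScanState.RESULT]: { scanNext: ScanState.SEARCHING },
};

function nextState(state, event) {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`Event "${event}" is not valid in state "${state}"`);
  }
  return next;
}

// Walk one document through the happy path.
let state = ScanState.SEARCHING;
for (const event of ['quadDetected', 'stable', 'done', 'scanNext']) {
  state = nextState(state, event);
  console.log(`${event} -> ${state}`);
}
```

Keeping the transitions in one table makes the rules explicit; for example, a second capture cannot begin while one is running, which Step 4 enforces with the `isCaptureInProgress` flag.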

Key Technologies

Document Detection & Normalization (DDN): AI-powered edge detection and perspective correction
Barcode Reader (DBR): Reads 40+ barcode formats (QR, Code 39, Code 128, PDF417, etc.)
Camera Enhancer (DCE): Real-time video streaming with auto-focus optimization
Capture Vision Router (CVR): Orchestrates multi-module workflows

Get Your Trial License

Register for a free trial license
Receive license key via email

Use in your web application:

 // Initialize the SDK with your license key
 await Dynamsoft.License.LicenseManager.initLicense("YOUR-LICENSE-KEY", true);

Step-by-Step Implementation

Let’s build the auto-scan system from scratch in JavaScript, as a web app that runs in any modern browser.

Step 1: Project Setup

Create your project structure:

auto-document-scanner/
├── index.html          # Main HTML page
├── app.js             # Application logic
├── styles.css         # Styling
└── README.md          # Documentation

index.html - Basic HTML structure:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Auto Document Scanner</title>
    <link rel="stylesheet" href="styles.css">
    
    <!-- Load Dynamsoft Capture Vision Bundle -->
    <script src="https://cdn.jsdelivr.net/npm/dynamsoft-capture-vision-bundle@3.2.5000/dist/dcv.bundle.min.js"></script>
</head>
<body>
    <div id="app">
        <!-- License activation screen -->
        <div id="license-screen" class="screen">
            <h1>📱 Auto Document Scanner</h1>
            <input type="text" id="license-input" placeholder="Enter license key">
            <button id="activate-btn">Activate & Start</button>
        </div>

        <!-- Camera view (initially hidden) -->
        <div id="camera-screen" class="screen hidden">
            <div id="camera-view"></div>
            <div id="status">Looking for document...</div>
            <button id="capture-btn">Capture</button>
        </div>

        <!-- Results screen -->
        <div id="result-screen" class="screen hidden">
            <h2>Scan Result</h2>
            <img id="cropped-image" alt="Cropped document">
            <div id="barcode-results"></div>
            <button id="scan-next-btn">Scan Next</button>
        </div>
    </div>

    <script src="app.js"></script>
</body>
</html>

Step 2: SDK Initialization & License Activation

app.js - Initialize the SDK:

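let cvr = null;               // CaptureVisionRouter instance
let cameraEnhancer = null;    // camera source for the router
let cameraView = null;        // UI element that renders the video feed
let isSDKReady = false;
let currentScanResult = null; // latest result, set by displayResults()
let scanHistory = [];         // in-memory history, filled by saveToHistory()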

const licenseInput = document.getElementById('license-input');
const activateBtn = document.getElementById('activate-btn');
const cameraScreen = document.getElementById('camera-screen');
const resultScreen = document.getElementById('result-screen');

activateBtn.addEventListener('click', async () => {
    const licenseKey = licenseInput.value.trim();
    if (!licenseKey) {
        alert('Please enter a license key');
        return;
    }

    try {
        console.log('Activating license...');
        await Dynamsoft.License.LicenseManager.initLicense(licenseKey, true);

        console.log('Loading modules...');
        await Dynamsoft.Core.CoreModule.loadWasm(["DBR", "DDN"]);

        console.log('Initializing camera...');
        await initCamera();

        console.log('Setting up scanner...');
        cvr = await Dynamsoft.CVR.CaptureVisionRouter.createInstance();

        cvr.addResultReceiver({
            onCapturedResultReceived: handleCapturedResult
        });

        isSDKReady = true;

        document.getElementById('license-screen').classList.add('hidden');
        cameraScreen.classList.remove('hidden');

        await startScanning();

    } catch (error) {
        console.error('Initialization failed:', error);
        alert(`Error: ${error.message}`);
    }
});
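If the user clicks Activate & Start again while initialization is still running, the handler above runs a second time and activates the license, opens the camera, and creates a router again. A common guard is to cache the in-flight promise so repeated calls share a single run. The sketch below shows that pattern with a stand-in async function; `once` and `initializeSdk` are hypothetical names, not Dynamsoft APIs. Disabling `activateBtn` while initialization runs is a simpler UI-level alternative.

```javascript
// Wrap an async initializer so concurrent or repeated calls share one run.
// If the run fails, the cached promise is cleared so the user can retry.
function once(asyncFn) {
  let pending = null;
  return function (...args) {
    if (!pending) {
      pending = asyncFn(...args).catch((err) => {
        pending = null; // allow a retry after failure
        throw err;
      });
    }
    return pending;
  };
}

// Stand-in for license activation + module loading (hypothetical).
let initRuns = 0;
const initializeSdk = once(async (licenseKey) => {
  initRuns++;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `ready with ${licenseKey}`;
});

// Two rapid "clicks" trigger only one initialization.
const results = Promise.all([initializeSdk('KEY'), initializeSdk('KEY')]);
results.then(([first]) => console.log(first, initRuns)); // ready with KEY 1
```

In the page, the body of the click handler (license, modules, camera, router) would move into the function passed to `once`.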

Step 3: Camera Setup

async function initCamera() {
    cameraView = await Dynamsoft.DCE.CameraView.createInstance();
    cameraEnhancer = await Dynamsoft.DCE.CameraEnhancer.createInstance(cameraView);

    const container = document.getElementById('camera-view');
    container.appendChild(cameraView.getUIElement());

    const cameras = await cameraEnhancer.getAllCameras();
    console.log('Available cameras:', cameras);

    if (cameras.length > 0) {
        await cameraEnhancer.selectCamera(cameras[0]);
        cameraEnhancer.setPixelFormat(10); // 10 = EnumImagePixelFormat.IPF_ABGR_8888 (color frames)
        await cameraEnhancer.open();
    } else {
        throw new Error('No cameras found');
    }
}
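`initCamera()` opens `cameras[0]`, which on many phones is the front-facing camera. For document scanning, a small heuristic that prefers a camera whose label suggests a rear camera can help. This is a sketch under assumptions: it assumes each camera entry exposes a `label` string (as browser `MediaDeviceInfo` objects do), and since labels vary by browser, device, and language and can be empty before permission is granted, it falls back to the first camera. `pickDocumentCamera` is a hypothetical helper.

```javascript
// Words that commonly appear in rear-camera labels (heuristic, English only).
const REAR_KEYWORDS = ['back', 'rear', 'environment'];

// Pick a rear-facing camera if its label identifies one, otherwise the
// first camera. Returns null for an empty list.
function pickDocumentCamera(cameras) {
  if (!cameras || cameras.length === 0) return null;
  const rear = cameras.find((cam) => {
    const label = (cam.label || '').toLowerCase();
    return REAR_KEYWORDS.some((word) => label.includes(word));
  });
  return rear || cameras[0];
}

// Example with plain objects shaped like browser camera entries.
const cameras = [
  { deviceId: 'a', label: 'Front Camera' },
  { deviceId: 'b', label: 'Back Triple Camera' },
];
console.log(pickDocumentCamera(cameras).deviceId); // b
```

In `initCamera()`, passing `pickDocumentCamera(cameras)` to `cameraEnhancer.selectCamera()` would replace the `cameras[0]` choice.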

Step 4: Document Detection with Auto-Capture

This step runs document detection on the live feed and tracks how stable the detected quadrilateral is, so the app can capture automatically once the document holds still:

async function startScanning() {
    if (!isSDKReady) return;

    try {
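        // Fetch the template's simplified settings; adjust fields here before
        // applying them if the defaults don't suit your documents.
        let settings = await cvr.getSimplifiedSettings("DetectDocumentBoundaries_Default");
        await cvr.updateSettings("DetectDocumentBoundaries_Default", settings);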

        cvr.setInput(cameraEnhancer);
        await cvr.startCapturing("DetectDocumentBoundaries_Default");

        updateStatus('Looking for document...');
    } catch (error) {
        console.error('Failed to start scanning:', error);
    }
}

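let stabilityThreshold = 12;   // consecutive stable frames required before auto-capture
let movementTolerance = 15;    // max per-axis corner movement (frame pixels) between frames
let stabilityCounter = 0;      // stable frames counted so far
let lastQuadPoints = null;     // corner points from the previous frame
let latestDetectedQuad = null; // most recent quad item, used as the crop region
let isCaptureInProgress = false;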

async function handleCapturedResult(result) {
    if (isCaptureInProgress) return;

    const items = result.items;
    if (!items || items.length === 0) {
        resetStabilityTracking();
        updateStatus('Looking for document...');
        return;
    }

    for (const item of items) {
        if (item.type === Dynamsoft.Core.EnumCapturedResultItemType.CRIT_DETECTED_QUAD) {
            latestDetectedQuad = item;
            checkStability(item.location.points);
        }
    }

    if (stabilityCounter >= stabilityThreshold && !isCaptureInProgress) {
        await performCapture();
    }
}

function checkStability(currentPoints) {
    if (!lastQuadPoints) {
        lastQuadPoints = currentPoints;
        stabilityCounter = 1;
        updateStatus('Document detected, hold steady...');
        return;
    }

    const isStable = isQuadStable(currentPoints, lastQuadPoints);

    if (isStable) {
        stabilityCounter++;
        const progress = Math.min(stabilityCounter / stabilityThreshold * 100, 100);
        
        if (stabilityCounter >= stabilityThreshold) {
            updateStatus('Ready to capture!');
        } else {
            updateStatus(`Hold steady... ${Math.round(progress)}%`);
        }
    } else {
        resetStabilityTracking();
        stabilityCounter = 1;
        updateStatus('Movement detected, hold steady...');
    }

    lastQuadPoints = currentPoints;
}

function isQuadStable(current, last) {
    if (current.length !== 4 || last.length !== 4) return false;

    for (let i = 0; i < 4; i++) {
        const dx = Math.abs(current[i].x - last[i].x);
        const dy = Math.abs(current[i].y - last[i].y);
        
        if (dx > movementTolerance || dy > movementTolerance) {
            return false;
        }
    }
    return true;
}

function resetStabilityTracking() {
    stabilityCounter = 0;
    lastQuadPoints = null;
}

function updateStatus(message) {
    document.getElementById('status').textContent = message;
}
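The stability logic above is spread across module-level variables and mixes in status-bar updates. For unit testing or reuse, the same rule (every corner moves at most `movementTolerance` pixels on each axis for `stabilityThreshold` consecutive frames) can be packaged as a DOM-free class. `StabilityTracker` is a hypothetical name; it expects the same arrays of four `{x, y}` points that `item.location.points` provides.

```javascript
class StabilityTracker {
  constructor({ threshold = 12, tolerance = 15 } = {}) {
    this.threshold = threshold; // consecutive stable frames required
    this.tolerance = tolerance; // max per-axis corner movement in pixels
    this.reset();
  }

  // Call when no quad is detected.
  reset() {
    this.count = 0;
    this.lastPoints = null;
  }

  // Feed the quad corners from one frame; returns the tracker status.
  update(points) {
    if (this.lastPoints && this.isStable(points, this.lastPoints)) {
      this.count++;
    } else {
      this.count = 1; // first frame or movement: restart from this frame
    }
    this.lastPoints = points;
    return {
      ready: this.count >= this.threshold,
      progress: Math.min(this.count / this.threshold, 1),
    };
  }

  isStable(current, last) {
    if (current.length !== 4 || last.length !== 4) return false;
    return current.every(
      (p, i) =>
        Math.abs(p.x - last[i].x) <= this.tolerance &&
        Math.abs(p.y - last[i].y) <= this.tolerance
    );
  }
}

// A quad that jitters by 3 px between frames becomes ready on frame 12.
const tracker = new StabilityTracker({ threshold: 12, tolerance: 15 });
const base = [{ x: 100, y: 100 }, { x: 500, y: 100 }, { x: 500, y: 700 }, { x: 100, y: 700 }];
let status;
for (let frame = 0; frame < 12; frame++) {
  const jitter = frame % 2 === 0 ? 0 : 3;
  status = tracker.update(base.map((p) => ({ x: p.x + jitter, y: p.y })));
}
console.log(status.ready); // true
```

Note that the threshold counts result callbacks, not time, so the actual hold duration depends on how quickly the device delivers detection results.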

Step 5: Document Cropping & Normalization

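Once the document has stayed stable long enough, stop the live capture, pass the detected quad to the NormalizeDocument_Default template as its region of interest, and run the template on the current frame: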

async function performCapture() {
    isCaptureInProgress = true;
    updateStatus('Capturing...');

    try {
        await cvr.stopCapturing();

        let normalizeSettings = await cvr.getSimplifiedSettings("NormalizeDocument_Default");
        normalizeSettings.roiMeasuredInPercentage = false;
        normalizeSettings.roi = latestDetectedQuad.location;
        await cvr.updateSettings("NormalizeDocument_Default", normalizeSettings);

        const image = cameraEnhancer.fetchImage();
        const normalizeResult = await cvr.capture(image, "NormalizeDocument_Default");

        let normalizedImage = null;
        for (const item of normalizeResult.items) {
            if (item.type === Dynamsoft.Core.EnumCapturedResultItemType.CRIT_NORMALIZED_IMAGE) {
                normalizedImage = item;
                break;
            }
        }

        if (!normalizedImage) {
            throw new Error('Failed to normalize document');
        }

        await readBarcodesFromDocument(normalizedImage);

    } catch (error) {
        console.error('Capture failed:', error);
        alert('Failed to capture document');
        
        isCaptureInProgress = false;
        resetStabilityTracking();
        await startScanning();
    }
}
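DDN performs the perspective correction inside `cvr.capture()`, so the app never handles the math directly. To show what that step does conceptually, here is a sketch of the standard four-point homography: choose an output size from the quad's edge lengths, solve for the eight unknowns of a 3×3 projective transform that maps the output rectangle's corners onto the detected corners, then sample the source image through it. This is the textbook technique, not Dynamsoft's implementation, and it assumes corners ordered top-left, top-right, bottom-right, bottom-left.

```javascript
// Solve the n×n system A·h = b by Gauss-Jordan elimination with partial pivoting.
function solveLinear(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented matrix [A | b]
  for (let col = 0; col < n; col++) {
    // Swap in the row with the largest pivot to limit rounding error.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) throw new Error('Degenerate quad');
    [M[col], M[pivot]] = [M[pivot], M[col]];
    // Clear this column in every other row.
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  return M.map((row, i) => row[n] / row[i]);
}

// 3×3 homography (with h33 = 1) mapping each from[i] onto to[i].
function computeHomography(from, to) {
  const A = [];
  const b = [];
  for (let i = 0; i < 4; i++) {
    const { x, y } = from[i];
    const { x: u, y: v } = to[i];
    A.push([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.push(u);
    A.push([0, 0, 0, x, y, 1, -x * v, -y * v]); b.push(v);
  }
  const h = solveLinear(A, b);
  return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1]];
}

function applyHomography(H, p) {
  const w = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
  return {
    x: (H[0][0] * p.x + H[0][1] * p.y + H[0][2]) / w,
    y: (H[1][0] * p.x + H[1][1] * p.y + H[1][2]) / w,
  };
}

// Output size: the longer of each pair of opposite edges.
function outputSize([tl, tr, br, bl]) {
  const d = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);
  return {
    width: Math.round(Math.max(d(tl, tr), d(bl, br))),
    height: Math.round(Math.max(d(tl, bl), d(tr, br))),
  };
}

const quad = [
  { x: 120, y: 80 },  // top-left
  { x: 520, y: 100 }, // top-right
  { x: 560, y: 640 }, // bottom-right
  { x: 90, y: 610 },  // bottom-left
];
const { width, height } = outputSize(quad);
const rect = [{ x: 0, y: 0 }, { x: width, y: 0 }, { x: width, y: height }, { x: 0, y: height }];
// Maps cropped-image pixels to source-frame pixels (inverse mapping for sampling).
const H = computeHomography(rect, quad);
console.log(width, height); // 471 541
```

Mapping from output to source, rather than the other way, means every pixel of the cropped image gets exactly one sampled value with no holes.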

Step 6: Barcode Extraction

After cropping and normalizing the document, extract barcodes from it:

async function readBarcodesFromDocument(normalizedImageItem) {
    // Data URL of the cropped image, used for display and history
    const croppedDataUrl = normalizedImageItem.toCanvas().toDataURL();

    try {
        updateStatus('Reading barcodes...');

        // Optional: adjust the template's simplified settings before applying them
        let barcodeSettings = await cvr.getSimplifiedSettings("ReadBarcodes_Balance");
        await cvr.updateSettings("ReadBarcodes_Balance", barcodeSettings);

        // Decode the normalized (cropped, deskewed) image rather than the full frame
        const barcodeResult = await cvr.capture(normalizedImageItem.imageData, "ReadBarcodes_Balance");

        const barcodes = [];
        for (const item of barcodeResult.items) {
            if (item.type === Dynamsoft.Core.EnumCapturedResultItemType.CRIT_BARCODE) {
                barcodes.push({
                    text: item.text,
                    format: item.formatString
                });
            }
        }

        displayResults(croppedDataUrl, barcodes);

    } catch (error) {
        console.error('Barcode reading failed:', error);
        // Still show the cropped image when decoding fails
        displayResults(croppedDataUrl, []);
    }
}

function displayResults(croppedImageData, barcodes) {
    cameraScreen.classList.add('hidden');
    resultScreen.classList.remove('hidden');

    const croppedImage = document.getElementById('cropped-image');
    croppedImage.src = croppedImageData;

    const barcodeResults = document.getElementById('barcode-results');
    barcodeResults.innerHTML = '';

    if (barcodes.length === 0) {
        barcodeResults.innerHTML = '<p>No barcodes detected</p>';
    } else {
        barcodeResults.innerHTML = '<h3>Detected Barcodes:</h3>';
        barcodes.forEach((barcode, index) => {
            // Build nodes with textContent: decoded barcode text is untrusted
            // input and must not be parsed as HTML.
            const entry = document.createElement('div');
            entry.className = 'barcode-item';
            const label = document.createElement('strong');
            label.textContent = `Barcode ${index + 1}:`;
            const format = document.createElement('em');
            format.textContent = 'Format:';
            entry.append(label, ` ${barcode.text}`, document.createElement('br'), format, ` ${barcode.format}`);
            barcodeResults.appendChild(entry);
        });
    }

    // Keep the result so saveToHistory() can store it
    currentScanResult = {
        imageDataUrl: croppedImageData,
        barcodes: barcodes,
        timestamp: new Date().toISOString()
    };
}

Step 7: IndexedDB History Storage

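Save each captured document and its barcode results to an IndexedDB history for later review. IndexedDB suits this better than localStorage because every record includes a full data-URL image, which can quickly exceed localStorage's much smaller quota.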

const DB_NAME = 'DocumentScannerDB';
const DB_VERSION = 1;
const STORE_NAME = 'scanHistory';

function openDB() {
    return new Promise((resolve, reject) => {
        const request = indexedDB.open(DB_NAME, DB_VERSION);
        request.onerror = (event) => reject('Database error: ' + event.target.error);
        request.onsuccess = (event) => resolve(event.target.result);
        request.onupgradeneeded = (event) => {
            const db = event.target.result;
            if (!db.objectStoreNames.contains(STORE_NAME)) {
                db.createObjectStore(STORE_NAME, { keyPath: 'timestamp' });
            }
        };
    });
}

async function saveScanToDB(scanResult) {
    const db = await openDB();
    return new Promise((resolve, reject) => {
        const transaction = db.transaction([STORE_NAME], 'readwrite');
        transaction.objectStore(STORE_NAME).add(scanResult);
        // Resolve when the transaction commits, not merely when the add request succeeds
        transaction.oncomplete = () => {
            db.close();
            resolve();
        };
        // Errors from the add request (e.g. a duplicate timestamp key) bubble up here
        transaction.onerror = (event) => {
            db.close();
            reject(event.target.error);
        };
    });
}

// updateHistoryCount() and showToast() are small UI helpers not shown in this excerpt
async function saveToHistory() {
    if (!currentScanResult) return;

    try {
        await saveScanToDB(currentScanResult);
        scanHistory.unshift(currentScanResult);
        
        if (scanHistory.length > 50) {
            scanHistory = scanHistory.slice(0, 50);
        }
        
        updateHistoryCount();
        currentScanResult = null;
    } catch (e) {
        console.warn('Failed to save history to DB:', e);
        showToast('Failed to save history');
    }
}
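`saveToHistory()` caps the in-memory `scanHistory` list at 50 entries, but every record stays in IndexedDB, and each one holds a full data-URL image. One way to keep the database in line with the list is to find the keys outside the newest N and delete them. Because the key path is an ISO 8601 timestamp from `toISOString()`, plain string order matches chronological order. `keysToPrune` is a hypothetical helper that sketches only the selection step.

```javascript
// Given every key in the store, return the keys to delete so that only the
// newest `keep` records remain. Keys are ISO 8601 strings produced by
// Date.prototype.toISOString(), which sort chronologically as plain strings.
function keysToPrune(allKeys, keep = 50) {
  const newestFirst = [...allKeys].sort().reverse(); // copy: don't mutate input
  return newestFirst.slice(keep);
}

const keys = [
  '2026-02-03T09:00:00.000Z',
  '2026-02-03T11:00:00.000Z',
  '2026-02-03T10:00:00.000Z',
];
console.log(keysToPrune(keys, 2)); // [ '2026-02-03T09:00:00.000Z' ]
```

In the browser, `store.getAllKeys()` provides the keys, and calling `store.delete(key)` for each pruned key inside a `readwrite` transaction removes the old records.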

Step 8: Adjustable Stability Settings

To give users control over the auto-capture sensitivity, add a settings UI that allows real-time adjustment of stability parameters:

HTML (Settings Modal):

<!-- Add to your index.html -->
<div id="settings-overlay" class="overlay hidden">
    <div class="settings-modal">
        <div class="settings-header">
            <h2>Settings</h2>
            <button id="close-settings-btn" class="close-btn">&times;</button>
        </div>
        <div class="settings-body">
            <div class="setting-group">
                <label for="stability-threshold">
                    Stability Threshold: <span id="stability-threshold-value">12</span>
                    <span class="tooltip">Number of stable frames required before capture</span>
                </label>
                <input type="range" id="stability-threshold" 
                       min="5" max="30" value="12" step="1">
            </div>
            
            <div class="setting-group">
                <label for="movement-tolerance">
                    Movement Tolerance: <span id="movement-tolerance-value">15</span>
                    <span class="tooltip">Allowed pixel movement to be considered stable</span>
                </label>
                <input type="range" id="movement-tolerance" 
                       min="5" max="50" value="15" step="1">
            </div>
        </div>
    </div>
</div>

<!-- Add Settings button to your status bar -->
<button id="settings-btn" class="icon-button" title="Settings">⚙️</button>

JavaScript (Settings Logic):

// Settings UI Management
const settingsBtn = document.getElementById('settings-btn');
const settingsOverlay = document.getElementById('settings-overlay');
const closeSettingsBtn = document.getElementById('close-settings-btn');
const stabilityInput = document.getElementById('stability-threshold');
const stabilityValue = document.getElementById('stability-threshold-value');
const movementInput = document.getElementById('movement-tolerance');
const movementValue = document.getElementById('movement-tolerance-value');

function initSettings() {
    // Open settings modal
    settingsBtn.addEventListener('click', () => {
        // Sync inputs with current values
        stabilityInput.value = stabilityThreshold;
        stabilityValue.textContent = stabilityThreshold;
        movementInput.value = movementTolerance;
        movementValue.textContent = movementTolerance;
        
        settingsOverlay.classList.remove('hidden');
    });

    // Close settings
    const closeSettings = () => {
        settingsOverlay.classList.add('hidden');
    };
    
    closeSettingsBtn.addEventListener('click', closeSettings);
    
    // Close on click outside modal
    settingsOverlay.addEventListener('click', (e) => {
        if (e.target === settingsOverlay) {
            closeSettings();
        }
    });

    // Real-time parameter updates
    stabilityInput.addEventListener('input', (e) => {
        stabilityThreshold = parseInt(e.target.value, 10);
        stabilityValue.textContent = stabilityThreshold;
    });

    movementInput.addEventListener('input', (e) => {
        movementTolerance = parseInt(e.target.value, 10);
        movementValue.textContent = movementTolerance;
    });
}

initSettings();
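The slider values live only in memory, so they return to 12 and 15 on every page load. A minimal way to keep them is the Web Storage API, clamping values back into the sliders' ranges on read in case stored data is stale or edited. The sketch takes the storage object as a parameter so it runs outside a browser; in the page you would pass `localStorage`. The storage key and function names are hypothetical.

```javascript
const SETTINGS_KEY = 'autoScanSettings'; // hypothetical storage key

// Slider ranges from the settings modal, plus the defaults used in Step 4.
const LIMITS = {
  stabilityThreshold: { min: 5, max: 30, fallback: 12 },
  movementTolerance: { min: 5, max: 50, fallback: 15 },
};

function saveSettings(storage, settings) {
  storage.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

// Read saved settings, using defaults for missing or invalid values and
// clamping numbers into the slider range.
function loadSettings(storage) {
  let saved = {};
  try {
    saved = JSON.parse(storage.getItem(SETTINGS_KEY)) || {};
  } catch {
    saved = {}; // corrupted JSON: ignore it
  }
  const result = {};
  for (const [name, { min, max, fallback }] of Object.entries(LIMITS)) {
    const value = saved[name];
    result[name] = typeof value === 'number' && Number.isFinite(value)
      ? Math.min(max, Math.max(min, Math.round(value)))
      : fallback;
  }
  return result;
}

// Minimal in-memory stand-in for localStorage (getItem/setItem only).
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => data.set(key, String(value)),
  };
}

const storage = createMemoryStorage();
console.log(loadSettings(storage)); // { stabilityThreshold: 12, movementTolerance: 15 }
saveSettings(storage, { stabilityThreshold: 20, movementTolerance: 99 });
console.log(loadSettings(storage)); // { stabilityThreshold: 20, movementTolerance: 50 }
```

In the page, assigning the result of `loadSettings(localStorage)` to `stabilityThreshold` and `movementTolerance` at startup, and calling `saveSettings(localStorage, { stabilityThreshold, movementTolerance })` in the two `input` listeners, would keep the values across reloads.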

CSS (Settings Modal Styling):

.settings-modal {
    position: fixed;
    top: 50%;
    left: 50%;
    transform: translate(-50%, -50%);
    background: white;
    border-radius: 12px;
    padding: 24px;
    min-width: 400px;
    max-width: 90%;
    box-shadow: 0 8px 32px rgba(0, 0, 0, 0.3);
    z-index: 10001;
}

.setting-group {
    margin-bottom: 20px;
}

.setting-group label {
    display: block;
    margin-bottom: 8px;
    font-weight: 600;
}

.setting-group input[type="range"] {
    width: 100%;
    height: 6px;
    border-radius: 3px;
    background: #e0e0e0;
    outline: none;
}

.tooltip {
    display: block;
    font-size: 12px;
    color: #666;
    font-weight: normal;
    margin-top: 4px;
}

This feature is particularly valuable for:

Production environments - optimize for speed vs. quality
Different document types - small cards vs. large posters
Various lighting conditions - adjust sensitivity for low-light scenarios
User preferences - let end-users customize their experience

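Testing Your Implementation

Camera access requires a secure context (HTTPS or http://localhost), so serve the project from a local web server during development, for example: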

# Option 1: Using Python
python -m http.server 8000 --bind localhost

# Option 2: Using Node.js (http-server)
npx http-server -p 8000


Source Code

https://github.com/yushulx/javascript-barcode-qr-code-scanner/tree/main/examples/document_barcode