Building an AI-Powered Passport Scanner with MRZ, OCR, and Face Detection in JavaScript

Dec 14, 2025

In today’s digital world, automating document processing has become essential for businesses across various industries. Whether you’re building a hotel check-in system, airport kiosk, identity verification platform, or government service portal, the ability to quickly and accurately extract information from passports and ID cards can significantly improve user experience and operational efficiency.

In this comprehensive tutorial, we’ll build a powerful web-based passport scanner that combines three technologies:

MRZ (Machine Readable Zone) Detection - Extract structured data from passport codes
OCR (Optical Character Recognition) - Read text from any part of the document
Face Detection - Locate and extract facial photos

Demo Video: MRZ, OCR, and Face Detection in Web Browsers

Online Demo

https://yushulx.me/javascript-barcode-qr-code-scanner/examples/mrz_solution/

Why Build a Passport Scanner?

Real-World Use Cases

Travel & Hospitality
- Hotel check-in automation
- Airport self-service kiosks
- Rental car verification
- Cruise ship boarding
Financial Services
- KYC (Know Your Customer) compliance
- Account opening verification
- International money transfers
- Credit applications
Government & Immigration
- Border control stations
- Visa application processing
- Citizen services portals
- Emergency document verification
Healthcare
- Patient registration
- Medical tourism verification
- Insurance claim processing
E-commerce & Marketplaces
- Age verification for restricted products
- Seller identity verification
- International shipping compliance

Benefits of a Web-Based Solution

No Installation Required - Works instantly in any modern browser
Cross-Platform - Desktop, tablet, and mobile compatible
Easy Updates - Deploy fixes and features without user action
Offline Capability - Service workers enable offline processing
Cost-Effective - Reduce hardware and maintenance costs

Technology Stack Overview

Our solution leverages a hybrid approach, combining both free and paid APIs to give users flexibility based on their needs and budget.

Free Components

PaddleOCR (ONNX Runtime Web)
- Runs entirely in the browser
- No API costs or quotas
- Privacy-friendly (no data leaves the device)
face-api.js
- SSD MobileNetV1 face detection
- Lightweight and fast
- Client-side processing
- Open-source and free

Paid Components

Dynamsoft Capture Vision SDK
- Industry-leading MRZ recognition
- Supports TD1, TD2, TD3 formats
- High accuracy and speed
- Free trial available
Google Cloud Vision API
- State-of-the-art OCR accuracy
- Supports 50+ languages
- Handles complex layouts and fonts

Why Hybrid Approach?

The combination of free and paid options provides:

Development Flexibility - Start with free tools, upgrade when needed
Cost Optimization - Use local processing for non-critical text, cloud API for precision tasks
Reliability - Fallback options if one service is unavailable
User Choice - Let users select based on their API keys and budgets

Step-by-Step Development Guide

Web-based passport scanner

Step 1: Create the HTML Structure

Create index.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>AI-Powered MRZ & Passport Scanner</title>
    <link rel="stylesheet" href="style.css">
    
    <!-- Dynamsoft SDK for MRZ -->
    <script src="https://cdn.jsdelivr.net/npm/dynamsoft-capture-vision-bundle@3.2.4000/dist/dcv.bundle.min.js"></script>
    
    <!-- ONNX Runtime for local OCR -->
    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.all.min.js"></script>
    
    <!-- face-api.js for face detection -->
    <script defer src="https://cdn.jsdelivr.net/npm/face-api.js@0.22.2/dist/face-api.min.js"></script>
</head>
<body>
    <div class="app-container">
        <header>
            <h1>🛂 MRZ & Passport Scanner</h1>
            
            <!-- MRZ License Configuration -->
            <div class="license-bar">
                <input type="text" id="licenseKey" placeholder="Dynamsoft License Key">
                <button id="initBtn">Initialize MRZ</button>
                <a href="https://www.dynamsoft.com/customer/license/trialLicense" target="_blank">Get Trial License</a>
            </div>
            
            <!-- OCR Engine Selection -->
            <div class="config-bar">
                <label>OCR Engine:</label>
                <select id="ocrEngineSelect">
                    <option value="paddle">PaddleOCR (Local - Free)</option>
                    <option value="google">Google OCR (Cloud - Paid)</option>
                </select>
                <input type="text" id="googleApiKey" placeholder="Google API Key (Optional)" style="display:none;">
            </div>
        </header>

        <main>
            <!-- Image Input Panel -->
            <section class="panel left-panel">
                <div class="toolbar">
                    <button id="btnLoad">📂 Load</button>
                    <button id="btnCamera">📷 Camera</button>
                    <button id="btnPaste">📋 Paste</button>
                    <span id="status">Ready</span>
                </div>
                
                <div class="viewer-container" id="dropZone">
                    <div id="cameraView" class="hidden"></div>
                    <div id="imageView" class="hidden">
                        <img id="displayImage" alt="Input">
                        <canvas id="overlayCanvas"></canvas>
                    </div>
                    <div class="placeholder-text">Drag & Drop Image Here</div>
                </div>
            </section>

            <!-- Results Panel -->
            <section class="panel right-panel">
                <div class="result-group">
                    <h3>👤 Face Detection</h3>
                    <canvas id="faceCropCanvas" width="150" height="150"></canvas>
                </div>
                
                <div class="result-group">
                    <h3>📝 MRZ Raw Text</h3>
                    <div id="mrzRawText" class="data-box">Waiting...</div>
                </div>
                
                <div class="result-group">
                    <h3>📋 MRZ Parsed Data</h3>
                    <div id="mrzResults" class="data-box">Waiting...</div>
                </div>
                
                <div class="result-group">
                    <h3>🔤 OCR Results</h3>
                    <div id="ocrResults" class="data-box">Waiting...</div>
                </div>
            </section>
        </main>
    </div>

    <div id="loadingSpinner" class="spinner-overlay">
        <div class="spinner"></div>
    </div>

    <script src="ocr-processor.js"></script>
    <script src="face-processor.js"></script>
    <script src="script.js"></script>
</body>
</html>

Key Features:

Dual-panel layout for input and results
License configuration for paid SDK
OCR engine switcher (free vs paid)
Multiple input methods (file, camera, clipboard, drag-drop)

Step 2: Style the Interface

Create style.css:

* {
    margin: 0;
    padding: 0;
    box-sizing: border-box;
}

body {
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
    min-height: 100vh;
    padding: 20px;
}

.app-container {
    max-width: 1400px;
    margin: 0 auto;
    background: white;
    border-radius: 10px;
    box-shadow: 0 10px 40px rgba(0,0,0,0.2);
    overflow: hidden;
}

header {
    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
    color: white;
    padding: 20px;
    text-align: center;
}

.license-bar, .config-bar {
    margin-top: 10px;
    display: flex;
    gap: 10px;
    align-items: center;
    justify-content: center;
    flex-wrap: wrap;
}

main {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 20px;
    padding: 20px;
}

.viewer-container {
    position: relative;
    width: 100%;
    aspect-ratio: 4/3;
    background: #f0f0f0;
    border-radius: 8px;
    overflow: hidden;
}

#displayImage, #overlayCanvas {
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    object-fit: contain;
}

.data-box {
    background: #f9f9f9;
    padding: 15px;
    border-radius: 5px;
    min-height: 60px;
    font-size: 14px;
    line-height: 1.6;
}

.spinner-overlay {
    position: fixed;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    background: rgba(0,0,0,0.7);
    display: none;
    justify-content: center;
    align-items: center;
    z-index: 1000;
}

.spinner-overlay.visible {
    display: flex;
}

.spinner {
    border: 4px solid #f3f3f3;
    border-top: 4px solid #667eea;
    border-radius: 50%;
    width: 50px;
    height: 50px;
    animation: spin 1s linear infinite;
}

@keyframes spin {
    0% { transform: rotate(0deg); }
    100% { transform: rotate(360deg); }
}

Step 3: Implement OCR with Dual Engine Support

Create ocr-processor.js with both PaddleOCR and Google Cloud Vision integration:

class OCRProcessor {
    constructor() {
        this.detSession = null;
        this.recSession = null;
        this.isInitialized = false;
        this.characters = null;
    }

    async init() {
        try {
            const options = {
                executionProviders: ['webgpu', 'wasm'],
                graphOptimizationLevel: 'all'
            };

            // Load PaddleOCR detection model
            this.detSession = await ort.InferenceSession.create(
                './assets/ch_PP-OCRv4_det.onnx', 
                options
            );

            // Load recognition model
            this.recSession = await ort.InferenceSession.create(
                './assets/ch_PP-OCRv4_rec.onnx', 
                options
            );

            // Load character dictionary
            const response = await fetch('./assets/ppocr_keys_v1.txt');
            const text = await response.text();
            this.characters = ['blank', ...text.split('\n').filter(c => c), ' '];

            this.isInitialized = true;
            console.log("✅ PaddleOCR initialized");
        } catch (e) {
            console.error("Failed to initialize OCR:", e);
        }
    }

    async run(imageElement, ocrResultsDiv, canvasOverlay, mrzZone, engine = 'paddle', apiKey = '') {
        if (engine === 'google') {
            return await this.runGoogleOCR(imageElement, ocrResultsDiv, canvasOverlay, mrzZone, apiKey);
        }

        // PaddleOCR local processing
        if (!this.isInitialized) {
            ocrResultsDiv.textContent = "OCR not initialized.";
            return [];
        }

        const boxes = await this.detectText(imageElement);
        const results = [];

        for (const box of boxes) {
            if (mrzZone && this.overlapsWithMrz(box, mrzZone)) continue;
            
            const text = await this.recognizeText(imageElement, box);
            if (text?.trim()) {
                results.push({ box, text });
            }
        }

        this.drawOverlays(canvasOverlay, results);
        this.displayResults(ocrResultsDiv, results);
        return results;
    }

    async runGoogleOCR(imageElement, ocrResultsDiv, canvasOverlay, mrzZone, apiKey) {
        try {
            let base64Image = imageElement.src.split(',')[1];

            const url = apiKey 
                ? `https://vision.googleapis.com/v1/images:annotate?key=${apiKey}`
                : 'https://vision.googleapis.com/v1/images:annotate';

            const response = await fetch(url, {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({
                    requests: [{
                        image: { content: base64Image },
                        features: [{ type: "TEXT_DETECTION" }]
                    }]
                })
            });

            if (!response.ok) {
                const err = await response.json();
                throw new Error(err.error?.message || 'Google OCR failed');
            }

            const data = await response.json();
            const annotations = data.responses[0].textAnnotations || [];

            const results = [];
            for (let i = 1; i < annotations.length; i++) {
                const ann = annotations[i];
                const vertices = ann.boundingPoly.vertices;
                
                const box = {
                    x: Math.min(...vertices.map(v => v.x || 0)),
                    y: Math.min(...vertices.map(v => v.y || 0)),
                    width: Math.max(...vertices.map(v => v.x || 0)) - Math.min(...vertices.map(v => v.x || 0)),
                    height: Math.max(...vertices.map(v => v.y || 0)) - Math.min(...vertices.map(v => v.y || 0))
                };

                if (mrzZone && this.overlapsWithMrz(box, mrzZone)) continue;
                results.push({ box, text: ann.description });
            }

            this.drawOverlays(canvasOverlay, results);
            this.displayResults(ocrResultsDiv, results);
            return results;
        } catch (e) {
            console.error("Google OCR Error:", e);
            ocrResultsDiv.textContent = `Google OCR Error: ${e.message}`;
            return [];
        }
    }

    // Additional helper methods: detectText, recognizeText, overlapsWithMrz, etc.
}

const ocrProcessor = new OCRProcessor();

window.initOCR = async function() {
    await ocrProcessor.init();
};

window.runOCR = async function(imageElement, ocrResultsDiv, canvasOverlay, mrzZone, engine, apiKey) {
    return await ocrProcessor.run(imageElement, ocrResultsDiv, canvasOverlay, mrzZone, engine, apiKey);
};

Key Implementation Details:

Free Option (PaddleOCR):
- Runs entirely in browser using ONNX Runtime Web
- No external API calls
- Privacy-friendly
Paid Option (Google Cloud Vision):
- Higher accuracy especially for complex documents
- Supports 50+ languages out of the box
- Requires API key
- 1,000 free requests per month

Step 4: Add Face Detection

Create face-processor.js using the free face-api.js library:

let faceDetectionReady = false;

window.initFaceDetection = async function() {
    try {
        await faceapi.nets.ssdMobilenetv1.loadFromUri('./assets/face-api-models');
        faceDetectionReady = true;
        console.log("✅ Face detection ready");
    } catch (e) {
        console.error("Face detection init failed:", e);
    }
};

window.runFaceDetection = async function(imageElement, faceCropCanvas, overlayCanvas) {
    if (!faceDetectionReady) return;

    try {
        const detection = await faceapi.detectSingleFace(imageElement);
        
        if (detection) {
            const box = detection.box;
            
            // Draw overlay on main canvas
            const ctx = overlayCanvas.getContext('2d');
            ctx.strokeStyle = '#ff0000';
            ctx.lineWidth = 3;
            ctx.strokeRect(box.x, box.y, box.width, box.height);
            
            // Crop and display face
            const faceCtx = faceCropCanvas.getContext('2d');
            faceCtx.drawImage(
                imageElement,
                box.x, box.y, box.width, box.height,
                0, 0, faceCropCanvas.width, faceCropCanvas.height
            );
        }
    } catch (e) {
        console.error("Face detection error:", e);
    }
};

Step 5: Implement MRZ Detection (Paid)

In script.js, add Dynamsoft MRZ initialization:

let cvr = null;
let parser = null;
let isMRZReady = false;

// Initialize MRZ SDK (requires license)
document.getElementById('initBtn').addEventListener('click', async () => {
    const key = document.getElementById('licenseKey').value.trim();
    
    try {
        // Initialize Dynamsoft license
        await Dynamsoft.License.LicenseManager.initLicense(key, true);
        
        // Load WASM modules
        await Dynamsoft.Core.CoreModule.loadWasm(["DLR"]);
        
        // Create parser
        parser = await Dynamsoft.DCP.CodeParser.createInstance();
        
        // Load MRZ specifications
        await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD1_ID");
        await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD3_PASSPORT");
        
        // Create router
        cvr = await Dynamsoft.CVR.CaptureVisionRouter.createInstance();
        await cvr.initSettings("./full.json");
        
        isMRZReady = true;
        console.log("✅ MRZ SDK initialized");
    } catch (ex) {
        console.error("MRZ init failed:", ex);
        alert("MRZ failed. Free features (OCR/Face) still work!");
    }
});

Step 6: Orchestrate the Complete Workflow

Add the main processing pipeline in script.js:

async function loadImage(base64Image) {
    const displayImage = document.getElementById('displayImage');
    displayImage.src = base64Image;
    
    displayImage.onload = async () => {
        document.getElementById('loadingSpinner').classList.add('visible');
        
        let mrzZone = null;
        
        // 1. Run MRZ detection (if initialized)
        if (isMRZReady && cvr) {
            const result = await cvr.capture(base64Image, "ReadMRZ");
            const mrzTexts = result.items
                .filter(item => item.type === Dynamsoft.Core.EnumCapturedResultItemType.CRIT_TEXT_LINE)
                .map(item => item.text);
            
            if (mrzTexts.length > 0) {
                document.getElementById('mrzRawText').textContent = mrzTexts.join('\n');
                
                const parsed = await parser.parse(mrzTexts.join(''));
                displayParsedMrz(parsed);
                
                // Calculate MRZ zone for OCR exclusion
                mrzZone = calculateMrzBounds(result.items);
            }
        }
        
        // 2. Run face detection (always free)
        if (window.runFaceDetection) {
            await window.runFaceDetection(
                displayImage,
                document.getElementById('faceCropCanvas'),
                document.getElementById('overlayCanvas')
            );
        }
        
        // 3. Run OCR (skip MRZ zone)
        if (window.runOCR) {
            const engine = document.getElementById('ocrEngineSelect').value;
            const apiKey = document.getElementById('googleApiKey').value.trim();
            
            await window.runOCR(
                displayImage,
                document.getElementById('ocrResults'),
                document.getElementById('overlayCanvas'),
                mrzZone,
                engine,
                apiKey
            );
        }
        
        document.getElementById('loadingSpinner').classList.remove('visible');
    };
}

Step 7: Enable Offline Mode with Service Worker

Create service-worker.js to cache models for offline use:

const CACHE_NAME = 'passport-scanner-v1';
const ASSETS_TO_CACHE = [
    './assets/ch_PP-OCRv4_det.onnx',
    './assets/ch_PP-OCRv4_rec.onnx',
    './assets/ppocr_keys_v1.txt',
    './assets/version-RFB-320.onnx',
    './assets/face-api-models/ssd_mobilenetv1_model-weights_manifest.json',
    './assets/face-api-models/ssd_mobilenetv1_model-shard1',
    './assets/face-api-models/ssd_mobilenetv1_model-shard2'
];

self.addEventListener('install', (event) => {
    event.waitUntil(
        caches.open(CACHE_NAME).then((cache) => {
            return cache.addAll(ASSETS_TO_CACHE);
        })
    );
});

self.addEventListener('fetch', (event) => {
    event.respondWith(
        caches.match(event.request).then((response) => {
            return response || fetch(event.request);
        })
    );
});

API Configuration Guide

Getting Dynamsoft License (MRZ - Paid)

Visit Dynamsoft Trial License
Sign up for a free 30-day trial
Copy your license key
Paste it into the “Dynamsoft License Key” input field
Click “Initialize MRZ”

Getting Google Cloud Vision API Key (OCR - Paid with Free Tier)

Go to Google Cloud Console
Create a new project or select an existing one
Enable the “Cloud Vision API”
Go to “Credentials” → “Create Credentials” → “API Key”
Copy your API key
Select “Google OCR (Cloud)” in the dropdown
Paste your API key

Using Free Options

PaddleOCR:

Select “PaddleOCR (Local)” from dropdown
No API key needed
Models load automatically
Works offline after first load

Face Detection:

Always free and enabled
No configuration needed

Source Code

https://github.com/yushulx/javascript-barcode-qr-code-scanner/tree/main/examples/mrz_solution