How to Enable OCR Indexing for Scanned Documents in a Web App

When working with paper documents, users often need to scan a batch from a multifunction printer (MFP), key in the related information, and save it to a database so that the documents can be found and retrieved later.

However, keying in data by hand takes time and is error-prone. It would be better if, after scanning a document, the application could automatically capture the value of a field, such as a name or an ID, and populate it in the web page. Users could then save the record or use the field to look up related records in your system. How can you achieve that?

In this article, we will show you how to integrate document scanning and automatic OCR indexing into a web application with the Dynamic Web TWAIN SDK and its OCR Pro add-on.

About Dynamic Web TWAIN

Dynamic Web TWAIN is a cross-browser document scanning SDK. With its simple JavaScript APIs, you can easily access client-side scanners from your web application. The SDK works with all mainstream browsers on Windows, macOS and Linux on the client side.

With OCR (Optical Character Recognition) technology, you can easily convert text in images to machine-readable text.

Scan Document from a Web Page

Adding a document scanning feature to a web page takes only a few lines. Below is a simple working scan page.

<html>
<head>
<script type="text/javascript" src="Resources/dynamsoft.webtwain.initiate.js"> </script>
<script type="text/javascript" src="Resources/dynamsoft.webtwain.config.js"> </script>
</head>
<body>
<input type="button" value="Scan" onclick="AcquireImage();" />
<div id="dwtcontrolContainer"> </div>

    <script type="text/javascript">
        var DWObject;
        function Dynamsoft_OnReady(){
            DWObject = Dynamsoft.WebTwainEnv.GetWebTwain('dwtcontrolContainer');
        }
        function AcquireImage(){
            if(DWObject) {
                DWObject.IfDisableSourceAfterAcquire = true;
                DWObject.SelectSource();
                DWObject.OpenSource();
                DWObject.AcquireImage();
            }
        }
    </script>
</body>
</html>

Extract Text from Scanned Image with OCR

You can then extract text from the scanned images, using either full-text OCR or zonal OCR depending on your needs. Here we will show you how to do zonal OCR in JavaScript with the Dynamsoft OCR Professional module.

Include the JS client of OCR Pro in the head.

<head>
<script type="text/javascript" src="Resources/addon/dynamsoft.webtwain.addon.ocrpro.js"> </script>
</head>

Download the OCR Pro DLL to the client machine to perform OCR recognition on the client side. A server-side OCR module is also available if you prefer.

var CurrentPathName = unescape(location.pathname);
var CurrentPath = CurrentPathName.substring(0, CurrentPathName.lastIndexOf("/") + 1);
DWObject.Addon.OCRPro.Download(CurrentPath + "Resources/addon/OCRPro.zip", OnSuccess, OnFailure);
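
The path computation above relies on `unescape`, which is deprecated in modern JavaScript. As a minimal sketch, the same step can be wrapped in a small helper built on `decodeURIComponent`. The name `getBasePath` is hypothetical and not part of the SDK.

```javascript
// Hypothetical helper (not part of Dynamic Web TWAIN): returns the directory
// portion of a URL pathname, including the trailing slash.
function getBasePath(pathname) {
    // decodeURIComponent stands in for the deprecated unescape().
    var decoded = decodeURIComponent(pathname);
    return decoded.substring(0, decoded.lastIndexOf("/") + 1);
}

// Example with a fixed pathname; in a page you would pass location.pathname.
console.log(getBasePath("/scan%20app/pages/index.html")); // prints "/scan app/pages/"
```

The returned value can be used in place of `CurrentPath` when building the URL of `OCRPro.zip`.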

Select an area of an image and perform zonal OCR to extract and recognize a field.


function GetOCRProInfo(result) {
        var bRet = "";
        var pageCount = result.GetPageCount(); // number of pages in the OCR result
        for (var i = 0; i < pageCount; i++) {
                var page = result.GetPageContent(i);
                var letterCount = page.GetLettersCount();
                for (var n = 0; n < letterCount; n++) {
                        var letter = page.GetLetterContent(n);
                        bRet += letter.GetText();
                }
        }
        console.log(bRet); // Get OCR result.
        var fileName = "ocrresult.pdf"; // Save the OCR result as PDF.
        var savePath = "D:\\temp\\" + fileName; // Backslashes must be escaped in a JavaScript string.
        if (savePath.length > 1)
                result.Save(savePath);
}
DWObject.Addon.OCRPro.Recognize(0, GetOCRProInfo, GetErrorInfo); // 0 is the index of the image



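// Coordinates of the most recently selected area, shared with ZonalOCR().
var _iLeft, _iTop, _iRight, _iBottom;

// Register this handler once DWObject is ready, for example inside Dynamsoft_OnReady:
// DWObject.RegisterEvent("OnImageAreaSelected", Dynamsoft_OnImageAreaSelected);
function Dynamsoft_OnImageAreaSelected(index, left, top, right, bottom) {
        _iLeft = left;
        _iTop = top;
        _iRight = right;
        _iBottom = bottom;
}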


function ZonalOCR() {

        var zoneArray = [];
        var zone = Dynamsoft.WebTwain.Addon.OCRPro.NewOCRZone(_iLeft, _iTop, _iRight, _iBottom);
        zoneArray.push(zone);
        DWObject.Addon.OCRPro.RecognizeRect(0, zoneArray, GetRectOCRProInfo, GetErrorInfo);
}

function GetRectOCRProInfo(sImageIndex, aryZone, result) {
        var bRet = "";
        var pageCount = result.GetPageCount(); // number of pages in the OCR result
        for (var i = 0; i < pageCount; i++) {
                var page = result.GetPageContent(i);
                var letterCount = page.GetLettersCount();
                for (var n = 0; n < letterCount; n++) {
                        var letter = page.GetLetterContent(n);
                        bRet += letter.GetText();
                }
        }
        console.log(bRet); // Get OCR result.
        var fileName = "ocrresult.pdf"; // Save the OCR result as PDF.
        var savePath = "D:\\temp\\" + fileName; // Backslashes must be escaped in a JavaScript string.
        if (savePath.length > 1)
                result.Save(savePath);
}
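
Text recognized from a zone often carries stray line breaks or spaces, so it is worth tidying it before populating a form field or using it as a lookup key. The following is a minimal sketch of that step. Both function names and the `INV-` ID format are assumptions made for illustration and do not come from the SDK.

```javascript
// Hypothetical helper: tidy raw OCR text before using it as an index value.
function cleanOcrField(rawText) {
    return String(rawText)
        .replace(/\s+/g, " ")   // collapse line breaks, tabs, and repeated spaces
        .trim();
}

// Hypothetical validator: accept the value only if it matches the expected
// field format (here, an assumed ID pattern such as "INV-004217").
function parseInvoiceId(rawText) {
    var value = cleanOcrField(rawText).toUpperCase();
    return /^INV-\d{6}$/.test(value) ? value : null;
}

console.log(parseInvoiceId("  inv-004217 \r\n")); // prints "INV-004217"
console.log(parseInvoiceId("Total: 42.00"));      // prints null
```

In the page, the string built in `GetRectOCRProInfo` could be passed through such a helper, and a non-null result assigned to an input element so the user can confirm it before saving.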

If your documents share the same template, meaning the same fields appear in the same area of every document, you can define template fields for each document type and use them to automatically index a batch of scanned documents.
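
One way to organize such templates is a lookup table keyed by document type, where each entry lists the named fields and the rectangle each field occupies. The sketch below is an assumption about how this could be structured, not the SDK's own mechanism. The document types, field names, and coordinates are invented, and the zone factory is injected so the code can be tried without the SDK.

```javascript
// Hypothetical template table: one entry per document type, each listing the
// named fields and the pixel rectangle where that field is printed.
var templates = {
    invoice: [
        { field: "invoiceId", left: 1200, top: 80, right: 1600, bottom: 140 },
        { field: "customerName", left: 100, top: 300, right: 900, bottom: 360 }
    ],
    claimForm: [
        { field: "claimNumber", left: 150, top: 120, right: 700, bottom: 180 }
    ]
};

// Builds the zone list for one document type. makeZone is injected so the
// function runs without the SDK; in a page it would presumably wrap
// Dynamsoft.WebTwain.Addon.OCRPro.NewOCRZone.
function buildZones(docType, makeZone) {
    var fields = templates[docType];
    if (!fields) {
        throw new Error("No template for document type: " + docType);
    }
    return fields.map(function (f) {
        return { field: f.field, zone: makeZone(f.left, f.top, f.right, f.bottom) };
    });
}

// Stand-in zone factory for demonstration only.
function plainZone(left, top, right, bottom) {
    return { left: left, top: top, right: right, bottom: bottom };
}

var zones = buildZones("invoice", plainZone);
console.log(zones.map(function (z) { return z.field; }));
```

For a batch, the application could loop over the image indices and call `RecognizeRect` once per field zone, so that each recognized string maps back to a known field name.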

Try an online demo of zonal OCR
Get 30-day free trial of the SDK

Let us know if you have any questions on implementing OCR for document indexing in a Web application.
