OCR (Retired)

This page is only provided as a reference for clients with existing Dynamsoft OCR licences. New OCR licences are not available as Dynamsoft has ended the development of OCR modules.

Dynamsoft offers two OCR engines that can be used as add-ons for Dynamic Web TWAIN: OCR Basic (OCRB for short) and OCR Professional (OCRPro for short).

OCRB is built on top of the open source Tesseract engine. OCRPro, on the other hand, is built on top of Kofax’s proprietary OCR engine.

For simple OCR of relatively clear images, OCRB will suffice. It supports 27 languages, including English, Arabic, Chinese, and Russian. Here is a full list of all the languages supported by OCRB.

As the name implies, OCRPro is faster and more robust, and it comes with built-in image pre-processing. It currently supports 119 languages and is the recommended option for any large-scale, enterprise-grade solution. Here is a full list of all the languages supported by OCRPro.

For a quick comparison, you can use this sample application to test the performance of both engines side by side.

OCR can be performed on both the client side and the server side. However, server-side OCRPro is no longer supported as of v17.0.

Client side OCR

Environment

Client-side OCR works only in desktop browsers on Windows.
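A page can check this requirement before offering OCR to the user. The following is a minimal sketch that relies on a user-agent heuristic; the function name supportsClientSideOcr is hypothetical and not part of the Dynamic Web TWAIN API, and user-agent strings can be spoofed, so treat the result as a hint only.

```javascript
// Hypothetical helper (not part of Dynamic Web TWAIN): guesses from a
// user-agent string whether the page runs in a desktop browser on Windows.
function supportsClientSideOcr(userAgent) {
    var ua = String(userAgent || "");
    var isWindows = /Windows NT/.test(ua);
    var isMobile = /Mobile|Windows Phone|Android|iPhone|iPad/.test(ua);
    return isWindows && !isMobile;
}
```

In a browser, call supportsClientSideOcr(navigator.userAgent) and hide the OCR controls when it returns false.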

Use OCRB on the Client Side

Step one - Include OCRB

To include this addon, reference the JavaScript file dynamsoft.webtwain.addon.ocr.js, which is NOT included in the resources files. If you can’t find this file, contact Dynamsoft Support.

If you are using the dwt package, the addon is already included in the main JavaScript file (dynamsoft.webtwain.min.js or dynamsoft.webtwain.min.mjs), which means you can skip this step.

<script src="dynamsoft.webtwain.addon.ocr.js"> </script>

Step two - Install OCRB

OCRB is not included by default in the service installation. To use it, you need to download and install it with the APIs Download() and DownloadLangData(). The code snippet below shows how it works.

OCRB requires language data (a dictionary) for each language it reads. The following code assumes the target language is English.

function downloadOCRB(bDownloadDLL) {
    // Both zip files are served from the same location as the resources files.
    var strOCRPath = Dynamsoft.DWT.ResourcesPath + '/addon/OCRx64.zip',
        strOCRLangPath = Dynamsoft.DWT.ResourcesPath + '/addon/OCRBasicLanguages/English.zip';
    if (bDownloadDLL) {
        // First pass: download and install the OCRB engine.
        DWObject.Addon.OCR.Download(
            strOCRPath,
            function() {
                /*console.log('OCR dll is installed');*/
                // Second pass: fetch the language data once the engine is installed.
                downloadOCRB(false);
            },
            function(errorCode, errorString) {
                console.log(errorString);
            }
        );
    } else {
        DWObject.Addon.OCR.DownloadLangData(
            strOCRLangPath,
            function() {},
            function(errorCode, errorString) {
                console.log(errorString);
            });
    }
}
downloadOCRB(true);

The code asks Dynamic Web TWAIN to download OCRB from the URL Dynamsoft.DWT.ResourcesPath + '/addon/OCRx64.zip' and the language data from the URL Dynamsoft.DWT.ResourcesPath + '/addon/OCRBasicLanguages/English.zip'. Both zip files need to be placed on the server where you placed the resources files. As mentioned above, if you can’t find these files, you can contact Dynamsoft Support.

Once the installation is done, you should be able to find the following files under C:\Windows\SysWOW64\Dynamsoft\DynamsoftServicex64_17\DynamicOCR.

  • DynamicOCRx64_10.0.0.0618.dll: The version number may vary.
  • DynamicOCR\eng.traineddata: This is the English data; other languages use different file names.
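The two-step installation (engine first, language data second) can also be written as a promise chain instead of a recursive call. This is only a sketch: callAddonMethod and installOCRB are hypothetical helper names, and the only library calls used are Download() and DownloadLangData() with the (url, success, failure) shape shown in the snippet above.

```javascript
// Wraps a callback-style add-on method (url, onSuccess, onFailure) in a Promise.
// `addon` is expected to be DWObject.Addon.OCR; it is passed in as a parameter
// so the helper does not depend on a global object.
function callAddonMethod(addon, methodName, url) {
    return new Promise(function (resolve, reject) {
        addon[methodName](
            url,
            function () { resolve(); },
            function (errorCode, errorString) {
                var err = new Error(errorString);
                err.code = errorCode;
                reject(err);
            }
        );
    });
}

// Hypothetical helper: installs the OCRB engine, then the language data.
// The language data is requested only after the engine download succeeds.
function installOCRB(addon, enginePath, langPath) {
    return callAddonMethod(addon, "Download", enginePath).then(function () {
        return callAddonMethod(addon, "DownloadLangData", langPath);
    });
}
```

A call might look like installOCRB(DWObject.Addon.OCR, Dynamsoft.DWT.ResourcesPath + '/addon/OCRx64.zip', Dynamsoft.DWT.ResourcesPath + '/addon/OCRBasicLanguages/English.zip'), followed by .then() to continue and .catch() to report the error string.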

Step three - Perform OCR with OCRB

Once installed, you can start using the addon. Check out the following code snippet, which makes use of the methods SetLanguage(), SetOutputFormat() and Recognize().

function DoOCR() {
    if (DWObject) {
        if (DWObject.HowManyImagesInBuffer == 0) {
            alert("Please scan or load an image first.");
            return;
        }
        if (Dynamsoft.DWT.EnumDWT_OCROutputFormat === undefined)
            Dynamsoft.DWT.EnumDWT_OCROutputFormat = EnumDWT_OCROutputFormat;
        DWObject.Addon.OCR.SetLanguage('eng');
        DWObject.Addon.OCR.SetOutputFormat(Dynamsoft.DWT.EnumDWT_OCROutputFormat.OCROF_TEXT);
        DWObject.Addon.OCR.Recognize(
            DWObject.CurrentImageIndexInBuffer,
            function(sImageIndex, result) {
                if (result == null)
                    return null;
                var _textResult = (Dynamsoft.Lib.base64.decode(result.Get())).split(/\r?\n/g);
                console.log(_textResult.join(" "));
            },
            function(errorcode, errorstring, result) {
                alert(errorstring);
            }
        );
    }
}
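The success callback above joins the decoded lines with single spaces, which leaves doubled spaces wherever the OCR output contains blank lines. A minimal sketch of a stricter clean-up step follows; flattenOcrText is a hypothetical helper that takes the already-decoded text, so it makes no assumptions about the result object itself.

```javascript
// Hypothetical helper: turns multi-line OCR text into a single line.
// Lines are trimmed, empty lines are dropped, and the rest are joined by one space.
function flattenOcrText(decodedText) {
    return String(decodedText)
        .split(/\r?\n/)
        .map(function (line) { return line.trim(); })
        .filter(function (line) { return line.length > 0; })
        .join(" ");
}
```

Inside the success callback it could replace the split/join pair, for example console.log(flattenOcrText(Dynamsoft.Lib.base64.decode(result.Get()))).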

Other methods for OCRB

The following four methods are only effective when the output format is PDF.

Online demo for OCRB

Scan-Documents-and-Do-Client-side-OCR-Basic

Use OCRPro on the Client Side

Step one - Include OCRPro

To include this addon, reference the JavaScript file dynamsoft.webtwain.addon.ocrpro.js, which is NOT included in the resources files. If you can’t find this file, contact Dynamsoft Support.

If you are using the dwt package, the addon is already included in the main JavaScript file (dynamsoft.webtwain.min.js or dynamsoft.webtwain.min.mjs), which means you can skip this step.

<script src="dynamsoft.webtwain.addon.ocrpro.js"> </script>

Step two - Install OCRPro

OCRPro is not included by default in the service installation. To use it, you need to download and install it with the API Download().

NOTE: The OCRPro engine is large (over 150 MB), so the download takes a while. It only needs to be done once.

function downloadOCRPro() {
    var strOCRPath = Dynamsoft.DWT.ResourcesPath + '/addon/OCRProx64.zip';
    DWObject.Addon.OCRPro.Download(
        strOCRPath,
        function() {},
        function(errorCode, errorString) {
            console.log(errorString);
        }
    );
}
downloadOCRPro();

The code asks Dynamic Web TWAIN to download OCRPro from the URL Dynamsoft.DWT.ResourcesPath + '/addon/OCRProx64.zip'. This zip file needs to be placed on the server where you placed the resources files. If you can’t find this file, you can contact Dynamsoft Support.

Once the installation is done, you should be able to find the following under C:\Windows\SysWOW64\Dynamsoft\DynamsoftServicex64_17.

  • DynamicOCRProx64_1.2.0.0806.dll: The version number may vary.
  • OCRProResource\{hundreds of files}: The directory OCRProResource contains a few hundred files.

Step three - Perform OCR with OCRPro

Once installed, you can start using the addon. Check out the following code snippet, which sets up the operation with Settings and then starts reading with Recognize().

function DoOCR() {
    if (DWObject) {
        if (DWObject.HowManyImagesInBuffer == 0) {
            alert("Please scan or load an image first.");
            return;
        }
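        var settings = Dynamsoft.WebTwain.Addon.OCRPro.NewSettings();
        settings.Languages = "eng";
        settings.OutputFormat = "TXTS";
        //settings.LicenseChecker = "LicenseChecker.aspx";
        // Apply the settings to the addon before calling Recognize().
        DWObject.Addon.OCRPro.Settings = settings;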
        DWObject.Addon.OCRPro.Recognize(
            DWObject.CurrentImageIndexInBuffer,
            function(sImageIndex, result) {
                if (result == null)
                    return null;
                var bRet = "",
                    pageCount = result.GetPageCount();
                if (pageCount == 0) {
                    alert("OCR result is Null.");
                    return;
                } else {
                    for (var i = 0; i < pageCount; i++) {
                        var page = result.GetPageContent(i);
                        var letterCount = page.GetLettersCount();
                        for (var n = 0; n < letterCount; n++) {
                            var letter = page.GetLetterContent(n);
                            bRet += letter.GetText();
                        }
                    }
                    console.log(bRet);
                }
            },
            function(errorcode, errorstring, result) {
                // -2600: LicenseChecker cannot be empty.
                // -2601: Cannot connect to the LicenseChecker; make sure it is set correctly.
                if (errorcode != -2600 && errorcode != -2601) {
                    alert(errorstring);
                }
                // Guard in case no result object is passed with the error.
                if (!result)
                    return;
                var strErrorDetail = "";
                var aryErrorDetailList = result.GetErrorDetailList();
                for (var i = 0; i < aryErrorDetailList.length; i++) {
                    if (i > 0)
                        strErrorDetail += ";";
                    strErrorDetail += aryErrorDetailList[i].GetMessage();
                }
                // Show the details only when they add something to the error string.
                if (strErrorDetail.length > 0 && errorstring != strErrorDetail)
                    alert(strErrorDetail);
            });
    }
}
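The nested loops and the error-detail loop in the callbacks above can be pulled out into small functions that are easier to test with a mock result. The following is a sketch only: collectOcrProText and collectErrorDetails are hypothetical names, and they use only the result methods that appear in the snippet (GetPageCount, GetPageContent, GetLettersCount, GetLetterContent, GetText, GetErrorDetailList, GetMessage).

```javascript
// Hypothetical helper: returns one string per page, built letter by letter.
function collectOcrProText(result) {
    var pages = [];
    var pageCount = result.GetPageCount();
    for (var i = 0; i < pageCount; i++) {
        var page = result.GetPageContent(i);
        var letterCount = page.GetLettersCount();
        var text = "";
        for (var n = 0; n < letterCount; n++) {
            text += page.GetLetterContent(n).GetText();
        }
        pages.push(text);
    }
    return pages;
}

// Hypothetical helper: joins error detail messages with ";".
// Returns an empty string when no usable result object is available.
function collectErrorDetails(result) {
    if (!result || typeof result.GetErrorDetailList !== "function") {
        return "";
    }
    var details = result.GetErrorDetailList();
    var messages = [];
    for (var i = 0; i < details.length; i++) {
        messages.push(details[i].GetMessage());
    }
    return messages.join(";");
}
```

Returning one string per page keeps page boundaries available to the caller; collectOcrProText(result).join("") gives the same single string that the snippet above logs.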

About Settings

OCRPro is configured through Settings. Check out more details here. The following shows how to OCR and create a PDF file that has the keyword ‘TWAIN’ struck out.

var settings = Dynamsoft.WebTwain.Addon.OCRPro.NewSettings();
settings.Languages = "eng";
settings.LicenseChecker = "LicenseChecker.aspx";
settings.OutputFormat = Dynamsoft.DWT.EnumDWT_OCRProOutputFormat.OCRPFT_IOTPDF;
settings.PDFAVersion = Dynamsoft.DWT.EnumDWT_OCRProPDFAVersion.OCRPPDFAV_1A;
settings.PDFVersion = Dynamsoft.DWT.EnumDWT_OCRProPDFVersion.OCRPPDFV_5;
settings.RecognitionModule = Dynamsoft.DWT.EnumDWT_OCRProRecognitionModule.OCRPM_FASTEST;
settings.Redaction.FindText = "TWAIN";
settings.Redaction.FindTextFlags = Dynamsoft.DWT.EnumDWT_OCRFindTextFlags.OCRFT_WHOLEWORD;
settings.Redaction.FindTextAction = Dynamsoft.DWT.EnumDWT_OCRFindTextAction.OCRFT_STRIKEOUT;
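When the same configuration is reused in several places, it can be convenient to keep it as a plain object and copy it onto the settings object in one step. The sketch below assumes only what the snippet above shows, namely that settings fields and the nested Redaction group are assignable; applyOcrProOptions is a hypothetical helper, not a Dynamic Web TWAIN API.

```javascript
// Hypothetical helper: copies plain options onto a settings object.
// Nested groups (such as Redaction) are merged field by field instead of
// being replaced, so fields not named in `options` keep their current values.
function applyOcrProOptions(settings, options) {
    Object.keys(options).forEach(function (key) {
        var value = options[key];
        var isGroup = value !== null && typeof value === "object" && !Array.isArray(value);
        var target = settings[key];
        if (isGroup && target !== null && typeof target === "object") {
            applyOcrProOptions(target, value);
        } else {
            settings[key] = value;
        }
    });
    return settings;
}
```

For example, applyOcrProOptions(settings, { Languages: "eng", Redaction: { FindText: "TWAIN" } }) sets the same two fields as the corresponding assignments above.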

Other methods for OCRPro

Server side OCR

Environment

Server-side OCR places no restriction on the OS or application running on the client side. The server receives an OCR request via HTTP from a client, carries out the OCR operation, and returns the results to the client. Server-side OCRPro has not been supported since v17.0, so if you want to use Dynamic Web TWAIN v17.0+ for server-side OCR, you need to use OCRB.

For OCRB, the server can run either Windows or Linux.

Use OCRB on the Server Side

As mentioned above, using OCRB on the server side is not recommended by Dynamsoft unless

  • You need to read a language not supported by OCRPro, such as Chinese (see the full language list).
  • You need to do OCR on Linux.

The following shows how to set up OCRB. The environment is

  • OS: Windows 10
  • JRE: 1.8.0_221
  • Web Server: Tomcat 9.0.24 (64bit)
  • Eclipse: Oxygen.3a Release (4.7.3a)

If you’d like to use other environments, please first contact Dynamsoft Support.

Download OCRB resources

The resources of OCRB for the specified environment can be downloaded here.

Deploy OCRB resources

Unzip the download from the previous step, then copy the entire WebContent folder into Tomcat. In our case, the folder goes to C:\Program Files\Apache Software Foundation\Tomcat 9.0\webapps.

Use OCRB

Upload the file and the configuration

The following code shows how to use Dynamic Web TWAIN to upload the file to be read, together with the OCR configuration.

NOTE: You can upload the file and the configuration in other ways too; using Dynamic Web TWAIN is not required.

function DoOCR(index) {
    if (DWObject) {
        // `url` is the target URL to receive the HTTP request.
        var url = CurrentPath + "upload";
        DWObject.ClearAllHTTPFormField();
        DWObject.SetHTTPFormField("ProductKey", DWObject.ProductKey);
        DWObject.SetHTTPFormField("OutputFormat", OCROutputFormat[document.getElementById("ddlOCROutputFormat").selectedIndex].val);
        DWObject.SetHTTPFormField("InputLanguage", OCRLanguages[document.getElementById("ddlLanguages").selectedIndex].val);
        DWObject.HTTPUpload(
            url,
            [index],
            Dynamsoft.DWT.EnumDWT_ImageType.IT_PDF,
            Dynamsoft.DWT.EnumDWT_UploadDataFormat.Binary,
            "sampleFile.pdf",
            function() {
                console.log('upload success with no returned info');
            },
            // `OnOCRResultReturned` processes the returned OCR result
            OnOCRResultReturned
        );
    }
}
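The snippet above passes OnOCRResultReturned without defining it. Below is one possible shape for it, offered as a sketch under stated assumptions: the callback is assumed to receive (errorCode, errorString, responseText), and the response body is assumed to be either "|#|" followed by a file name (which is what the servlet shown later on this page writes for PDF output) or the recognized text itself. parseOcrResponse is a hypothetical helper.

```javascript
// Hypothetical helper: interprets the response body returned by the server.
// A body containing the "|#|" marker is treated as a result file name;
// anything else is treated as the recognized text.
function parseOcrResponse(responseText) {
    var marker = "|#|";
    var text = String(responseText || "");
    var at = text.indexOf(marker);
    if (at !== -1) {
        return { kind: "file", fileName: text.substring(at + marker.length).trim() };
    }
    return { kind: "text", text: text };
}

// Sketch of the callback passed to HTTPUpload; the parameter list is an assumption.
function OnOCRResultReturned(errorCode, errorString, responseText) {
    var parsed = parseOcrResponse(responseText);
    if (parsed.kind === "file") {
        console.log("OCR result saved on the server as " + parsed.fileName);
    } else {
        console.log(parsed.text);
    }
}
```

Keeping the parsing separate from the callback makes the response handling testable without a running server.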

Receive and save the uploaded file

Open the file CRBasicx64-v16-Server\src\com\dynamsoft\demo\FileLoadServlet.java to see how it works. The core function in there is called service(). We’ll break it down below.

DiskFileItemFactory factory = new DiskFileItemFactory();
String path = this.getServletContext().getRealPath("/uploadTemp");
factory.setRepository(new File(path));
factory.setSizeThreshold(1024 * 1024);
ServletFileUpload upload = new ServletFileUpload(factory);
List<FileItem> list;
try {
    list = (List<FileItem>)upload.parseRequest(request);
    for (FileItem item: list) {
        // Plain form fields carry no file data, so this excerpt skips them.
        if (item.isFormField()) {
            continue;
        }
        String name = item.getFieldName();
        String value = item.getName();
        // Keep only the file name in case the client sent a full Windows path.
        String filename = value.substring(value.lastIndexOf("\\") + 1);
        request.setAttribute(name, filename);
        // try-with-resources closes both streams even if reading or writing fails.
        try (InputStream inputStream = item.getInputStream();
             OutputStream out = new FileOutputStream(new File(path, filename))) {
            byte[] aryImageBuffer = FileLoadServlet.readBytes(inputStream);
            out.write(aryImageBuffer, 0, aryImageBuffer.length);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
} catch (FileUploadException e1) {
    e1.printStackTrace();
}

The above code (extracted from service()) receives and saves the uploaded file. If you don’t need the file on the server, you can ignore the saving part.

Perform OCR and return the results

// Define the paths of the OCRB Engine and the language data path
String strDllPath = this.getServletContext().getRealPath("/") + "WEB-INF\\lib";
String strTessDataPath = this.getServletContext().getRealPath("/") + "WEB-INF\\lib\\tessdata";
// Define the response header
response.setContentType("text/html;charset=utf-8");
request.setCharacterEncoding("utf-8"); 
String fileExtention = ".txt";
// Create an OCRB instance
DynamsoftOCRBasic ocr = new DynamsoftOCRBasic();
ocr.setOCRDllPath(strDllPath);
ocr.setOCRTessDataPath(strTessDataPath);
// Do OCR
byte[] result = ocr.ocrImage(aryImageBuffer, refaryRetResultDetails);
if(result.length > 0) {
	if(".pdf".equals(fileExtention)){
		response.getWriter().write("|#|" + strOCRResultFileName);
	} else {
		response.reset(); 
		response.setContentType("text/plain");
		response.setHeader("Content-disposition", "attachment; filename=\"" + strOCRResultFileName + "\"");
		// Write file to response.
		OutputStream output = response.getOutputStream();
		output.write(result);
		output.close();
	}
}
// Write the result on the server too (not necessary, just for future reference)
outResult = new FileOutputStream(new File(path, strOCRResultFileName));
outResult.write(result, 0, result.length);
outResult.close();

As shown in the above code, OCRB is represented by the class DynamsoftOCRBasic and the key method is ocrImage(). Check out the full list of the OCRB server-side methods.

Online demo for OCRB on the Server side

Scan-Documents-and-Do-Server-side-OCR-Basic-Java
