Quickly implement text recognition in web applications


In the process of document digitization, it is often necessary to extract the required information from the acquired images. Optical Character Recognition (OCR) is the technology used for this purpose. In this article, we explore how to quickly scan and recognize text in a browser with Dynamic Web TWAIN and its OCR Add-on.


This article covers only the basic OCR engine, used on the client side. The same engine can also run on the server side. Dynamsoft also offers a second engine, Professional OCR, which is faster and more accurate and likewise runs on either the client side or the server side. For more information, please contact us.



The OCR module itself doesn't rely on Node.js. Node.js is used in this article only because its package manager (npm) is a quick way to get the required files.


Step 1 Create a new directory and open a command-line tool inside it (shortcut: Ctrl+Shift+right click). Then download the core control used in this article through npm:

npm install dwt@14.2.0

Then you can see the following in this directory


Step 2 Open the following directory


where you can see


Step 3 This article uses OCRADocument.html. Double-click it to open it in a browser. If the required controls are not yet installed, follow the prompts to install them.


Under normal circumstances, the installed files can be found in the C:\Windows\SysWOW64\Dynamsoft\DynamsoftService directory. The core files here are mainly





Step 4 After the installation is complete, refresh the page. Click Scan Documents (requires a locally connected scanner) or Load Images or PDFs to scan or load an image containing English text, then click OCR An Image with English. The recognition result appears in the result box on the right.


How it is done

Open OCRADocument.html in a text editor

References to the Core JavaScript files

<script type="text/javascript" src="../dist/dynamsoft.webtwain.initiate.js"></script>
<script type="text/javascript" src="../dist/dynamsoft.webtwain.config.js"></script>
<script type="text/javascript" src="../dist/addon/dynamsoft.webtwain.addon.ocr.js"></script>
<script type="text/javascript" src="../dist/addon/dynamsoft.webtwain.addon.pdf.js"></script>

Here the files referenced are

JS library for the core SDK Dynamic Web TWAIN

node_modules\dwt\dist\dynamsoft.webtwain.initiate.js
node_modules\dwt\dist\dynamsoft.webtwain.config.js

JS library for the Dynamsoft OCR Basic


The PDF Rasterizer is optional. For details, check out PDF Rasterizer


If you have previously installed the Dynamic Web TWAIN product locally, the same files (except dynamsoft.webtwain.addon.pdf.js) can also be found in the following directory.

C:\Program Files (x86)\Dynamsoft\Dynamic Web TWAIN SDK {version number} {Trial}\Resource

Dynamsoft OCR Basic runtime installation code

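function downloadOCRBasic(bDownloadDLL) {
    var strOCRPath = Dynamsoft.WebTwainEnv.ResourcesPath + "/OCRResources/OCR.zip",
        strOCRLangPath = Dynamsoft.WebTwainEnv.ResourcesPath + '/OCRResources/OCRBasicLanguages/English.zip';

    if (bDownloadDLL) {
        // Step 1: install the core OCR DLL
        DWObject.Addon.OCR.Download(strOCRPath,
            function () {/*console.log('OCR dll is installed');*/
                downloadOCRBasic(false); // assumed: continue with step 2 once the DLL is in place
            },
            function (errorCode, errorString) { console.log(errorString); });
    } else {
        // Step 2: install the English language pack
        DWObject.Addon.OCR.DownloadLangData(strOCRLangPath,
            function () {
            }, function (errorCode, errorString) { console.log(errorString); });
    }
}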

As shown in the code above, installing Dynamsoft OCR Basic takes two steps. The first step installs the core DLL (DynamicOCR.dll from "/OCRResources/OCR.zip") with the DWObject.Addon.OCR.Download interface. The second step installs the OCR language pack, that is, the recognition dictionary ('/OCRResources/OCRBasicLanguages/English.zip'), with the DWObject.Addon.OCR.DownloadLangData interface. Only the English dictionary is installed here, so the program can recognize only English. If you need to recognize other languages (27 main languages in total), you can download a complete example or refer to this online example

Scan Documents and Client-side OCR basic

List of supported languages

Arabic, Bengali, Chinese_Simplified, Chinese_Traditional, English, French, German, Hindi, Indonesian, Italian, Japanese, Javanese, Korean, Malay, Marathi, Panjabi, Persian, Portuguese, Russian, Spanish, Swahili, Tamil, Telugu, Thai, Turkish, Vietnamese, Urdu.
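The two installation steps described above could be generalized to any language in this list. The following is a minimal sketch, not the SDK's own code: installOcrBasic, languagePackPath and SUPPORTED_LANGUAGES are hypothetical names, and the assumption that every language pack is named after its list entry (as English.zip is) is not confirmed by the sample. The ocr argument is expected to behave like DWObject.Addon.OCR, with Download and DownloadLangData taking a path, a success callback and a failure callback as in the sample.

```javascript
// Languages listed in this article. Assumption: each name matches its zip file name.
const SUPPORTED_LANGUAGES = [
  'Arabic', 'Bengali', 'Chinese_Simplified', 'Chinese_Traditional', 'English',
  'French', 'German', 'Hindi', 'Indonesian', 'Italian', 'Japanese', 'Javanese',
  'Korean', 'Malay', 'Marathi', 'Panjabi', 'Persian', 'Portuguese', 'Russian',
  'Spanish', 'Swahili', 'Tamil', 'Telugu', 'Thai', 'Turkish', 'Vietnamese', 'Urdu'
];

// Build the language-pack path. Only English.zip appears in the sample;
// the same naming for the other languages is an assumption.
function languagePackPath(resourcesPath, language) {
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error('Unsupported OCR language: ' + language);
  }
  return resourcesPath + '/OCRResources/OCRBasicLanguages/' + language + '.zip';
}

// Install the core DLL first, then the language pack.
// `ocr` is expected to look like DWObject.Addon.OCR.
function installOcrBasic(ocr, resourcesPath, language) {
  // Wrap one callback-style download call in a Promise.
  const download = (method, path) => new Promise((resolve, reject) => {
    ocr[method](path, resolve, (errorCode, errorString) =>
      reject(new Error(errorCode + ': ' + errorString)));
  });
  return download('Download', resourcesPath + '/OCRResources/OCR.zip')
    .then(() => download('DownloadLangData', languagePackPath(resourcesPath, language)));
}
```

On the sample page this might be called as installOcrBasic(DWObject.Addon.OCR, Dynamsoft.WebTwainEnv.ResourcesPath, 'English'). Returning a Promise keeps the DLL-then-dictionary order explicit without the recursive flag used in the sample.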

Use the addon

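function DoOCR() {
    if (DWObject) {
        if (DWObject.HowManyImagesInBuffer == 0) {
            alert("Please scan or load an image first.");
            return;
        }
        DWObject.Addon.OCR.SetLanguage('eng'); // recognize English
        DWObject.Addon.OCR.SetOutputFormat(EnumDWT_OCROutputFormat.OCROF_TEXT); // plain-text output
        DWObject.Addon.OCR.Recognize(DWObject.CurrentImageIndexInBuffer, // assumed: OCR the current image
            function (sImageIndex, result) {
                if (result == null)
                    return null;
                var _textResult = (Dynamsoft.Lib.base64.decode(result.Get())).split(/\r?\n/g), _resultToShow = [];
                for (var i = 0; i < _textResult.length; i++) {
                    if (i == 0 && _textResult[i].trim() == "")
                        continue; // skip a leading blank line
                    _resultToShow.push(_textResult[i] + '<br />');
                }
                _resultToShow.splice(0, 0, '<p style="padding:5px; margin:0;">');
                _resultToShow.push('</p>');
                document.getElementById('divNoteMessage').innerHTML = _resultToShow.join('');
            },
            function (errorcode, errorstring, result) {
                alert(errorstring);
            });
    }
}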
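The success callback above turns the decoded OCR text into HTML and writes it to innerHTML. The sketch below isolates that formatting step as a pure function and, unlike the sample, escapes HTML special characters first, so recognized text such as "<b>" is displayed literally instead of being interpreted as markup. The names escapeHtml and formatOcrText are hypothetical and are not part of the SDK.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn decoded OCR text into one HTML paragraph, with a <br /> after each line.
function formatOcrText(text) {
  const lines = text.split(/\r?\n/);
  if (lines[0].trim() === '') {
    lines.shift(); // skip a leading blank line, as the sample does
  }
  const body = lines.map((line) => escapeHtml(line) + '<br />').join('');
  return '<p style="padding:5px; margin:0;">' + body + '</p>';
}
```

Inside the callback, this might be used as document.getElementById('divNoteMessage').innerHTML = formatOcrText(Dynamsoft.Lib.base64.decode(result.Get())).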

The core code is

DWObject.Addon.OCR.SetLanguage('eng'); //Set the language to be recognized
DWObject.Addon.OCR.SetOutputFormat(EnumDWT_OCROutputFormat.OCROF_TEXT); //Set the output format
DWObject.Addon.OCR.Recognize(... //Start recognizing

For the supported output formats, check out EnumDWT_OCROutputFormat.

Related methods are

SetLanguage(), SetOutputFormat()

Recognize(), RecognizeFile(), RecognizeRect(), RecognizeSelectedImages()
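The three core calls, SetLanguage(), SetOutputFormat() and Recognize(), could also be wrapped in a Promise so that callers can chain on the result. This is a minimal sketch under stated assumptions: recognizeImage is a hypothetical helper, dwObject is expected to behave like the article's DWObject, and Recognize() is assumed to take an image index followed by the success and failure callbacks shown in DoOCR.

```javascript
// Recognize one image in the buffer and resolve with the raw OCR result.
// `outputFormat` would be, for example, EnumDWT_OCROutputFormat.OCROF_TEXT.
function recognizeImage(dwObject, imageIndex, language, outputFormat) {
  return new Promise((resolve, reject) => {
    if (dwObject.HowManyImagesInBuffer === 0) {
      reject(new Error('Please scan or load an image first.'));
      return;
    }
    const ocr = dwObject.Addon.OCR;
    ocr.SetLanguage(language);         // language to recognize, e.g. 'eng'
    ocr.SetOutputFormat(outputFormat); // output format of the result
    ocr.Recognize(
      imageIndex, // assumption: first argument is the index of the image in the buffer
      (sImageIndex, result) => resolve(result),
      (errorCode, errorString) => reject(new Error(errorCode + ': ' + errorString))
    );
  });
}
```

A caller might then write recognizeImage(DWObject, 0, 'eng', EnumDWT_OCROutputFormat.OCROF_TEXT).then(function (result) { /* decode result.Get() */ }).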
