How to Scan Documents and Extract Text

Last Updated on 2020-05-28

Read Text from Scanned PDFs or Other Images in ASP.NET

In this fast-paced world, customers expect work to be delivered in a short time. We often hear from anxious customers who have an urgent project that needs to be completed quickly. If their project involves scanning documents that contain images, our system can easily recognize the content of the image and convert it to text. Fast and accurate Optical Character Recognition saves your company time and money, and reduces data entry errors.

This is what Dynamic Web TWAIN is designed for – to save you time and help you build a document management solution rapidly.

Try OCR Online Demo

Below is an OCR online demo that you can try:

Server-Side OCR Online Demo
This demo uploads the images to the server and performs OCR on the server side.

Dynamic Web TWAIN is a browser-based document scanning SDK that enables you to interact with TWAIN scanners with just a few lines of code in JavaScript. Combined with the OCR Professional Engine, you can easily create a document workflow to scan documents and read text from images in your web application.

Dynamsoft OCR is implemented through a complex system of trained pattern recognition, which can also recognize fonts and formatting. It recognizes text in graphic form, such as words in a picture, and turns it into text that can be read and edited.

Some of the ways in which OCR can be used include:

  • Recovering editable text files from scanned documents including faxes
  • Categorizing forms based on an approximation of their handwritten contents
  • Creating searchable and editable eBooks from book scans
  • Searching and editing text from screenshot images
  • Computerized reading of books for visually impaired individuals through text-to-speech

Dynamsoft OCR input can come from different sources, including scans, online images, and photos. Supported formats include TIFF, PNG, PNM, BMP, GIF, and JPEG. Whether your image comes from a scanner, the web, or your camera, Dynamsoft OCR will process it.
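
As a small illustration, a page could check a file name against the formats listed above before sending the image to OCR. This is only a sketch: the isSupportedOcrImage helper and the list of file extensions are assumptions made for this example and are not part of the Dynamsoft API.

```javascript
// Hypothetical helper, not part of Dynamic Web TWAIN.
// Extensions assumed to correspond to TIFF, PNG, PNM, BMP, GIF, and JPEG.
var SUPPORTED_OCR_EXTENSIONS = ["tif", "tiff", "png", "pnm", "bmp", "gif", "jpg", "jpeg"];

function isSupportedOcrImage(fileName) {
  var dot = fileName.lastIndexOf(".");
  if (dot < 0 || dot === fileName.length - 1) {
    return false; // no extension to inspect
  }
  var ext = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_OCR_EXTENSIONS.indexOf(ext) !== -1;
}
```

A check like this only looks at the name, so it is a convenience for the user interface rather than a guarantee about the file's actual contents.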

Code Snippets

The snippets below show how to perform TWAIN scanning and client-side OCR in JavaScript using Dynamic Web TWAIN.

Scan Images

Dynamic Web TWAIN provides easy APIs for you to customize scanning settings and acquire images from TWAIN scanners.

[js]function acquireImage() {
    //select one of the available TWAIN scanners
    DWObject.SelectSourceByIndex(document.getElementById("source").selectedIndex);
    DWObject.OpenSource(); //open the source so the settings below apply to it

    //set scanning settings like pixel type, resolution, ADF etc.
    DWObject.IfShowUI = false; //don’t show the user interface of the scanner
    DWObject.PixelType = 1; //scan in gray
    DWObject.Resolution = 300;
    DWObject.IfFeederEnabled = true; //scan from auto feeder
    DWObject.IfDuplexEnabled = false;
    DWObject.IfDisableSourceAfterAcquire = true;

    //acquire images from the scanner
    DWObject.AcquireImage();
}
[/js]
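
If several pages share the same scan settings, the assignments above can be grouped into one settings object. The sketch below assumes nothing beyond the property names already used in acquireImage(); the applyScanSettings helper and its defaults are hypothetical, and a plain object stands in for DWObject so the snippet runs on its own.

```javascript
// Hypothetical helper: copies scan settings onto a Dynamic Web TWAIN object.
// The property names are the ones used in acquireImage(); the helper itself
// and its default values are assumptions for this sketch.
var DEFAULT_SCAN_SETTINGS = {
  IfShowUI: false,
  PixelType: 1, // gray
  Resolution: 300,
  IfFeederEnabled: true,
  IfDuplexEnabled: false,
  IfDisableSourceAfterAcquire: true
};

function applyScanSettings(dwObject, overrides) {
  // Later sources win, so overrides replace the defaults.
  var settings = Object.assign({}, DEFAULT_SCAN_SETTINGS, overrides || {});
  Object.keys(settings).forEach(function (name) {
    dwObject[name] = settings[name];
  });
  return settings;
}

// Usage with a plain object standing in for DWObject:
var fakeScanner = {};
applyScanSettings(fakeScanner, { Resolution: 200, IfDuplexEnabled: true });
```

In a real page you would pass DWObject instead of fakeScanner and then call DWObject.AcquireImage() as shown earlier.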

Download the OCR Professional Module

To use the OCR Professional module for client-side OCR, you need to include dynamsoft.webtwain.addon.ocrpro.js in the head of your page and also download the OCR Pro DLL.

[js]<script type="text/javascript" src="Resources/addon/dynamsoft.webtwain.addon.ocrpro.js"> </script>[/js]

Then, in your page's JavaScript, download the OCR Pro resources from your server:

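[js]var CurrentPathName = unescape(location.pathname); //path of the current page
var CurrentPath = CurrentPathName.substring(0, CurrentPathName.lastIndexOf("/") + 1); //directory that contains the page
//OnSuccess and OnFailure are callbacks defined elsewhere in your page
DWObject.Addon.OCRPro.Download(CurrentPath + "Resources/addon/OCRPro.zip", OnSuccess, OnFailure);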
[/js]
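
To make the path logic above easier to follow, here is the same directory calculation as a standalone function. The resolveFromPageDirectory name and the example path "/demo/ocr/index.html" are assumptions for illustration only; the function does not read location, so it can be tried outside a browser.

```javascript
// Hypothetical helper mirroring the path logic above without touching `location`.
function resolveFromPageDirectory(pathname, relativePath) {
  // Keep everything up to and including the last "/" of the page path.
  var directory = pathname.substring(0, pathname.lastIndexOf("/") + 1);
  return directory + relativePath;
}

// Example page path (made up for this sketch):
var zipUrl = resolveFromPageDirectory("/demo/ocr/index.html", "Resources/addon/OCRPro.zip");
```

For this example path, the resource is resolved relative to the folder that holds the page, which is why the addon files are expected under Resources/addon next to it.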

Perform OCR Recognition

To extract text from a scanned image, call the JavaScript OCR recognition API as shown below.

[js]DWObject.Addon.OCRPro.Recognize(0, GetOCRProInfo, GetErrorInfo); // 0 is the index of the image[/js]

You can also use your mouse to select an area of the image and do zonal OCR.

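[js]var zoneArray = []; //zones to recognize
//_iLeft, _iTop, _iRight and _iBottom are the bounds of the area selected with the mouse
var zone = Dynamsoft.WebTwain.Addon.OCRPro.NewOCRZone(_iLeft, _iTop, _iRight, _iBottom);
zoneArray.push(zone);
DWObject.Addon.OCRPro.RecognizeRect(0, zoneArray, GetRectOCRProInfo, GetErrorInfo); //0 is the index of the image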
[/js]
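
A mouse selection can be dragged in any direction and may end outside the image, so the four bounds usually need to be normalized before they are passed to NewOCRZone. The sketch below shows one way to do that. The toZoneRect helper is hypothetical, and it assumes the coordinates are already expressed in image pixels.

```javascript
// Hypothetical helper: turns two drag corners into left/top/right/bottom
// values, clamped to the image bounds. Not part of the Dynamsoft API.
function toZoneRect(start, end, imageWidth, imageHeight) {
  function clamp(value, max) {
    return Math.min(Math.max(value, 0), max);
  }
  return {
    left: clamp(Math.min(start.x, end.x), imageWidth),
    top: clamp(Math.min(start.y, end.y), imageHeight),
    right: clamp(Math.max(start.x, end.x), imageWidth),
    bottom: clamp(Math.max(start.y, end.y), imageHeight)
  };
}

// Dragging from bottom-right to top-left, partly outside an 800x600 image:
var rect = toZoneRect({ x: 900, y: 400 }, { x: 100, y: -20 }, 800, 600);
```

The resulting left, top, right, and bottom values could then be used where _iLeft, _iTop, _iRight, and _iBottom appear in the snippet above.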

Return OCR Results

You can save the OCR results as TXT, CSV, RTF, XML, or PDF files.

[js]function DoOCR() {
    if (DWObject) {
        var saveType = ""; //file filter shown in the save dialog
        var fileType = ""; //extension of the result file

        //map the output format selected in the drop-down list to a file type
        switch (OCROutputFormat[document.getElementById("ddlOCROutputFormat").selectedIndex].val) {
            case EnumDWT_OCRProOutputFormat.OCRPFT_TXTS:
                fileType = ".txt";
                saveType = "Plain Text(*.txt)";
                break;
            case EnumDWT_OCRProOutputFormat.OCRPFT_TXTCSV:
                fileType = ".csv";
                saveType = "CSV(*.csv)";
                break;
            case EnumDWT_OCRProOutputFormat.OCRPFT_TXTF:
                fileType = ".rtf";
                saveType = "Rich Text Format(*.rtf)";
                break;
            case EnumDWT_OCRProOutputFormat.OCRPFT_XML:
                fileType = ".xml";
                saveType = "XML Document(*.xml)";
                break;
            case EnumDWT_OCRProOutputFormat.OCRPFT_IOTPDF:
            case EnumDWT_OCRProOutputFormat.OCRPFT_IOTPDF_MRC:
                fileType = ".pdf";
                saveType = "PDF(*.pdf)";
                break;
        }
        var fileName = "result" + fileType;

        //show a save dialog with the matching file filter and default file name
        DWObject.ShowFileDialog(true, saveType, 0, "", fileName, true, false, 0);
    }
}[/js]
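
The format-to-file-type mapping in DoOCR() can also be written as a lookup table, which some readers find easier to extend than a switch. The sketch below is keyed by the EnumDWT_OCRProOutputFormat member names that appear above. The table, the getOutputFileInfo helper, and the use of names instead of the enum's actual values are assumptions for this example.

```javascript
// Keys are the EnumDWT_OCRProOutputFormat member names used in DoOCR();
// the table and the lookup helper are assumptions for this sketch.
var OCR_OUTPUT_FILE_INFO = {
  OCRPFT_TXTS: { fileType: ".txt", saveType: "Plain Text(*.txt)" },
  OCRPFT_TXTCSV: { fileType: ".csv", saveType: "CSV(*.csv)" },
  OCRPFT_TXTF: { fileType: ".rtf", saveType: "Rich Text Format(*.rtf)" },
  OCRPFT_XML: { fileType: ".xml", saveType: "XML Document(*.xml)" },
  OCRPFT_IOTPDF: { fileType: ".pdf", saveType: "PDF(*.pdf)" },
  OCRPFT_IOTPDF_MRC: { fileType: ".pdf", saveType: "PDF(*.pdf)" }
};

function getOutputFileInfo(formatName) {
  var known = Object.prototype.hasOwnProperty.call(OCR_OUTPUT_FILE_INFO, formatName);
  // Unknown formats fall back to empty strings, like the switch without a default case.
  var info = known ? OCR_OUTPUT_FILE_INFO[formatName] : { fileType: "", saveType: "" };
  return { fileName: "result" + info.fileType, saveType: info.saveType };
}
```

For example, getOutputFileInfo("OCRPFT_TXTCSV") yields the file name and filter that DoOCR() passes to ShowFileDialog for CSV output.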
