Build a Web Page to Scan Documents to PDF

Last Updated on 2021-06-16

If you are developing a web application that requires the capability to deal with different digital file formats, chances are PDF will be a must-have file format. Scanning documents of text and graphics to PDF results in a compressed and visually clear file that can be read on a PC or Mac, typically using Adobe Reader. So, how do you scan documents to PDF?

To scan documents to PDF, your web application needs to be able to talk to your scanner through a scanning protocol such as TWAIN. This leaves you with two options:

  • Spend a lot of time and effort to figure out the TWAIN standard
  • Use an available off-the-shelf third-party SDK

Given the ease and convenience of a third-party SDK compared with the time and steep learning curve involved in studying a new protocol, opting for an SDK is the recommended solution.

In this tutorial, we will explain how to build a simple HTML page to scan documents and save them as a PDF file using Dynamic Web TWAIN SDK. We will also discuss how you scan multiple pages in a batch and save them as PDF, and how you can scan paper documents into searchable PDF files with OCR in a web application.

How to build a simple HTML page to scan documents and save them as a PDF file

  1. Start a Web Application
  2. Add Dynamic Web TWAIN to the HTML Page
  3. Use Dynamic Web TWAIN to Scan or Load Images
  4. Save Images as a PDF file

Step 1: Start a Web Application

Download the free Dynamic Web TWAIN 30-day trial.

After installation, you can find it by default at C: > Program Files (x86) > Dynamsoft > Dynamic Web TWAIN SDK {version number} Trial.

1. Copy Dynamsoft’s Resources folder to your project

The Resources folder can normally be copied from
C:\Program Files (x86)\Dynamsoft\Dynamic Web TWAIN SDK {Version Number} {Trial}\

ResourcesFolder

Notice there are three folders:

  • Documents — Help Documents and a Developer’s Guide 
  • Resources — SDK files necessary to build a scanning web page
  • Samples — Dynamic Web TWAIN samples

2. Create an empty HTML page

Please put the empty HTML page together with the Resources folder, as shown below:

ResourcesAndHTML

Step 2: Add Dynamic Web TWAIN to the HTML Page

2.1. Include the two Dynamsoft JS files in the <head> tag.

<script type="text/javascript" src="Resources/dynamsoft.webtwain.initiate.js"></script> 
<script type="text/javascript" src="Resources/dynamsoft.webtwain.config.js"></script>

2.2. Add the Dynamic Web TWAIN container to the <body> tag.

<div id="dwtcontrolContainer" ></div>

Note: "dwtcontrolContainer" is the default id for the div. You can change it in dynamsoft.webtwain.config.js if necessary.

Step 3: Use Dynamic Web TWAIN to Scan or Load Images

Add Scan and Load buttons to the page:

<input type="button" value="Scan" onclick="AcquireImage();" />

<input type="button" value="Load" onclick="LoadImage();" />

Then add the implementations of the functions AcquireImage() and LoadImage(). Notice how LoadImage() handles success and failure with the callback functions OnSuccess() and OnFailure():

[javascript]
function AcquireImage() {
    if (DWObject) {
        DWObject.SelectSource(); // Show the Select Source dialog
        DWObject.OpenSource();
        // The scanner source will be disabled/closed automatically after the scan.
        DWObject.IfDisableSourceAfterAcquire = true;
        DWObject.AcquireImage();
    }
}

// Callback functions for async APIs
function OnSuccess() {
    console.log('successful');
}

function OnFailure(errorCode, errorString) {
    alert(errorString);
}

function LoadImage() {
    if (DWObject) {
        // Open the system's file dialog to load an image
        DWObject.IfShowFileDialog = true;
        // Load images in all supported formats (.bmp, .jpg, .tif, .png, .pdf).
        // OnSuccess or OnFailure will be called after the operation.
        DWObject.LoadImageEx("", EnumDWT_ImageType.IT_ALL, OnSuccess, OnFailure);
    }
}
[/javascript]
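The functions above rely on a global DWObject, but this page never shows where it is created. The following is a minimal sketch of how it might be obtained, not the SDK's official sample. It assumes the SDK fires an OnWebTwainReady event and exposes GetWebTwain() on Dynamsoft.DWT (newer releases) or Dynamsoft.WebTwainEnv (older releases); check the documentation for your installed version. The helper name initWebTwain is hypothetical, and it takes the Dynamsoft global as a parameter so that it can be tried without a browser.

```javascript
// Hypothetical helper (not part of the SDK): waits for the SDK's ready event
// and hands the scanner object to the onReady callback.
function initWebTwain(dynamsoft, containerId, onReady) {
  // Assumption: the entry point is Dynamsoft.DWT in newer releases and
  // Dynamsoft.WebTwainEnv in older ones.
  var env = dynamsoft.DWT || dynamsoft.WebTwainEnv;
  if (!env) {
    throw new Error('Dynamic Web TWAIN resources are not loaded');
  }
  env.RegisterEvent('OnWebTwainReady', function () {
    // The id must match the container <div> added in Step 2.
    onReady(env.GetWebTwain(containerId));
  });
}

// In the page, after the two Resources scripts have loaded:
var DWObject = null;
if (typeof Dynamsoft !== 'undefined') {
  initWebTwain(Dynamsoft, 'dwtcontrolContainer', function (obj) {
    DWObject = obj; // AcquireImage() and LoadImage() can now use it
  });
}
```

The callback passed to initWebTwain plays the role that Dynamsoft_OnReady() plays later in this article.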

Step 4: Save Images as a PDF file

Now we have two options to get documents loaded into Dynamic Web TWAIN:

  • Scan documents from a scanner (AcquireImage());
  • Or load documents from the hard disk (LoadImage()).

Let’s add a save button to the web page:

<input type="button" value="Save" onclick="SaveWithFileDialog();" />

Add the logic of saving documents to PDF:

[javascript]
function SaveWithFileDialog() {
    if (DWObject) {
        // Only save when at least one image has been scanned or loaded
        if (DWObject.HowManyImagesInBuffer > 0) {
            DWObject.IfShowFileDialog = true;
            DWObject.SaveAllAsPDF("DynamicWebTWAIN.pdf", OnSuccess, OnFailure);
        }
    }
}
[/javascript]
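If the rest of your page uses Promises, the same save logic can be wrapped so that callers can chain on the result. This is only a sketch: saveAllAsPdf is a hypothetical helper name, and it relies solely on the HowManyImagesInBuffer, IfShowFileDialog and SaveAllAsPDF members used in the function above.

```javascript
// Hypothetical wrapper: turns the callback-style SaveAllAsPDF call into a Promise.
function saveAllAsPdf(dwObject, fileName) {
  return new Promise(function (resolve, reject) {
    // Reject early when there is nothing to save.
    if (!dwObject || dwObject.HowManyImagesInBuffer <= 0) {
      reject(new Error('No images in the buffer to save'));
      return;
    }
    dwObject.IfShowFileDialog = true; // let the user choose where to save
    dwObject.SaveAllAsPDF(
      fileName,
      function () { resolve(fileName); },
      function (errorCode, errorString) {
        reject(new Error(errorCode + ': ' + errorString));
      }
    );
  });
}
```

A Save button handler could then call saveAllAsPdf(DWObject, "DynamicWebTWAIN.pdf") and attach .then() and .catch() handlers instead of passing OnSuccess and OnFailure.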

Now save the HTML file. This tutorial names it scan2pdf.html.

That’s it. Congratulations. You have just built a web page in around 5 minutes that can scan or load documents and save them as a PDF file.

You can open scan2pdf.html in a browser and test it out.

You can either load a local document or scan documents into your web page. Let’s try scanning. This is what the page looks like when the Scan button is clicked:

Dynamic Web TWAIN scan page

Please note that only TWAIN-compatible scanners are listed in the Select Source dialog. If you don’t have a real scanner at hand, you can install a virtual scanner for testing, which is what I did. If you do have a scanner but it doesn’t show up in the list, please check this article for a solution.
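To check from code which sources the SDK can see, a small helper along these lines may help. It is a sketch that assumes the DWObject exposes a SourceCount property and a GetSourceNameItems(index) method, as commonly seen in Dynamic Web TWAIN samples; verify both against the API reference for your version. The name listSourceNames is hypothetical.

```javascript
// Hypothetical helper: collects the names of the scanner sources the SDK reports.
// Assumes the SourceCount property and the GetSourceNameItems(index) method.
function listSourceNames(dwObject) {
  var names = [];
  for (var i = 0; i < dwObject.SourceCount; i++) {
    names.push(dwObject.GetSourceNameItems(i));
  }
  return names;
}
```

Calling console.log(listSourceNames(DWObject)) in the browser console after the page has loaded shows what the Select Source dialog would list; an empty array suggests that no TWAIN source was found.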

After a sample page is scanned, it looks something like this:
Dynamic Web TWAIN scan page after scanning

And yes, you can save it as a PDF file by clicking the Save button.

One Step Further

The example above is simple and functions well. But sometimes, you may like to take things a step further. For example, how about automatically saving documents as a PDF without having to manually click the save button?

With Dynamic Web TWAIN’s event mechanism, it’s actually fairly easy to do.

Dynamic Web TWAIN offers a number of events that you can subscribe to. Each event fires when a specific point in the workflow is reached. For example, OnMouseClick fires on a mouse click, and OnPostTransfer fires when the transfer of one image ends.

So at the end of the Dynamsoft_OnReady() function, the callback that runs once the SDK is ready, simply add:

[javascript]
if (DWObject) {
    // Save automatically once all pages of a scan job have been transferred
    DWObject.RegisterEvent('OnPostAllTransfers', SaveWithFileDialog);
}
[/javascript]
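To see how the two transfer events relate, the sketch below subscribes to both: OnPostTransfer to count pages as they arrive, and OnPostAllTransfers to trigger the save once the job is done. It uses only RegisterEvent and the event names mentioned in this article; registerAutoSave is a hypothetical helper, and the save function is passed in so the helper can be tried with a stand-in object.

```javascript
// Hypothetical helper: counts transferred pages and saves when the scan job ends.
function registerAutoSave(dwObject, save) {
  var transferred = 0;
  // Fires after each single image has been transferred.
  dwObject.RegisterEvent('OnPostTransfer', function () {
    transferred += 1;
    console.log('Pages transferred so far: ' + transferred);
  });
  // Fires once after all images of the scan job have been transferred.
  dwObject.RegisterEvent('OnPostAllTransfers', function () {
    console.log('Scan finished after ' + transferred + ' page(s)');
    transferred = 0; // reset the counter for the next scan job
    save();
  });
}
```

In the page, this could be called as registerAutoSave(DWObject, SaveWithFileDialog) in place of the single RegisterEvent line.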

This will do the job.

Scan Documents and Use Barcodes as Batch Separators

What if you want to scan documents in a batch and then save them as PDF? Or how do you automatically separate different files in one batch?

We recommend you first give this web demo a try before we cover these questions. 

Scan multi-pages to PDF Online Demo

The demo application lets users scan documents from TWAIN scanners and multi-function printers (MFPs) and save them as single-page or multi-page PDF files. It also uses barcodes as batch separators: when a barcode is detected on a page, a new file is created with the barcode value as its filename.
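The separation rule itself can be sketched independently of the SDK. The function below is an illustration, not the demo's actual code: it assumes barcode reading has already happened and that each page is represented as an object with a barcode property (a string, or null when none was found). It also assumes that the page carrying the barcode becomes the first page of the new file, and that pages scanned before any barcode go into a file with a caller-supplied default name.

```javascript
// Hypothetical sketch of barcode-based batch separation.
// pages: array of { barcode: string | null }, in scan order.
// Returns an array of { fileName, pageIndices } describing each output PDF.
function splitByBarcode(pages, defaultName) {
  var files = [];
  var current = null;
  pages.forEach(function (page, index) {
    // Start a new file on every barcode page, or for leading pages without one.
    if (page.barcode || current === null) {
      current = {
        fileName: (page.barcode || defaultName) + '.pdf',
        pageIndices: []
      };
      files.push(current);
    }
    current.pageIndices.push(index);
  });
  return files;
}

// Example: four scanned pages, two of which carry a barcode.
var batch = splitByBarcode(
  [{ barcode: null }, { barcode: 'INV-001' }, { barcode: null }, { barcode: 'INV-002' }],
  'unsorted'
);
console.log(batch.map(function (f) { return f.fileName; }).join(', '));
// prints "unsorted.pdf, INV-001.pdf, INV-002.pdf"
```

Each entry in the result lists the page indices that belong to one output file, which a save routine could then write out file by file.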

How to Use this Demo

On your first visit, the page displays a message asking you to download and install Dynamic Web TWAIN, which this application is built upon. Dynamic Web TWAIN is a browser-based document scanning and webcam capture SDK. After the automated installation, everything should be quite straightforward.

Scan Paper Documents into Searchable PDF Files With OCR 

A typical step in document management is to scan paper documents into image-based PDFs and save them in your document repository. But these PDFs are not searchable or editable. They are essentially photos, which is inconvenient if you want to search a scanned document or edit part of its content. Scanned image files also take up more storage space.

To save storage space, and more importantly to improve work efficiency, you will need an OCR engine to convert scanned image-based PDFs to text-based files.

Perform OCR and Save it to a Searchable PDF

Now we need to perform OCR on the images to extract text and save them as searchable and editable PDF files.

Dynamic Web TWAIN provides an OCR Professional add-on that lets you extract text from images and turn it into real text. The OCR engine can run on either the server side or the client side.

Try Dynamsoft Document Scanning SDKs to Easily Integrate Document Scanning Capability in your Applications 

The above examples show you how to use Dynamic Web TWAIN for your various document scanning needs. If you are interested in learning more about how our document scanning SDKs can help your project, contact us today, and our experts will help you out. 
