C++ OCR Library to Convert Images to Text

Overview

  • With an optical character recognition (OCR) library, you can extract text from scanned images or PDF documents so that you can edit, save, or reuse the content.
  • You can also produce searchable PDF documents.
  • Dynamsoft offers two OCR engines: OCR Professional Module (based on Nuance OmniPage) and OCR Basic Module (based on Tesseract).
  • Both OCR libraries are optimized for web applications.

OCR Professional Module

OCR Professional Module is fast and robust. It delivers high accuracy through built-in image pre-processing (de-speckle, de-skew, auto-rotation), automatic font matching, and other advanced imaging technology. It also supports multi-threaded processing.
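To illustrate what multi-threaded processing buys you, here is a minimal C++17 sketch that recognizes the pages of a document concurrently with `std::async`. It does not use Dynamsoft's API: `recognizePage` is a hypothetical placeholder for a thread-safe engine call, and launching one task per page is a simplification (a production version would typically cap the number of worker threads).

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-page engine call; a real engine
// would take the page image and return the recognized text.
std::string recognizePage(int pageIndex) {
    return "text of page " + std::to_string(pageIndex);
}

// Launches one asynchronous task per page, then collects the results
// in page order (get() blocks until that page's task finishes).
std::vector<std::string> recognizeAll(int pageCount) {
    assert(pageCount >= 0);
    std::vector<std::future<std::string>> jobs;
    jobs.reserve(static_cast<std::size_t>(pageCount));
    for (int i = 0; i < pageCount; ++i) {
        jobs.push_back(std::async(std::launch::async, recognizePage, i));
    }
    std::vector<std::string> results;
    results.reserve(static_cast<std::size_t>(pageCount));
    for (auto& job : jobs) {
        results.push_back(job.get());
    }
    return results;
}
```

Because the futures are collected in the order they were created, the results come back in page order even if later pages finish first.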

OCR Basic Module

OCR Basic Module is built on top of Tesseract, an open-source OCR engine whose development has been sponsored by Google since 2006.

Language Support

OCR Professional Module currently supports English and 119 other Western languages, as well as Arabic.

OCR Basic Module currently supports 27 languages.

Deployment

Both the professional and basic engines support client-side and server-side deployment.

With server-side deployment, users upload their data to the server for OCR processing, so there is no need to download an OCR engine to the client machine. The downside of this approach is that offline OCR is not supported.

  • With the OCR Professional engine, you can deploy the OCR engine on your Windows server. You are not restricted to a particular server-side programming language: you can use Java, .NET, or any other language you prefer.
  • OCR Basic Module supports both Windows Server and Linux Server, again with no restriction on the server-side language.

With client-side deployment, users download and install the OCR module the first time they visit the web page. This approach currently supports Windows clients only.

Input

The OCR Professional engine supports extracting text from the following file types:

TIFF (G4 / LZW / JPEG), JPEG, PDF, BMP, JPEG2000, JBIG, JBIG2, PNG, PDA, PGX, XPS, WMP, OPG, MAX, AWD, DCX, and PCX.

The Basic module supports BMP, JPG, PNG, TIF, and PDF.
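Because the Basic module accepts a shorter list of formats, an application may want to reject unsupported files before sending them for recognition. The following is a hypothetical C++17 helper, not part of any Dynamsoft API, that checks a file's extension against the formats listed above. It inspects only the file name, not the file contents, and it lists only the extensions named above; add variants such as `.jpeg` or `.tiff` if your engine accepts them.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <filesystem>
#include <string>

// Hypothetical pre-check: true if the file's extension matches one of the
// Basic module's listed input formats (BMP, JPG, PNG, TIF, PDF).
bool isBasicModuleInput(const std::filesystem::path& file) {
    std::string ext = file.extension().string();  // e.g. ".PNG", or "" if none
    // Compare case-insensitively by lowercasing the extension.
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    static const std::array<std::string, 5> supported = {
        ".bmp", ".jpg", ".png", ".tif", ".pdf"};
    return std::find(supported.begin(), supported.end(), ext) != supported.end();
}
```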

Both engines support zonal OCR, which recognizes text only within specified regions of a page and can therefore significantly speed up text recognition from scanned documents.
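Zonal OCR is faster because the recognizer processes only the pixels inside the chosen regions. The sketch below shows just the cropping step on a hypothetical 8-bit grayscale buffer; `GrayImage` and `Zone` are illustrative types, not Dynamsoft's. Many engines accept zone coordinates directly, in which case you would pass the rectangle instead of cropping yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal 8-bit grayscale image stored row-major.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // width * height bytes
};

// A rectangular zone in pixel coordinates (hypothetical type).
struct Zone {
    int x, y, width, height;
};

// Copies only the pixels inside `zone`, producing a smaller image
// that can be handed to the recognizer on its own.
GrayImage cropZone(const GrayImage& src, const Zone& zone) {
    assert(zone.x >= 0 && zone.y >= 0 && zone.width >= 0 && zone.height >= 0);
    assert(zone.x + zone.width <= src.width && zone.y + zone.height <= src.height);
    GrayImage out;
    out.width = zone.width;
    out.height = zone.height;
    out.pixels.reserve(static_cast<std::size_t>(zone.width) * zone.height);
    for (int row = zone.y; row < zone.y + zone.height; ++row) {
        // Start of this row's slice inside the zone.
        const std::uint8_t* begin =
            src.pixels.data() + static_cast<std::size_t>(row) * src.width + zone.x;
        out.pixels.insert(out.pixels.end(), begin, begin + zone.width);
    }
    return out;
}
```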

Output

The OCR Professional engine enables you to save OCR results in the following formats:

  • Searchable PDFs (including PDF/A-1b). Text-over-image technology supports multiple image compression formats to reduce the size of PDF files.
  • Text files - TXT, CSV, XML, RTF
  • String variable
  • Detailed position information for the recognized text, as part of the OCR result
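Position information is what makes exports such as CSV with coordinates possible. As a minimal sketch, assume each recognized word comes back with a bounding box; the hypothetical `RecognizedWord` type below (not Dynamsoft's result type) is serialized to CSV with RFC 4180-style quoting, so text containing commas or quotes is preserved.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result record: one recognized word plus its bounding box
// in pixel coordinates.
struct RecognizedWord {
    std::string text;
    int left, top, width, height;
};

// Wraps a field in quotes, doubling any embedded quote characters.
std::string csvField(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"') out += '"';
        out += c;
    }
    out += '"';
    return out;
}

// Serializes words and their positions as CSV with a header row.
std::string toCsv(const std::vector<RecognizedWord>& words) {
    std::ostringstream csv;
    csv << "text,left,top,width,height\n";
    for (const auto& w : words) {
        csv << csvField(w.text) << ',' << w.left << ',' << w.top << ','
            << w.width << ',' << w.height << '\n';
    }
    return csv.str();
}
```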

The Basic module supports exporting the result as a string, a .txt file, an image-over-text PDF, or a pure-text PDF.

Licensing and Pricing

The OCR Professional engine is licensed on an annual basis, starting at $990 per year for 300K pages.
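For budgeting, the starting tier works out to the following per-page cost, assuming the full 300K-page allowance is used within the year:

```latex
\text{cost per page} = \frac{\$990}{300{,}000\ \text{pages}} = \$0.0033\ \text{per page}
```

That is roughly a third of a cent per page; processing fewer pages in the year raises the effective rate.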

The OCR Basic module offers a perpetual licensing option, starting at $1,999 as a one-time license fee.

Benefits

Robust Imaging Features

You can integrate a multitude of document imaging features all in one application, including:

  • TWAIN scanning
  • Webcam capture
  • PDF rasterizer
  • 1D & 2D barcode detection

Award-Winning Technical Support

Dynamsoft is committed to providing excellent customer service and offers multiple support channels, including phone, live chat, email, online meetings, forums, and a knowledge base.