C++ OCR Library to Convert Images to Text
Overview
- With an optical character recognition (OCR) library, you can extract text from scanned images or PDF documents and then edit, save, or reuse that content.
- You can also produce searchable PDF documents.
- Dynamsoft offers two OCR engines: OCR Professional Module (based on Kofax OmniPage) and OCR Basic Module (based on Tesseract).
- Both OCR libraries are optimized for web applications.
OCR Professional Module
OCR Professional Module is fast and robust. It delivers high accuracy with built-in image pre-processing (despeckling, deskewing, auto-rotation), automatic font matching, and other advanced imaging technology. It also supports multithreaded processing.
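Because the Professional engine supports multithreaded processing, an application can recognize several pages at once. The C++ sketch below shows one way to fan pages out across threads with `std::async` and collect the text in page order. `recognizePage` is a hypothetical stand-in, not part of Dynamsoft's API, and the sketch assumes the real engine call is safe to invoke from several threads concurrently; check the SDK documentation before relying on that.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a call into the OCR engine. A real integration
// would pass the page image to the engine and return the recognized text.
std::string recognizePage(const std::string& pagePath) {
    return "text of " + pagePath;
}

// Runs recognizePage on every page concurrently and returns the results
// in the same order as the input pages.
std::vector<std::string> recognizeAllPages(const std::vector<std::string>& pages) {
    std::vector<std::future<std::string>> jobs;
    jobs.reserve(pages.size());
    for (const auto& page : pages) {
        // std::launch::async runs each page on its own thread.
        jobs.push_back(std::async(std::launch::async, recognizePage, page));
    }

    std::vector<std::string> results;
    results.reserve(jobs.size());
    for (auto& job : jobs) {
        results.push_back(job.get());  // get() blocks until that page is done
    }
    assert(results.size() == pages.size());  // one result per input page
    return results;
}
```

Launching one thread per page keeps the sketch short; for large batches you would likely cap the number of concurrent jobs, for example with a fixed-size worker pool.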
OCR Basic Module
OCR Basic Module is built on top of Tesseract, an open-source OCR engine whose development has been sponsored by Google since 2006.
Language Support
The OCR Professional library currently supports English and 119 other Western languages, as well as Arabic.
OCR Basic Module currently supports 27 languages.
Deployment
Both the professional and basic engines support client-side and server-side deployment.
With server-side deployment, users upload their data to the server for OCR processing, so no OCR engine needs to be downloaded to the client machine. The downside of this approach is that offline OCR is not supported.
- With the OCR Professional engine, you can deploy the OCR engine on a Windows server. The server-side programming language is not restricted: you can use Java, .NET, or any other language you prefer.
- OCR Basic Module supports both Windows Server and Linux Server, and its server-side language is likewise unrestricted.
With client-side deployment, users need to download and install the OCR module the first time they visit the web page. This approach currently supports Windows clients only.
Input
The OCR Professional engine supports extracting text from the following file types:
TIFF (G4 / LZW / JPEG), JPEG, PDF, BMP, JPEG2000, JBIG, JBIG2, PNG, PDA, PGX, XPS, WMP, OPG, MAX, AWD, DCX, and PCX.
The Basic module supports BMP, JPG, PNG, TIF, and PDF.
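Before routing a file to an engine, an application may want to check whether the file's type is on the Basic module's input list. Below is a minimal C++ sketch, assuming the file extension reflects the actual format. The helper names are hypothetical, and treating `.jpeg` and `.tiff` as aliases of JPG and TIF is an assumption rather than something stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Returns the lower-cased extension of a file path without the dot,
// or an empty string when the file name has no extension.
std::string lowerExtension(const std::string& path) {
    const auto dot = path.find_last_of('.');
    const auto slash = path.find_last_of("/\\");
    // A dot inside a directory name (before the last separator) does not count.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
        return "";
    }
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Input formats listed for OCR Basic Module; "jpeg" and "tiff" are
// assumed aliases of JPG and TIF.
bool isBasicModuleInput(const std::string& path) {
    static const std::set<std::string> kFormats = {
        "bmp", "jpg", "jpeg", "png", "tif", "tiff", "pdf"};
    return kFormats.count(lowerExtension(path)) > 0;
}
```

A production check would also inspect the file contents (its magic bytes), since an extension can be wrong.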
Both engines support zonal OCR, which restricts recognition to specified regions of a page and significantly speeds up text recognition from scanned documents.
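In zonal OCR, the application defines a rectangle on the page and only the pixels inside it are recognized. The C++ sketch below crops such a zone out of an 8-bit grayscale buffer. The `GrayImage` and `Zone` types are hypothetical illustrations, and how the cropped image (or the zone coordinates) is actually handed to either Dynamsoft engine is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image stored row by row.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // width * height bytes, row-major
};

// A rectangular region of interest, in pixel coordinates.
struct Zone {
    int x, y, width, height;
};

// Copies the pixels inside `zone` into a new, smaller image. Only this
// sub-image would be passed to the OCR engine, which is why zonal OCR
// does less work than recognizing the whole page.
GrayImage cropZone(const GrayImage& page, const Zone& zone) {
    assert(zone.x >= 0 && zone.y >= 0 && zone.width > 0 && zone.height > 0);
    assert(zone.x + zone.width <= page.width && zone.y + zone.height <= page.height);

    GrayImage out;
    out.width = zone.width;
    out.height = zone.height;
    out.pixels.reserve(static_cast<std::size_t>(zone.width) * zone.height);
    for (int row = zone.y; row < zone.y + zone.height; ++row) {
        // Start of this row's slice inside the zone.
        const auto begin = page.pixels.begin()
                         + static_cast<std::ptrdiff_t>(row) * page.width + zone.x;
        out.pixels.insert(out.pixels.end(), begin, begin + zone.width);
    }
    return out;
}
```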
Output
The OCR Professional engine enables you to save OCR results in the following formats:
- Searchable PDFs (including PDF/A-1b). Text-over-image technology supports multiple image compression formats to reduce the size of PDF files.
- Text files: TXT, CSV, XML, RTF
- String variables
- You can also get detailed position information as part of the OCR result.
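Position information is what makes it possible to pull a specific field out of a page, for example an invoice number printed in a known area. The C++ sketch below assumes a hypothetical result shape: a list of words, each with a bounding box in page pixel coordinates. The engine's real result format may differ, so treat `RecognizedWord`, `Rect`, and `textInArea` as illustrations only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shape of one recognized word with its bounding box in
// page pixel coordinates; the engine's actual result format may differ.
struct RecognizedWord {
    std::string text;
    int left, top, right, bottom;
};

struct Rect {
    int left, top, right, bottom;
};

// True when the word's bounding box lies entirely inside `area`.
bool isInside(const RecognizedWord& w, const Rect& area) {
    return w.left >= area.left && w.top >= area.top &&
           w.right <= area.right && w.bottom <= area.bottom;
}

// Joins, with single spaces, the text of all words that fall inside `area`,
// keeping the order in which the engine returned them.
std::string textInArea(const std::vector<RecognizedWord>& words, const Rect& area) {
    assert(area.left <= area.right && area.top <= area.bottom);
    std::string result;
    for (const auto& w : words) {
        if (!isInside(w, area)) continue;
        if (!result.empty()) result += ' ';
        result += w.text;
    }
    return result;
}
```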
The Basic module supports exporting the result as a string, a .txt file, an image-over-text PDF, or a pure-text PDF.
Licensing and Pricing
The OCR Professional engine is licensed on an annual basis, starting at $990 per year for 300K pages.
The OCR Basic module offers a perpetual licensing option, starting at $2,997 as a one-time license fee.
Benefits
Robust Imaging Features
You can integrate a multitude of document imaging features all in one application, including:
- TWAIN scanning
- Webcam capture
- PDF rasterizer
- 1D & 2D barcode detection
Award-Winning Technical Support
Dynamsoft is committed to providing the best customer service and offers multiple support channels: phone, live chat, email, online meetings, forums, a knowledge base, and more.