OCR SDK: Convert Image to Text
Dynamsoft offers a C++ OCR library that helps you convert images to text.
- With an optical character recognition (OCR) library, you can extract text from scanned images or PDF documents to edit, save, or reuse it.
- You can produce searchable PDF documents.
- Dynamsoft offers two OCR engines: OCR Professional Module (based on Kofax OmniPage) and OCR Basic Module (based on Tesseract).
- Both OCR modules are optimized for web applications.
Online Demo: OCR PDF to Word
Note that the “output format” dropdown offers several choices, such as text, PDF, and XML. To save the result in a format that Word can open, choose “Formatted Text”.
OCR image to text
Converting images to editable, searchable text requires an OCR (optical character recognition) engine. With Dynamsoft OCR SDK, you can build a web application that opens a local image or PDF file, recognizes the text, and saves the result as Formatted Text. A key advantage of using an SDK is that developers can integrate file format conversion seamlessly into a business process, and even automate the conversion as a batch process. The OCR engine can extract text from the following file types: TIFF (G4 / LZW / JPEG), JPEG, PDF, BMP, JPEG2000, JBIG, JBIG2, PNG, PDA, PGX, XPS, WMP, OPG, MAX, AWD, DCX, PCX.
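In a batch process, a common first step is to pick out the files the engine can actually read. The sketch below shows one way to do that by file extension. It is a minimal illustration, not part of the Dynamsoft API: the helper names `LowerExtension`, `IsSupportedInput`, and `SelectForOcr` are hypothetical, the extension-to-format mapping is an assumption that covers only a subset of the formats listed above, and the recognition call itself is left out because the SDK's API is not shown here.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
#include <unordered_set>
#include <vector>

// Assumed file extensions for a subset of the formats listed above.
// The mapping is illustrative only; consult the SDK documentation for
// the authoritative list of accepted inputs.
const std::unordered_set<std::string> kSupportedExtensions = {
    "tif", "tiff",  // TIFF (G4 / LZW / JPEG)
    "jpg", "jpeg",  // JPEG
    "pdf",          // PDF
    "bmp",          // BMP
    "jp2",          // JPEG2000
    "png",          // PNG
    "xps",          // XPS
    "dcx", "pcx",   // DCX, PCX
};

// Returns the lower-cased extension of `path` (without the dot),
// or an empty string if the file name has no extension.
std::string LowerExtension(const std::string& path) {
    const auto dot = path.find_last_of('.');
    const auto sep = path.find_last_of("/\\");
    // A dot that appears before the last path separator belongs to a
    // directory name, not to the file name.
    if (dot == std::string::npos || (sep != std::string::npos && dot < sep)) {
        return "";
    }
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext;
}

// True if the file's extension maps to one of the assumed input formats.
bool IsSupportedInput(const std::string& path) {
    return kSupportedExtensions.count(LowerExtension(path)) > 0;
}

// Keeps only the paths worth sending to the OCR engine, preserving order.
// The recognition step itself is omitted: it would call the SDK.
std::vector<std::string> SelectForOcr(const std::vector<std::string>& paths) {
    std::vector<std::string> selected;
    std::copy_if(paths.begin(), paths.end(), std::back_inserter(selected),
                 IsSupportedInput);
    return selected;
}
```

For example, `SelectForOcr({"a.pdf", "b.docx", "c.PNG"})` keeps `"a.pdf"` and `"c.PNG"`, since matching is case-insensitive. Each selected path would then be passed to the OCR engine. Filtering up front keeps a long batch from stopping on an unreadable file. Checking the extension is only a cheap heuristic, though; a production pipeline might also inspect file headers.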
Convert PDF files to text
PDF files and Word documents are probably the most popular document formats. Converting Word to PDF is straightforward: simply choose PDF in the “save as type” list when saving the file. Converting PDF to Word is more complicated, especially when the text in the PDF file is image-based or image-over-text, because the characters must be recognized from the image rather than copied. Dynamsoft’s OCR SDK handles this for you.
Scan documents and convert to text
Document capture plays a vital role in many businesses, such as insurance, banking, and healthcare. When extracting information from paper documents, document scanning and OCR are two of the key steps. Scanners turn paper into digital images, and an OCR engine then converts those images to text, helping operators interpret the scanned documents. If you want to integrate document capture into your workflow, try Dynamsoft SDKs: choose Dynamic Web TWAIN for a web application or Dynamic .NET TWAIN for a .NET desktop application.
Useful Resources:
- More samples of OCR
- Developer’s guide
- How Dynamsoft OCR SDK works
Western and Arabic Language Support
The OCR Professional library currently supports English, 119 other Western languages, and Arabic. See the complete supported language list.