OCR SDK: Convert Image to Text

Please note that the “output format” dropdown offers many choices, such as text, PDF, and XML files. To save your result in a format that can be opened in Word, choose “Formatted Text”.

OCR image to text

To convert images to editable and searchable text, an OCR (optical character recognition) engine is a must.

With the Dynamsoft OCR SDK, we can easily build a web application that opens a local image or PDF file, recognizes the text, and then saves the result as Formatted Text. The main advantage of using an SDK is that developers can integrate file format conversion seamlessly into a business process, and can even automate the conversion as a batch process.

The OCR engine can extract text from the following file types: TIFF (G4 / LZW / JPEG), JPEG, PDF, BMP, JPEG2000, JBIG, JBIG2, PNG, PDA, PGX, XPS, WMP, OPG, MAX, AWD, DCX, and PCX.
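A batch conversion of the kind described above boils down to a loop over input files that skips anything the engine cannot read. The Python sketch below is not Dynamsoft's API: `recognize` is a hypothetical callback standing in for whatever OCR call the SDK exposes, and the extension set is an assumed mapping for a subset of the formats listed above (for example, `.jp2` for JPEG2000).

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed file extensions for a subset of the supported formats listed above.
# This is an illustrative mapping, not the SDK's own list.
SUPPORTED_EXTENSIONS = {
    ".tif", ".tiff", ".jpg", ".jpeg", ".pdf", ".bmp", ".jp2",
    ".jbig", ".jb2", ".png", ".pgx", ".xps", ".dcx", ".pcx",
}


def is_supported(path: Path) -> bool:
    """Check the file extension case-insensitively against the assumed set."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def batch_convert(folder: Path, recognize: Callable[[Path], str]) -> Dict[str, str]:
    """Run `recognize` on every supported file in `folder`, in name order.

    `recognize` is a hypothetical stand-in for the OCR SDK call; it takes a
    file path and returns the recognized text. Unsupported files are skipped.
    """
    results: Dict[str, str] = {}
    for path in sorted(folder.iterdir()):
        if path.is_file() and is_supported(path):
            results[path.name] = recognize(path)
    return results
```

Passing the OCR call in as a parameter keeps the batch logic independent of any particular SDK, so the same loop can be driven by a real engine in production or by a stub in tests.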

Convert PDF files to text

PDF files and Word documents are probably the most popular file formats nowadays. Converting from Word to PDF is straightforward: we can simply choose PDF in the “Save as type” list when saving. However, converting from PDF to Word is more complicated, especially when the text in the PDF file is actually image-based or image-over-text.
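One way to see why image-based PDFs are harder: their pages carry no usable text layer, so plain text extraction returns little or nothing and the page has to go through OCR instead. The Python sketch below shows a common heuristic for spotting such pages; it is an assumption for illustration, not how the Dynamsoft SDK decides. It takes the text already extracted from each page (for example with a PDF library such as pypdf's `page.extract_text()`), and the `min_chars` threshold is a hypothetical value.

```python
from typing import List, Sequence


def pages_needing_ocr(page_texts: Sequence[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages whose text layer looks empty.

    A page with fewer than `min_chars` non-whitespace characters is assumed
    to be image-based (e.g. a scanned page), so it should be sent to OCR.
    The threshold is an illustrative assumption, not an SDK setting.
    """
    needs_ocr: List[int] = []
    for index, text in enumerate(page_texts):
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            needs_ocr.append(index)
    return needs_ocr


# Usage: page 0 has a real text layer; pages 1 to 3 look like scans.
pages = ["Invoice 2021-04 Total due: $120.00", "", "  \n ", "Short"]
print(pages_needing_ocr(pages))  # [1, 2, 3]
```

An image-over-text PDF already has a hidden text layer, so this heuristic would treat its pages as text pages; whether that layer is accurate enough for Word conversion is a separate question.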

Scan documents and convert to text

Document capture plays a vital role in many businesses, such as insurance, banking, and healthcare. When extracting information from paper documents, document scanning and OCR are two of the key steps. While scanners turn paper into a digital format, an OCR engine converts the images to text and thus helps operators interpret the scanned documents.

If you are looking for a solution to integrate document capture into your workflow, you may try Dynamsoft SDKs. Depending on whether you are building a web or a .NET desktop application, you can choose between Dynamic Web TWAIN and Dynamic .NET TWAIN.

