How to OCR PDF in a .NET Desktop Application

Convert a PDF file to an image and read text from the image in C# / VB.NET

Recently, several customers have asked us if it’s possible to use Dynamic .NET TWAIN to convert a PDF file to an image, and then extract text from it, all in a .NET desktop app. In this article I’ll provide some samples that show how to do this using Dynamic .NET TWAIN with our PDF Rasterizer and OCR add-ons. The solution works in both WinForms and WPF applications.

All the samples provided below (both C# and VB.NET) are included in the 30-day free trial installer of Dynamic .NET TWAIN.

Read a PDF file and convert it to an image


With the PDF Rasterizer add-on of Dynamic .NET TWAIN, you can load PDF files from your local disk as images, then display them in the Dynamic .NET TWAIN viewer. You can also specify a resolution for the image.

C# code snippet

        private void btnLoadPDF_Click(object sender, EventArgs e)
        {
            try
            {
                // Let the user pick one or more PDF files
                OpenFileDialog openfiledlg = new OpenFileDialog();
                openfiledlg.Filter = "PDF|*.PDF";
                openfiledlg.FilterIndex = 1; // FilterIndex is 1-based
                openfiledlg.Multiselect = true;

                if (openfiledlg.ShowDialog() == DialogResult.OK)
                {
                    foreach (string strfilename in openfiledlg.FileNames)
                    {
                        // Rasterize each PDF at the resolution selected in the combo box
                        this.dynamicDotNetTwain1.ConvertPDFToImage(strfilename, float.Parse(cmbPDFResolution.SelectedItem.ToString()));
                    }
                }
            }
            catch (Exception exp)
            {
                // Also reached when no resolution is selected or its text is not a number
                MessageBox.Show(exp.Message);
            }
        }
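The handler above parses the resolution inline, so an empty selection or a regional decimal separator ends up in the catch block. Here is a minimal sketch of a more forgiving approach. The `PdfResolution` class and its `DefaultDpi` value are hypothetical helpers for illustration and are not part of Dynamic .NET TWAIN.

```csharp
using System;
using System.Globalization;

static class PdfResolution
{
    // Hypothetical fallback value; choose whatever suits your documents.
    public const float DefaultDpi = 200f;

    // Returns the resolution represented by the combo box item, or DefaultDpi
    // when nothing usable is selected. Parsing uses the invariant culture so
    // "150.5" is read the same way regardless of regional settings.
    public static float FromSelection(object selectedItem)
    {
        string text = selectedItem == null ? "" : selectedItem.ToString();
        float dpi;
        bool parsed = float.TryParse(text, NumberStyles.Float,
                                     CultureInfo.InvariantCulture, out dpi);

        // Reject NaN, infinity, zero and negative values.
        bool usable = parsed && dpi > 0f && !float.IsInfinity(dpi);
        return usable ? dpi : DefaultDpi;
    }
}
```

With this in place, the call in the loop could become `ConvertPDFToImage(strfilename, PdfResolution.FromSelection(cmbPDFResolution.SelectedItem))`.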

Perform OCR

With the OCR add-on of Dynamic .NET TWAIN, you can extract text from images and save the OCR result as a text file or a PDF file (text only, or image over text) in your .NET application. The .NET OCR SDK supports 40+ languages, including Arabic and Chinese.

C# code snippet

        private void OCR(bool isOcrOnRectangleArea)
        {
            // Locate the folder that holds the language packages.
            // The concatenation assumes m_strCurrentDirectory ends with a path separator.
            string languageFolder = m_strCurrentDirectory;
            if (m_bSamplesExist)
            {
                languageFolder = m_strCurrentDirectory + @"Samples\Bin\";
            }
            // Specify the tessdata folder with the language package
            this.dynamicDotNetTwain1.OCRTessDataPath = languageFolder;
            // Specify the language for OCR
            this.dynamicDotNetTwain1.OCRLanguage = languages[this.cbxOCRLanguage.Text];
            // Specify the file format of the OCR result: text file, or text / image-over-text PDF file
            this.dynamicDotNetTwain1.OCRResultFormat = (Dynamsoft.DotNet.TWAIN.OCR.ResultFormat)this.ddlResultFormat.SelectedIndex;


            // Specify the folder that holds the OCR DLLs
            string strDllPath = m_strCurrentDirectory;
            if (m_bSamplesExist)
            {
                strDllPath = m_strCurrentDirectory + @"Redistributable\OCRResources\";
            }
            this.dynamicDotNetTwain1.OCRDllPath = strDllPath;


            if (this.dynamicDotNetTwain1.CurrentImageIndexInBuffer < 0)
            {
                MessageBox.Show("Please load an image before doing OCR!", "Index out of bounds", MessageBoxButtons.OK, MessageBoxIcon.Warning);
                return;
            }

            // OCR either the selected images or a rectangle within the current image
            byte[] sbytes = null;
            if (!isOcrOnRectangleArea)
            {
                sbytes = this.dynamicDotNetTwain1.OCR(this.dynamicDotNetTwain1.CurrentSelectedImageIndicesInBuffer);
            }
            else
            {
                // int.Parse throws a FormatException if a text box does not hold a whole number
                sbytes = this.dynamicDotNetTwain1.OCR(dynamicDotNetTwain1.CurrentImageIndexInBuffer, int.Parse(tbxLeft.Text),
                    int.Parse(tbxTop.Text), int.Parse(tbxRight.Text), int.Parse(tbxButtom.Text));
            }

            if (sbytes != null && sbytes.Length > 0)
            {
                // Offer the extension that matches the chosen result format (index 0 is plain text)
                SaveFileDialog filedlg = new SaveFileDialog();
                if (this.ddlResultFormat.SelectedIndex != 0)
                {
                    filedlg.Filter = "PDF File (*.pdf)|*.pdf";
                }
                else
                {
                    filedlg.Filter = "Text File (*.txt)|*.txt";
                }
                if (filedlg.ShowDialog() == DialogResult.OK)
                {
                    // Write the OCR result to the file the user picked
                    File.WriteAllBytes(filedlg.FileName, sbytes);
                }
            }
            else if (this.dynamicDotNetTwain1.ErrorCode != 0)
            {
                // OCR returned nothing; show the error reported by the component
                MessageBox.Show(this.dynamicDotNetTwain1.ErrorString);
            }
        }
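When OCR runs on a rectangle, the method above feeds the four text boxes straight into `int.Parse`, so a blank or mistyped value throws before OCR starts. Below is a rough sketch of validating the input first. `OcrRegion` is a hypothetical helper, not an SDK class, and it assumes the coordinates are non-negative pixel values in left, top, right, bottom order, as in the `OCR` overload used above.

```csharp
using System;
using System.Globalization;

static class OcrRegion
{
    // Tries to read four text-box values as a rectangle {left, top, right, bottom}.
    // Returns false, with a message in 'error', when a value is not a
    // non-negative whole number or the rectangle is empty or inverted.
    public static bool TryParse(string left, string top, string right, string bottom,
                                out int[] rect, out string error)
    {
        rect = new int[4];
        error = "";
        string[] names = { "Left", "Top", "Right", "Bottom" };
        string[] values = { left, top, right, bottom };

        for (int i = 0; i < 4; i++)
        {
            bool parsed = int.TryParse(values[i], NumberStyles.Integer,
                                       CultureInfo.InvariantCulture, out rect[i]);
            if (!parsed || rect[i] < 0)
            {
                error = names[i] + " must be a non-negative whole number.";
                return false;
            }
        }

        if (rect[2] <= rect[0] || rect[3] <= rect[1])
        {
            error = "Right must be greater than Left, and Bottom greater than Top.";
            return false;
        }
        return true;
    }
}
```

In the sample, you could call `OcrRegion.TryParse(tbxLeft.Text, tbxTop.Text, tbxRight.Text, tbxButtom.Text, out rect, out error)` and show `error` in a message box instead of calling `OCR` when it returns false.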

Get samples

You can download and try the .NET PDF samples above from the Dynamsoft sample gallery.
