Auto-Deskew Scanned Documents with OpenCV and Python: Step-by-Step Code Example

Scanned documents are often skewed or crooked. Skewed pages look untidy and are harder for OCR engines to read.

In this article, we are going to use OpenCV and Python to deskew scanned documents based on text lines.

What you’ll build: A Python script that detects and corrects skew in scanned document images using OpenCV contour analysis and affine rotation, plus a browser-based alternative with Dynamic Web TWAIN.

Key Takeaways

  • OpenCV can auto-detect document skew angles by analyzing dilated text-line contours with minAreaRect and taking the median rotation angle.
  • Affine transformation via cv2.warpAffine rotates the scanned image about its center to correct the skew, filling the exposed corners with white.
  • Pre-processing steps — grayscale conversion, Gaussian blur, Otsu thresholding, and morphological dilation — are essential to isolate text lines for accurate angle detection.
  • For browser-based workflows, Dynamic Web TWAIN provides a built-in GetSkewAngle + Rotate API that handles deskewing without custom image processing code.

Common Developer Questions

  • How do I auto-deskew a scanned document image with OpenCV in Python?
  • Why is my OpenCV deskew returning the wrong angle, and how do I fix it?
  • What is the easiest way to deskew scanned documents in a web browser with JavaScript?

Prerequisites

  • Python 3.6 or later
  • OpenCV installed (pip install opencv-python)
  • To try the browser-based approach with Dynamic Web TWAIN, get a 30-day free trial license.

Step-by-Step: Deskew a Scanned Document with OpenCV and Python

We are going to write a Python script to deskew the following sample image.

document

Step 1: Normalize the Scanned Image for Processing

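  1. Scanned images are sharp, which makes fine noise stand out. Convert the image to grayscale and blur it first.

    import cv2

    img = cv2.imread(path)  # path: location of the scanned image file
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (9, 9), 0)  # 9x9 kernel smooths fine noise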
    

    blur

  2. Resize the image to a fixed height, keeping the aspect ratio.

    resized_height = 480
    scale = resized_height / img.shape[0]  # shape is (height, width, channels)
    resized_width = int(scale * img.shape[1])
    gray = cv2.resize(gray, (resized_width, resized_height))
    

    resized

  3. Draw a white rectangle along the image border to erase any border lines.

    start_point = (0, 0)
    end_point = (gray.shape[1], gray.shape[0])  # points are (x, y), i.e. (width, height)
    color = (255, 255, 255)
    thickness = 10
    gray = cv2.rectangle(gray, start_point, end_point, color, thickness)
    

    cropped

  4. Invert the image so that the text becomes white on a black background, since the following steps operate on the white pixels.

    gray = cv2.bitwise_not(gray)
    

    inverted

  5. Run thresholding to get a binary image.

    thresh = cv2.threshold(gray, 0, 255,
            cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    

    thresh

  6. Dilate the text with a wide, flat kernel so that the characters on each line merge into a single blob and the text lines become more obvious.

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (30, 5))
    dilate = cv2.dilate(thresh, kernel)
    

    dilate
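
Item 5 above relies on Otsu's method to choose the threshold automatically. As a rough illustration of what cv2.THRESH_OTSU computes, here is a pure-Python sketch that searches for the gray level maximizing the between-class variance. It is a teaching aid under simplified assumptions (a flat list of 8-bit values, a made-up sample), not OpenCV's implementation, and the exact value OpenCV returns for the same data may differ.

```python
def otsu_threshold(pixels):
    """Return the 8-bit level that maximizes the between-class variance."""
    # Build a 256-bin histogram of the grayscale values.
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        # Pixels with value <= level form the "background" class.
        weight_bg += hist[level]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


# Toy "inverted page": mostly dark background with a few bright text pixels.
sample = [10] * 60 + [20] * 20 + [220] * 10 + [240] * 10
threshold = otsu_threshold(sample)
# Same rule as THRESH_BINARY: values above the threshold become 255.
binary = [255 if value > threshold else 0 for value in sample]
```

Because the threshold is derived from the histogram, the same code path works for scans with different brightness levels, which is why the article does not hard-code a value.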

Step 2: Detect the Skew Angle from Text Lines

  1. Find all the contours in the dilated image.

    # OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values
    contours, hierarchy = cv2.findContours(dilate, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    
  2. Use minAreaRect to get the rotation angles of contours.

    angles = []
    for contour in contours:
        minAreaRect = cv2.minAreaRect(contour)
        angle = minAreaRect[-1]
        if angle != 90.0 and angle != -0.0: #filter out 0 and 90
            angles.append(angle)
    
  3. Use the median as the skew angle.

    angles.sort()
    mid_angle = angles[len(angles) // 2]  # raises IndexError if no angle was collected
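
The median selection above fails when no usable contour is found, and small noise contours can distort it. The following is a minimal sketch of a more defensive version, not the code from the repo. It assumes the caller has already built a list of (area, angle) pairs, for example from cv2.contourArea(contour) and cv2.minAreaRect(contour)[-1]. The function name pick_skew_angle and the min_area parameter are hypothetical.

```python
import statistics


def pick_skew_angle(rects, min_area=100.0):
    """Return the median angle of the usable rectangles, or 0.0 if none remain.

    rects is a list of (area, angle) pairs (an assumed input format).
    """
    angles = [
        angle
        for area, angle in rects
        # Drop tiny contours and axis-aligned boxes (0 and 90 degrees).
        if area >= min_area and angle not in (0.0, 90.0)
    ]
    if not angles:
        return 0.0  # nothing to measure: treat the page as straight
    # median_high picks the same element as sorted(angles)[len(angles) // 2].
    return statistics.median_high(angles)


# Made-up measurements: four text lines, one speck of noise, one border box.
rects = [(500, 2.1), (650, 1.9), (20, 47.0), (480, 90.0), (700, 2.0), (300, 2.4)]
mid_angle = pick_skew_angle(rects)
```

Returning 0.0 for an empty list is a design choice that leaves the page unrotated; raising an exception instead may suit a batch pipeline better.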
    

Step 3: Rotate the Image to Correct the Skew

After getting the skew angle, we can apply an affine transformation to get the deskewed image.

original = img  # full-resolution image loaded in Step 1
angle = mid_angle  # median angle from Step 2
if angle > 45:  # anti-clockwise
    angle = -(90 - angle)
height = original.shape[0]
width = original.shape[1]
m = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1)
deskewed = cv2.warpAffine(original, m, (width, height), borderValue=(255, 255, 255))

deskewed
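
The rotation above keeps the original canvas size, so the corners of the page can move out of frame at larger angles. The sketch below shows one common way around this: build the same kind of 2x3 matrix that cv2.getRotationMatrix2D produces (scale 1), then enlarge the output size to the bounding box of the rotated page and shift the translation accordingly. The function name rotation_with_bounds is hypothetical, and this is an optional variant rather than the article's method.

```python
import math


def rotation_with_bounds(width, height, angle_degrees):
    """Return a 2x3 affine matrix and an output size that fit the rotated image."""
    cx, cy = width / 2, height / 2
    theta = math.radians(angle_degrees)
    cos_a, sin_a = math.cos(theta), math.sin(theta)

    # Rotation about the image centre, laid out like getRotationMatrix2D (scale 1).
    matrix = [
        [cos_a, sin_a, (1 - cos_a) * cx - sin_a * cy],
        [-sin_a, cos_a, sin_a * cx + (1 - cos_a) * cy],
    ]

    # Bounding box of the rotated page, rounded to whole pixels.
    new_width = int(round(width * abs(cos_a) + height * abs(sin_a)))
    new_height = int(round(width * abs(sin_a) + height * abs(cos_a)))

    # Shift so the old centre lands on the centre of the larger canvas.
    matrix[0][2] += new_width / 2 - cx
    matrix[1][2] += new_height / 2 - cy
    return matrix, (new_width, new_height)


# A 1000x1400 page rotated by 2 degrees needs a slightly larger canvas.
matrix, (new_width, new_height) = rotation_with_bounds(1000, 1400, 2.0)
```

To use the result with OpenCV, the matrix would need to be converted to a NumPy float array (for example np.array(matrix, dtype=np.float32)) and passed to cv2.warpAffine together with the new size and borderValue=(255, 255, 255).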

Deskew Scanned Documents in the Browser with Dynamic Web TWAIN

Other tools can also deskew document images. Dynamic Web TWAIN is a JavaScript library that enables document scanning in the browser. It scans documents from physical scanners via protocols such as TWAIN, WIA, SANE, and ICA, and has a built-in deskew function.

The following code snippet deskews a scanned document image.

function Deskew(index) {
  return new Promise((resolve, reject) => {
    DWObject.GetSkewAngle(
      index,
      function(angle) {
        console.log("skew angle: " + angle);
        DWObject.Rotate(index, angle, true,
          function() {
            console.log("Successfully deskewed an image!");
            resolve();
          },
          function(errorCode, errorString) {
            console.log(errorString);
            reject(errorString);
          }
        );
      },
      function(errorCode, errorString) {
        console.log(errorString);
        reject(errorString);
      }
    );
  })
}

You can try it out in this online demo. The demo can also load image or PDF files and save documents as a PDF file.

Common Issues and Edge Cases

  • Wrong skew angle on documents with few text lines: If the document has large images, tables, or very few text lines, the contour-based median angle may be unreliable. Filter contours by minimum area (e.g., cv2.contourArea(contour) > 100) to exclude noise before calculating the angle.
  • Black borders appear after rotation: cv2.warpAffine fills empty pixels with borderValue, which defaults to black. If you see black edges, make sure you set borderValue=(255,255,255) for white-background documents. For large skew angles (> 5°), consider cropping the result.
  • Angle off by 90°: The angle range of minAreaRect depends on the OpenCV version. Releases before 4.5.1 return angles in the range [−90°, 0°), while 4.5.1 and later return angles in (0°, 90°]. With the newer convention, a nearly horizontal text line may be reported as 89° instead of −1°. The if angle > 45 guard in Step 3 handles this case, but verify the direction by testing on a known-skewed sample first.
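
Since the angle range reported by minAreaRect can differ between OpenCV versions ((0°, 90°] in recent releases, [−90°, 0°) in older ones), one option is to fold whatever value comes back into a single range before rotating. The helper below is a small sketch of that idea. The name to_rotation_angle is hypothetical, and the sign convention should still be checked against a sample with known skew.

```python
def to_rotation_angle(rect_angle):
    """Fold a minAreaRect angle into the range (-45, 45] degrees."""
    angle = rect_angle
    # A rectangle looks the same after a 90-degree turn, so shifting by
    # multiples of 90 keeps the orientation and only changes the label.
    while angle > 45:
        angle -= 90
    while angle <= -45:
        angle += 90
    return angle


# Newer convention: 89 degrees is really a 1-degree tilt the other way.
new_style = to_rotation_angle(89.0)
# Older convention: -89 degrees folds to a small positive angle.
old_style = to_rotation_angle(-89.0)
```

For angles above 45° this gives the same result as the -(90 - angle) expression in Step 3, and it leaves small angles untouched.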

Source Code

You can find all the code in the following repo:

https://github.com/tony-xlh/deskew