How to Reduce Scanned Document File Size: DPI, Bit Depth, and Compression Explained
Digitizing paper documents helps companies save physical space, speed up data retrieval, reduce costs, and collaborate more easily.
However, scans can end up as very large files, or as smaller files with poor visual quality. This article covers the factors that determine the size of a scanned document and how to optimize them.
The samples use Dynamic Web TWAIN, a JavaScript library that enables document scanning from browsers.
What you’ll build: A web-based document scanning workflow that reduces scanned file sizes by up to 96% using optimized DPI, bit depth, and compression settings with Dynamic Web TWAIN.
Key Takeaways
- Scanned document file size is determined by three factors: DPI (resolution), bit depth (color depth), and the image compression format.
- JPEG compression achieves up to 97% size reduction for color and grayscale scans, while TIFF with CCITT Group 4 is optimal for black-and-white documents.
- PDF containers with automatic per-page compression (JBIG2 for B&W, JPEG for color) deliver the best multi-page results — 96% compression in real-world tests.
- Dynamic Web TWAIN lets you control DPI, pixel type, bit depth, and output format programmatically in JavaScript.
Common Developer Questions
- How do I reduce scanned document file size programmatically in JavaScript?
- What is the best image format and compression for scanned documents — JPEG, PNG, TIFF, or PDF?
- How does DPI affect scanned document file size and quality?
Prerequisites
To follow along with the code samples, you need:
- A modern web browser (Chrome, Firefox, Edge).
- A TWAIN-compatible document scanner connected to your computer.
- A trial license for Dynamic Web TWAIN. Get a 30-day free trial license to get started.
How Scanned Image File Sizes Are Calculated
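Before we get started, let's look at how the size of an uncompressed image is calculated:
image bytes = width * height * bit depth / 8
Here, width and height are in pixels and bit depth is the number of bits per pixel. In other words, the size depends on the number of pixels and on how many colors each pixel can represent.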
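As a quick check of the formula, here is a minimal sketch in plain JavaScript. The function names are hypothetical, and the 4-byte row padding is an assumption (it is how BMP/DIB images lay out rows, and it appears to reproduce the uncompressed sizes reported in the tables of this article).

```javascript
// Bytes per row, rounded up to a multiple of 4.
// Assumption: BMP/DIB-style row padding.
function rowBytes(width, bitDepth) {
  return Math.ceil((width * bitDepth) / 32) * 4;
}

// Uncompressed pixel data size in bytes for one image.
function rawImageBytes(width, height, bitDepth) {
  return rowBytes(width, bitDepth) * height;
}

// Format a byte count as KB (1 KB = 1024 bytes) with two decimals.
function toKB(bytes) {
  return (bytes / 1024).toFixed(2);
}

console.log(toKB(rawImageBytes(2550, 3507, 1)));  // "1095.94" (black & white)
console.log(toKB(rawImageBytes(2550, 3507, 8)));  // "8740.10" (grayscale)
console.log(toKB(rawImageBytes(2550, 3507, 24))); // "26206.61" (color)
```

Without the padding, the plain formula gives values that differ by well under 1%, so it is accurate enough for estimates.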
How DPI Affects Scanned Document File Size
DPI stands for dots per inch and is a measure of the resolution of a scanned document or photo. The higher the DPI, the more pixels are captured per inch of paper, which improves detail but also increases the file size.
Generally, scanning a document at about 300 DPI produces an image with a reasonable size and good quality.
In Dynamic Web TWAIN, we can specify the DPI in the device configuration:
DWObject.AcquireImageAsync({IfShowUI:false,Resolution:300});
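Here are the test results of scanning a page at different DPIs. Note that doubling the DPI roughly quadruples the size, because both the width and the height in pixels double.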
| DPI | Resolution | Size |
|---|---|---|
| 100 | 850x1100 | 2741.41KB |
| 200 | 1700x2200 | 10957.03KB |
| 300 | 2550x3300 | 24659.77KB |
| 600 | 5100x6600 | 98613.28KB |
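The relationship between DPI, pixel dimensions, and size can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the 8.5 x 11 inch (US Letter) page size and 24-bit color depth are assumptions inferred from the dimensions and sizes in the table above.

```javascript
// Pixel dimensions of a page scanned at a given DPI (page size in inches).
function pixelDimensions(widthIn, heightIn, dpi) {
  return {
    width: Math.round(widthIn * dpi),
    height: Math.round(heightIn * dpi),
  };
}

// Approximate uncompressed size in KB: width * height * bitDepth / 8 bytes.
function approxSizeKB(width, height, bitDepth) {
  return (width * height * bitDepth) / 8 / 1024;
}

for (const dpi of [100, 200, 300, 600]) {
  const { width, height } = pixelDimensions(8.5, 11, dpi);
  const kb = approxSizeKB(width, height, 24);
  console.log(`${dpi} DPI: ${width}x${height}, about ${kb.toFixed(0)} KB`);
}
```

The estimates land within a fraction of a percent of the measured sizes; the small gap is likely due to row padding in the stored image.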
How Bit Depth and Color Mode Affect File Size
Bit depth, also known as color depth, is the number of bits used to indicate the color of a single pixel. The bigger the bit depth, the more colors a pixel can represent.
Most document scanners can scan in different color modes: black & white, grayscale, and color.
Their bit depths are as follows:
- Black & white: 1 bit.
- Grayscale: 8 bits.
- Color: 24 bits.
In Dynamic Web TWAIN, we can specify the color mode (or pixel type) in the device configuration:
let pixelType = Dynamsoft.DWT.EnumDWT_PixelType.TWPT_BW;
DWObject.AcquireImageAsync({IfShowUI:false,PixelType:pixelType});
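We can also change the bit depth of an image that has already been scanned:
let imageIndex = 0;      // index of the image in the buffer
let bitDepth = 4;        // target bit depth
let highQuality = false; // whether to favor quality during the conversion
DWObject.ChangeBitDepth(imageIndex, bitDepth, highQuality);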
Here are the test results of scanning a document in different color modes.
| Pixel Type | Resolution | Bit Depth | Size |
|---|---|---|---|
| B&W | 2550x3507 | 1 | 1095.94KB |
| Gray | 2550x3507 | 8 | 8740.10KB |
| Color | 2550x3507 | 24 | 26206.61KB |
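The DPI and pixel type settings can be combined in a single device configuration. Below is a minimal sketch of a helper that picks both from the kind of document being scanned. The helper `buildScanConfig`, the document kinds, and the presets are hypothetical and only illustrate the idea; `IfShowUI`, `PixelType`, and `Resolution` are the configuration fields used earlier in this article.

```javascript
// Hypothetical helper: maps a document kind to a scan configuration.
// `pixelTypes` is expected to be Dynamsoft.DWT.EnumDWT_PixelType;
// it is passed in so the function can be tried without the library.
function buildScanConfig(kind, pixelTypes) {
  // Assumed presets: 300 DPI everywhere, color mode chosen by content.
  const presets = {
    text: { PixelType: pixelTypes.TWPT_BW, Resolution: 300 },
    mixed: { PixelType: pixelTypes.TWPT_GRAY, Resolution: 300 },
    photo: { PixelType: pixelTypes.TWPT_RGB, Resolution: 300 },
  };
  const preset = presets[kind];
  if (!preset) {
    throw new Error(`Unknown document kind: ${kind}`);
  }
  return { IfShowUI: false, ...preset };
}

// Usage in a page where Dynamic Web TWAIN is loaded and DWObject is ready:
// DWObject.AcquireImageAsync(
//   buildScanConfig("text", Dynamsoft.DWT.EnumDWT_PixelType)
// );
```

Keeping the presets in one place makes it easy to adjust them after comparing file sizes for your own documents.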
How to Compress Scanned Images with JPEG, PNG, and TIFF
We can compress the image data with different image file formats like JPEG and PNG.
In Dynamic Web TWAIN, we can get the image sizes with the following code:
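let imageIndex = 0;
let width = DWObject.GetImageWidth(imageIndex);
let height = DWObject.GetImageHeight(imageIndex);
// size in bytes at the given dimensions, before format-specific compression
let originalSize = DWObject.GetImageSize(imageIndex, width, height);
// size after compression with a format, here JPEG
let imageType = Dynamsoft.DWT.EnumDWT_ImageType.IT_JPG;
let size = DWObject.GetImageSizeWithSpecifiedType(imageIndex, imageType);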
Here are the test results of scanning documents in different color modes and image formats.
| Pixel Type | Resolution | Bit Depth | Size | Format | Compression Rate |
|---|---|---|---|---|---|
| B&W | 2550x3507 | 1 | 1096.00KB | BMP | 0% |
| B&W | 2550x3507 | 1 | 705.89KB | JPG | 35.59% |
| B&W | 2550x3507 | 1 | 38.09KB | TIF | 96.52% |
| B&W | 2550x3507 | 1 | 77.45KB | PNG | 92.93% |
| Gray | 2550x3507 | 8 | 8741.15KB | BMP | 0% |
| Gray | 2550x3507 | 8 | 590.68KB | JPG | 93.24% |
| Gray | 2550x3507 | 8 | 1957.32KB | TIF | 77.61% |
| Gray | 2550x3507 | 8 | 1574.74KB | PNG | 81.98% |
| Color | 2550x3507 | 24 | 26206.66KB | BMP | 0% |
| Color | 2550x3507 | 24 | 665.43KB | JPG | 97.46% |
| Color | 2550x3507 | 24 | 5112.49KB | TIF | 80.49% |
| Color | 2550x3507 | 24 | 3753.82KB | PNG | 85.68% |
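A few observations from the table:
- BMP stores the image without compression, so it serves as the baseline.
- JPEG does not work well for black & white images, but it has the best compression rate for grayscale and color images.
- For black & white images, TIFF produces the smallest file (96.52%), followed by PNG (92.93%).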
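The compression rates in the table are computed relative to the BMP size. The sketch below recomputes them from the table's figures and reports the smallest format for each pixel type; the function names are hypothetical.

```javascript
// Compression rate in percent, relative to the uncompressed size.
function compressionRate(originalKB, compressedKB) {
  return (1 - compressedKB / originalKB) * 100;
}

// Sizes in KB, copied from the table above.
const results = {
  "B&W": { BMP: 1096.0, JPG: 705.89, TIF: 38.09, PNG: 77.45 },
  Gray: { BMP: 8741.15, JPG: 590.68, TIF: 1957.32, PNG: 1574.74 },
  Color: { BMP: 26206.66, JPG: 665.43, TIF: 5112.49, PNG: 3753.82 },
};

// Name of the format with the smallest size.
function smallestFormat(sizes) {
  return Object.entries(sizes).reduce((best, cur) =>
    cur[1] < best[1] ? cur : best
  )[0];
}

for (const [pixelType, sizes] of Object.entries(results)) {
  const best = smallestFormat(sizes);
  const rate = compressionRate(sizes.BMP, sizes[best]).toFixed(2);
  console.log(`${pixelType}: ${best} (${rate}% smaller than BMP)`);
}
```

Size is not the only criterion: JPEG is lossy, while PNG and the TIFF compression used here preserve every pixel, so the smallest format is not always the right one for archival copies.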
How to Optimize Multi-Page Scanned Documents with TIFF and PDF
Most of the time, the documents we scan have multiple pages. We can use TIFF or PDF as a container to save all the images in one file. Both TIFF and PDF support many image formats and compression methods.
Since different compression methods work differently for different color modes, Dynamic Web TWAIN uses the following compression strategies to achieve an optimum result.
For TIFF, it uses the following strategy:
- For 1-bit images, use TIFF_T6.
- For other images, use TIFF_LZW.
For PDF, it uses the following strategy:
- For 1-bit images, use JBIG2 encoding if the PDF version is 1.4 or later; otherwise, use FAX4 (CCITT Group 4 Fax).
- For 8-bit images, use JPEG encoding if the image is grayscale; otherwise, use LZW (Lempel-Ziv-Welch).
- For 24-bit and 32-bit images, use JPEG encoding.
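The two strategies above can be expressed as a small decision function. This is only a sketch of the rules as listed, not the library's actual implementation: the function names are hypothetical, the returned strings are plain labels rather than library constants, and whether JBIG2 is allowed is left to the caller, who decides it from the target PDF version.

```javascript
// TIFF: T6 (CCITT Group 4) for 1-bit images, LZW for everything else.
function pickTiffCompression(bitDepth) {
  return bitDepth === 1 ? "TIFF_T6" : "TIFF_LZW";
}

// PDF: codec chosen per page from bit depth and color mode.
function pickPdfCompression({ bitDepth, grayscale = false, jbig2Allowed = true }) {
  if (bitDepth === 1) {
    return jbig2Allowed ? "JBIG2" : "FAX4";
  }
  if (bitDepth === 8) {
    return grayscale ? "JPEG" : "LZW";
  }
  if (bitDepth === 24 || bitDepth === 32) {
    return "JPEG";
  }
  // Assumption: other bit depths are not covered by the strategy.
  throw new Error(`Unsupported bit depth: ${bitDepth}`);
}

console.log(pickTiffCompression(1));                               // "TIFF_T6"
console.log(pickPdfCompression({ bitDepth: 1 }));                  // "JBIG2"
console.log(pickPdfCompression({ bitDepth: 8, grayscale: true })); // "JPEG"
console.log(pickPdfCompression({ bitDepth: 24 }));                 // "JPEG"
```

Choosing the codec per page is what lets a mixed document keep its black & white pages tiny while still compressing color pages well.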
Here are the test results of saving scans in TIFF and PDF formats. The test document was scanned in black & white, grayscale, and color, and the pages were saved into a single file.
| Format | Original Size | Size | Compression Rate |
|---|---|---|---|
| TIFF | 36042.64KB | 7109.21KB | 80.28% |
| PDF | 36042.64KB | 1283.96KB | 96.44% |
Common Issues and Edge Cases
- JPEG artifacts on B&W scans: JPEG compression is designed for continuous-tone images. Applying it to 1-bit black-and-white scans produces visible artifacts and poor compression (only ~35%). Use TIFF with CCITT Group 4 or PNG instead.
- Oversized files from default scanner settings: Many scanners default to 600 DPI and 24-bit color. If the document is text-only, switching to 300 DPI and B&W or grayscale can reduce file size by over 90% with no meaningful quality loss.
- PDF version compatibility: JBIG2 encoding for B&W pages in PDF requires PDF version 1.4 or higher. If you need to support older PDF readers, the encoder falls back to FAX4 (CCITT Group 4), which still provides strong compression for 1-bit images.
Frequently Asked Questions
What is the best DPI setting for scanning documents?
300 DPI is the standard recommendation for most document scanning workflows. It balances file size and readability well. Use 600 DPI only when you need to capture fine print or detailed graphics.
Why are my scanned PDFs so large?
Scanned PDFs are large because each page is stored as a raster image. The file size depends on the scan resolution (DPI), color mode (B&W vs. color), and whether compression is applied. Switching from 600 DPI color to 300 DPI grayscale with JPEG compression can reduce a single page from roughly 96 MB (98,613 KB uncompressed) to under 1 MB.
Should I use JPEG or PNG for scanned documents?
Use JPEG for grayscale and color document scans — it achieves 93–97% compression. Use PNG or TIFF for black-and-white scans where JPEG performs poorly. For multi-page documents, use PDF with automatic per-page compression for the best results.
Source Code
You can find all the code and online demos in the following repo: