Build a Cross-Platform SwiftUI Document Scanner for macOS and iOS

Previously, we built a cross-platform 1D/2D barcode scanner app for macOS and iOS with SwiftUI and the Dynamsoft Capture Vision SDK. In this tutorial, we continue exploring the Capture Vision SDK by building a document scanner app for the same two platforms. Reusing the existing barcode scanner project speeds up development.

What you’ll build: A cross-platform SwiftUI document scanner app that detects document edges in real time and produces a perspective-corrected image, running on both macOS and iOS with the Dynamsoft Capture Vision SDK.

Key Takeaways

  • A single SwiftUI codebase can deliver real-time document edge detection and perspective correction on both macOS and iOS using Dynamsoft Capture Vision SDK.
  • On macOS, the C++ Capture API processes raw pixel buffers and returns CNormalizedImageResultItem objects with four corner points and a normalized image.
  • On iOS, switching to PresetTemplate.detectAndNormalizeDocument is the only change needed to move from barcode scanning to document scanning.
  • The normalized output image requires a 90-degree rotation on iOS before it renders correctly in a SwiftUI Image view.

Common Developer Questions

  • How do I build a cross-platform SwiftUI document scanner for macOS and iOS?
  • How do I detect document edges and normalize a scanned image with Dynamsoft SDK in Swift?
  • Why does the captured document image appear rotated in SwiftUI on iOS?

Prerequisites

Step 1: Configure the Document Detection Template on macOS

  1. Import the Barcode Scanner Project into Xcode.
  2. Open the template.h file in Xcode.
  3. Replace the template string with the configuration from the DDN-PresetTemplates.json file. This JSON file can be extracted from the Dynamsoft Capture Vision bundle for Python.

Step 2: Retrieve Document Edges and Normalized Image on macOS

After setting the document detection template, the Capture Vision Router object can return the document edges and the normalized image. The following code snippet demonstrates how to obtain the document edges and normalized image in the captureImageWithData method:

  1. Wrap the image buffer address, width, height, stride, and pixel format in a CImageData object, then pass it to the C++ Capture method to detect documents:
     CImageData *imageStruct =
       new CImageData(stride * height, (unsigned char *)baseAddress, width,
                      height, stride, sdkPixelFormat);
    
     CCapturedResult *result = cvr->Capture(imageStruct, "");
    
  2. Retrieve the detection results, which are exposed as a collection of CNormalizedImageResultItem objects:
     CNormalizedImagesResult *documentResult = result->GetNormalizedImagesResult();
     int documentResultItemCount = documentResult->GetItemsCount();
    
     for (int j = 0; j < documentResultItemCount; j++) {
       const CNormalizedImageResultItem *documentResultItem =
           documentResult->GetItem(j);
     }
    
  3. Get the normalized image data and the four points of the document edges:
     const CImageData *imageData = documentResultItem->GetImageData();
     const unsigned char *bytes = imageData->GetBytes();
     unsigned long size = imageData->GetBytesLength();
     int width = imageData->GetWidth();
     int height = imageData->GetHeight();
     int stride = imageData->GetStride();
     ImagePixelFormat format = imageData->GetImagePixelFormat();
     NSImage *image = [self convertToNSImageWithBytes:bytes
                                                 size:size
                                                width:width
                                               height:height
                                               stride:stride
                                               format:format];
    
     CPoint *points = documentResultItem->GetLocation().points;
    
  4. Convert the raw bytes (const unsigned char *) to an NSImage object:

         - (NSImage *)convertToNSImageWithBytes:(const unsigned char *)bytes
                                     size:(unsigned long)size
                                     width:(int)width
                                     height:(int)height
                                     stride:(int)stride
                                     format:(ImagePixelFormat)format {
    
                 NSBitmapFormat bitmapFormat = 0;
                 int bitsPerPixel = 0;
                 int samplesPerPixel = 0;
                 // The color space must be consistent with the sample count;
                 // an RGB color space with one sample per pixel is rejected.
                 NSColorSpaceName colorSpaceName = NSCalibratedRGBColorSpace;

                 switch (format) {
                 case IPF_RGB_888:
                     bitmapFormat = 0;
                     bitsPerPixel = 24;
                     samplesPerPixel = 3;
                     break;
                 case IPF_ARGB_8888:
                     bitmapFormat = NSBitmapFormatAlphaFirst;
                     bitsPerPixel = 32;
                     samplesPerPixel = 4;
                     break;
                 case IPF_GRAYSCALED:
                     bitmapFormat = 0;
                     bitsPerPixel = 8;
                     samplesPerPixel = 1;
                     colorSpaceName = NSCalibratedWhiteColorSpace;
                     break;
                 default:
                     NSLog(@"Unsupported pixel format");
                     return nil;
                 }

                 // Passing NULL for the planes lets NSBitmapImageRep allocate
                 // its own buffer, which is filled by the memcpy below.
                 NSBitmapImageRep *imageRep = [[NSBitmapImageRep alloc]
                     initWithBitmapDataPlanes:NULL
                                     pixelsWide:width
                                     pixelsHigh:height
                                 bitsPerSample:8
                             samplesPerPixel:samplesPerPixel
                                     hasAlpha:(samplesPerPixel == 4)
                                     isPlanar:NO
                                 colorSpaceName:colorSpaceName
                                 bitmapFormat:bitmapFormat
                                 bytesPerRow:stride
                                 bitsPerPixel:bitsPerPixel];
                 if (!imageRep) {
                     NSLog(@"Failed to create NSBitmapImageRep.");
                     return nil;
                 }
                    
                 memcpy([imageRep bitmapData], bytes, size);
                    
                 NSImage *image = [[NSImage alloc] initWithSize:NSMakeSize(width, height)];
                 [image addRepresentation:imageRep];
                    
                 return image;
             }
    
  5. Wrap the coordinates and image data in a dictionary and return it to the SwiftUI view.
     NSMutableArray *documentArray = [NSMutableArray array];
        
     NSDictionary *documentData = @{
       @"points" : @[
         @{@"x" : @(points[0][0]), @"y" : @(height - points[0][1])},
         @{@"x" : @(points[1][0]), @"y" : @(height - points[1][1])},
         @{@"x" : @(points[2][0]), @"y" : @(height - points[2][1])},
         @{@"x" : @(points[3][0]), @"y" : @(height - points[3][1])}
       ],
       @"image" : image
     };
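
The `height - y` term in the dictionary above mirrors each point vertically, presumably because the SDK reports coordinates from a top-left image origin while AppKit views draw from a bottom-left origin by default. A minimal Swift sketch of the same conversion follows. The `Corner` type and the `flipToBottomLeftOrigin` helper are hypothetical names used only for illustration; they are not part of the project or the SDK.

```swift
// Hypothetical value type for illustration; the project passes dictionaries instead.
struct Corner: Equatable {
    var x: Double
    var y: Double
}

/// Converts points from a top-left origin to a bottom-left origin
/// by mirroring the y coordinate across the frame height.
func flipToBottomLeftOrigin(_ corners: [Corner], frameHeight: Double) -> [Corner] {
    corners.map { Corner(x: $0.x, y: frameHeight - $0.y) }
}

// Four made-up corner points inside a 480-pixel-high frame.
let detected = [
    Corner(x: 10, y: 20), Corner(x: 300, y: 25),
    Corner(x: 310, y: 400), Corner(x: 5, y: 390),
]
let flipped = flipToBottomLeftOrigin(detected, frameHeight: 480)
print(flipped.map { "(\($0.x), \($0.y))" }.joined(separator: " "))
```

Applying the flip twice returns the original points, which is a quick way to sanity-check which coordinate system an overlay is drawing in.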
    

Step 3: Detect and Normalize Documents on iOS

To support document detection on iOS, first add the DynamsoftDocumentNormalizer package to the project and import it alongside the other Dynamsoft modules:

#if os(iOS)
    import UIKit
    import CoreGraphics
    import DynamsoftCameraEnhancer
    import DynamsoftCaptureVisionRouter
    import DynamsoftLicense
    import DynamsoftCodeParser
    import DynamsoftLabelRecognizer
    import DynamsoftDocumentNormalizer
    typealias ViewController = UIViewController
    typealias ImageType = UIImage
#elseif os(macOS)
    import Cocoa
    typealias ViewController = NSViewController
    typealias ImageType = NSImage
#endif

Then, invoke the capture method with the document detection template:

let result = cvr.captureFromBuffer(imageData, templateName: PresetTemplate.detectAndNormalizeDocument.rawValue)
var documentArray: [[String: Any]] = []
if let items = result.items, items.count > 0 {
  print("Decoded document Count: \(items.count)")

  for item in items {
      if item.type == .normalizedImage,
          let documentItem = item as? NormalizedImageResultItem
      {

          do {
              // Skip this item if no image data is available or the rotation
              // fails, instead of force-unwrapping and risking a crash.
              guard let image = try documentItem.imageData?.toUIImage(),
                  let rotatedImage = image.rotate(byDegrees: 90)
              else { continue }

              let points = documentItem.location.points

              let pointArray: [[String: CGFloat]] = points.compactMap { point in
                  guard let cgPoint = point as? CGPoint else { return nil }
                  return ["x": cgPoint.x, "y": cgPoint.y]
              }

              let documentData: [String: Any] = [
                  "image": rotatedImage,
                  "points": pointArray,
              ]

              documentArray.append(documentData)
          } catch {
              print("Failed to convert image data to UIImage: \(error)")
          }

      }
  }
}

Explanation

  • The toUIImage() method converts the image data to a UIImage object.
  • To display the document image correctly, call rotate(byDegrees: 90) to rotate the image by 90 degrees.
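
Note that `rotate(byDegrees:)` is not a UIKit method, so it is presumably an extension defined elsewhere in the project. The sketch below shows one way such a helper could be written. The names `rotatedBoundingSize` and `rotatedCopy(byDegrees:)` are hypothetical, and the implementation is an assumption rather than the project's actual code.

```swift
import Foundation

/// Size of the axis-aligned box that contains a width x height rectangle
/// after rotating it by `degrees`. Rounded to whole pixels.
func rotatedBoundingSize(width: Double, height: Double, degrees: Double)
    -> (width: Double, height: Double)
{
    let radians = degrees * Double.pi / 180
    let c = abs(cos(radians))
    let s = abs(sin(radians))
    return ((width * c + height * s).rounded(), (width * s + height * c).rounded())
}

#if canImport(UIKit)
import UIKit

extension UIImage {
    /// Hypothetical stand-in for the project's rotate(byDegrees:) helper.
    func rotatedCopy(byDegrees degrees: Double) -> UIImage? {
        let box = rotatedBoundingSize(
            width: Double(self.size.width), height: Double(self.size.height), degrees: degrees)
        let format = UIGraphicsImageRendererFormat.default()
        format.scale = self.scale
        let renderer = UIGraphicsImageRenderer(
            size: CGSize(width: box.width, height: box.height), format: format)
        return renderer.image { context in
            let cg = context.cgContext
            // Rotate around the center of the new canvas.
            cg.translateBy(x: CGFloat(box.width / 2), y: CGFloat(box.height / 2))
            cg.rotate(by: CGFloat(degrees * Double.pi / 180))
            self.draw(in: CGRect(
                x: -self.size.width / 2, y: -self.size.height / 2,
                width: self.size.width, height: self.size.height))
        }
    }
}
#endif

let portrait = rotatedBoundingSize(width: 100, height: 50, degrees: 90)
print(portrait)
```

A 90-degree rotation swaps the width and height of the canvas, which is why the bounding size is computed before the renderer is created.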

Step 4: Display the Normalized Document in SwiftUI

Once a normalized document is returned by the Dynamsoft Capture Vision SDK, it can be displayed within a SwiftUI view.

  1. Create an ImageViewer.swift file, which contains an ImageViewer struct for displaying the document image:

     import SwiftUI
    
     struct ImageViewer: View {
         var image: ImageType
         @Binding var isShowingImage: Bool
        
         var body: some View {
             VStack {
                 imageView
                     .resizable()
                     .scaledToFit()
                     .onTapGesture {
                         isShowingImage = false
                     }
             }
             .edgesIgnoringSafeArea(.all)
             .toolbar {
                 ToolbarItem(placement: .automatic) {
                     Button("Back") {
                         isShowingImage = false
                     }
                 }
             }
             .navigationTitle("Photo")
             .padding()
         }
        
         var imageView: Image {
             #if os(iOS)
                 return Image(uiImage: image)
             #elseif os(macOS)
                 return Image(nsImage: image)
             #endif
         }
     }
        
    

    ImageType resolves to NSImage on macOS and UIImage on iOS, so the imageView property selects the matching Image initializer for each platform.

  2. Add a button and the ImageViewer into the ContentView. Observe the image state to display the ImageViewer:

     import SwiftUI
    
     struct ContentView: View {
         @State private var image: ImageType?
         @State private var shouldCapturePhoto = false
         @State private var isShowingImage = false
        
         var body: some View {
             ZStack {
                 if isShowingImage, let capturedImage = image {
                     ImageViewer(image: capturedImage, isShowingImage: $isShowingImage)
                 } else {
                     CameraView(image: $image, shouldCapturePhoto: $shouldCapturePhoto)
                         .edgesIgnoringSafeArea(.all)
        
                     VStack {
                         Spacer()
                         Button(action: {
                             shouldCapturePhoto = true
                         }) {
                             Circle()
                                 .fill(Color.white)
                                 .frame(width: 70, height: 70)
                                 .overlay(
                                     Circle()
                                         .stroke(Color.black.opacity(0.8), lineWidth: 2)
                                 )
                                 .shadow(radius: 10)
                         }
                         .padding(.bottom, 40)
                     }
                 }
             }
        
             #if os(iOS)
                 .onChange(of: image) { _ in
                     if image != nil {
                         isShowingImage = true
                     }
                 }
             #elseif os(macOS)
                 .onChange(of: image) {
                     if image != nil {
                         isShowingImage = true
                     }
                 }
             #endif
         }
     }
    
    
  3. In CameraView.swift, register the onImageCaptured event to update the image and shouldCapturePhoto state:

       import AVFoundation
       import SwiftUI
          
       #if os(iOS)
           struct CameraView: UIViewControllerRepresentable {
               @Binding var image: ImageType?
               @Binding var shouldCapturePhoto: Bool
          
               ...
          
               func makeUIViewController(context: Context) -> CameraViewController {
                   let cameraViewController = CameraViewController()
                   cameraViewController.onImageCaptured = { capturedImage in
                       DispatchQueue.main.async {
                           self.image = capturedImage
                           self.shouldCapturePhoto = false
                       }
                   }
                   context.coordinator.cameraViewController = cameraViewController
                   return cameraViewController
               }
          
               func updateUIViewController(_ uiViewController: CameraViewController, context: Context) {
                   if shouldCapturePhoto {
                       uiViewController.capturePhoto()
                   }
               }
          
               ...
           }
       #elseif os(macOS)
           struct CameraView: NSViewControllerRepresentable {
               @Binding var image: ImageType?
               @Binding var shouldCapturePhoto: Bool
          
               ...
          
               func makeNSViewController(context: Context) -> CameraViewController {
                   let cameraViewController = CameraViewController()
                   cameraViewController.onImageCaptured = { capturedImage in
                       DispatchQueue.main.async {
                           self.image = capturedImage
                           self.shouldCapturePhoto = false
                       }
                   }
                   context.coordinator.cameraViewController = cameraViewController
                   return cameraViewController
               }
          
               func updateNSViewController(_ nsViewController: CameraViewController, context: Context) {
                   if shouldCapturePhoto {
                       nsViewController.capturePhoto()
                   }
               }
          
               ...
           }
       #endif
    
  4. In CameraViewController.swift, trigger the onImageCaptured event once a capture has been requested and a document is detected:

     class CameraViewController: ViewController, AVCapturePhotoCaptureDelegate,
     AVCaptureVideoDataOutputSampleBufferDelegate
     {
         ...
         var onImageCaptured: ((ImageType) -> Void)?
         var isCaptureEnabled = false
        
         func capturePhoto() {
             isCaptureEnabled = true
         }
        
         func processCameraFrame(_ pixelBuffer: CVPixelBuffer) {
             ...
                    
             #if os(iOS)
                 DispatchQueue.main.async { [weak self] in
                     guard let self = self else { return }
                     self.overlayView.documentData = documentArray
                     self.overlayView.setNeedsDisplay()
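                     // Deliver the first detected document only when a capture was
                     // requested, without force-casting the dictionary value.
                     if self.isCaptureEnabled,
                         let capturedImage = documentArray.first?["image"] as? ImageType
                     {
                         self.onImageCaptured?(capturedImage)
                         self.isCaptureEnabled = false
                     }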
                 }
    
             #elseif os(macOS)
    
                 DispatchQueue.main.async { [weak self] in
                     guard let self = self else { return }
                     self.overlayView.documentData = documentArray
                     self.overlayView.setNeedsDisplay(self.overlayView.bounds)  
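                     // Deliver the first detected document only when a capture was
                     // requested, without force-casting the dictionary value.
                     if self.isCaptureEnabled,
                         let capturedImage = documentArray.first?["image"] as? ImageType
                     {
                         self.onImageCaptured?(capturedImage)
                         self.isCaptureEnabled = false
                     }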
                 }
             #endif
         }
     }
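
The capture flow above amounts to a one-shot gate: the button press arms a flag, and the next frame that contains a document fires the callback and disarms it. The sketch below isolates that logic in a hypothetical `CaptureGate` class, generic over the image type so it runs without UIKit or AppKit. It is an illustration of the pattern, not code from the project.

```swift
// Hypothetical stand-in for the capture logic in CameraViewController.
final class CaptureGate<Image> {
    var onImageCaptured: ((Image) -> Void)?
    private(set) var isCaptureEnabled = false

    /// Called when the shutter button is pressed.
    func capturePhoto() {
        isCaptureEnabled = true
    }

    /// Called once per processed frame with the images detected in that frame.
    func handleFrame(detectedImages: [Image]) {
        guard isCaptureEnabled, let first = detectedImages.first else { return }
        isCaptureEnabled = false
        onImageCaptured?(first)
    }
}

let gate = CaptureGate<String>()
var delivered: [String] = []
gate.onImageCaptured = { delivered.append($0) }

gate.handleFrame(detectedImages: ["frame-1"])  // ignored: button not pressed yet
gate.capturePhoto()
gate.handleFrame(detectedImages: [])           // armed, but no document in this frame
gate.handleFrame(detectedImages: ["frame-3"])  // delivered, then the gate disarms
gate.handleFrame(detectedImages: ["frame-4"])  // ignored again
print(delivered)
```

Keeping the flag armed until a frame actually contains a document means a button press is not lost when detection briefly fails.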
    

Step 5: Run the Document Scanner on iOS and macOS

  1. Select a target device in Xcode and run the document scanner app.
  2. Detect document edges in real time using the camera preview:

    iOS document scanner in SwiftUI

  3. Press the capture button to display the perspective-corrected document:

    perspective correction for a document

Common Issues & Edge Cases

  • Normalized image appears rotated on iOS. The camera buffer orientation differs between macOS and iOS. On iOS, call rotate(byDegrees: 90) on the result image before displaying it in a SwiftUI view.
  • Document edges not detected in low-light conditions. Ensure sufficient lighting on the document. Detection accuracy drops when contrast between the document and its background is low.
  • captureFromBuffer returns zero items. Verify that the correct preset template (PresetTemplate.detectAndNormalizeDocument) is being used and that the license key is valid and not expired.

Source Code

https://github.com/yushulx/ios-swiftui-barcode-mrz-document-scanner/tree/main/examples/document