MRZ Text Recognition with Jetpack Compose and CameraX

Jetpack Compose is Android’s recommended modern toolkit for building native UI. It simplifies and accelerates UI development on Android.

In this article, we are going to create an MRZ text scanner in Jetpack Compose with CameraX for camera access and Dynamsoft Label Recognizer to perform OCR.

Note: MRZ stands for machine-readable zone. It appears on ID cards, visas and passports. It is a dedicated zone, printed in a fixed layout, that lets machines read the holder's information.

New Project

Open Android Studio and create a new project using the Empty Compose Activity template.

Add Dependencies

  1. Open settings.gradle to add Dynamsoft’s maven repository.

     dependencyResolutionManagement {
         repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
         repositories {
             google()
             mavenCentral()
    +        maven {
    +            url "https://download2.dynamsoft.com/maven/aar"
    +        }
         }
     }
    
  2. Add CameraX and Dynamsoft Label Recognizer to the module’s build.gradle.

    // CameraX core library using the camera2 implementation
    def camerax_version = "1.4.0-alpha02"
    implementation "androidx.camera:camera-core:${camerax_version}"
    implementation "androidx.camera:camera-camera2:${camerax_version}"
    implementation "androidx.camera:camera-lifecycle:${camerax_version}"
    implementation "androidx.camera:camera-view:${camerax_version}"
        
    implementation 'com.dynamsoft:dynamsoftlabelrecognizer:2.2.20'
    

Request Camera Permission

We have to request camera permission to use the camera.

  1. Declare the camera permission in AndroidManifest.xml.

    <uses-feature
        android:name="android.hardware.camera"
        android:required="false" />
    <uses-permission android:name="android.permission.CAMERA" />
    
  2. In MainActivity.kt, add a hasCamPermission state inside setContent. The context it uses comes from LocalContext.current, shown in the next step.

    var hasCamPermission by remember {
        mutableStateOf(
            ContextCompat.checkSelfPermission(
                context,
                Manifest.permission.CAMERA
            ) == PackageManager.PERMISSION_GRANTED
        )
    }
    
  3. Define a launcher to request the permission.

    setContent {
        MRZScannerTheme {
            val context = LocalContext.current
            val launcher = rememberLauncherForActivityResult(
                contract = ActivityResultContracts.RequestPermission(),
                onResult = { granted ->
                    hasCamPermission = granted
                }
            )
        }
    }
    
  4. Launch the permission request when the composition starts, using LaunchedEffect.

    setContent {
        MRZScannerTheme {
            //...
            LaunchedEffect(key1 = true){
                launcher.launch(Manifest.permission.CAMERA)
            }
        }
    }
    

Open Camera Preview

Next, let’s open the camera and display the preview.

Add the PreviewView to do this. It is mounted only after the camera permission is granted.

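// Needed by the camera binding below; context is LocalContext.current from the previous section.
val lifecycleOwner = LocalLifecycleOwner.current
val cameraProviderFuture = remember { ProcessCameraProvider.getInstance(context) }
Column(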
    modifier = Modifier.fillMaxSize()
) {
    if (hasCamPermission) {
        AndroidView(
            factory = { context ->
                val previewView = PreviewView(context)
                val preview = Preview.Builder().build()
                val selector = CameraSelector.Builder()
                    .requireLensFacing(CameraSelector.LENS_FACING_BACK)
                    .build()
                preview.setSurfaceProvider(previewView.surfaceProvider)
                
                try {
                    cameraProviderFuture.get().bindToLifecycle(
                        lifecycleOwner,
                        selector,
                        preview
                    )
                } catch (e: Exception) {
                    e.printStackTrace()
                }
                previewView
            },
            modifier = Modifier.weight(1f).padding(bottom = 25.dp)
        )
    }
}

Use Dynamsoft Label Recognizer to Recognize MRZ Text from Bitmap

We need to set up Dynamsoft Label Recognizer as an MRZ recognizer to recognize MRZ from camera frames as Bitmap.

  1. Create a new Java class named MRZRecognizer.

    public class MRZRecognizer {
        private Context context;
        private LabelRecognizer labelRecognizer;
        public MRZRecognizer(Context context){
            this.context = context;
            init(); // defined in the following steps
        }
    }
    
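  2. Copy the MRZ model files into an MRZ subfolder of the assets folder (the loading code below opens MRZ/MRZ.prototxt, MRZ/MRZ.caffemodel and MRZ/MRZ.txt). You can find the model files here.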

  3. Activate with a license. You can apply for a license here.

    private void init() {
        LicenseManager.initLicense("DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ==", context, new LicenseVerificationListener() {
            @Override
            public void licenseVerificationCallback(boolean isSuccess, CoreException error) {
                if(!isSuccess){
                    error.printStackTrace();
                }
            }
        });
    }
    
  4. In the same init method, after the license call, create a new instance of Label Recognizer and update its runtime settings as an MRZ recognizer with a JSON template.

    private void init() {
        // ... LicenseManager.initLicense(...) from the previous step ...
        try {
            labelRecognizer = new LabelRecognizer();
            updateRuntimeSettingsForMRZ();
        } catch (LabelRecognizerException e) {
            throw new RuntimeException(e);
        }
    }
       
    private void updateRuntimeSettingsForMRZ() throws LabelRecognizerException {
        labelRecognizer.initRuntimeSettings("{\"CharacterModelArray\":[{\"DirectoryPath\":\"\",\"Name\":\"MRZ\"}],\"LabelRecognizerParameterArray\":[{\"Name\":\"default\",\"ReferenceRegionNameArray\":[\"defaultReferenceRegion\"],\"CharacterModelName\":\"MRZ\",\"LetterHeightRange\":[5,1000,1],\"LineStringLengthRange\":[30,44],\"LineStringRegExPattern\":\"([ACI][A-Z<][A-Z<]{3}[A-Z0-9<]{9}[0-9][A-Z0-9<]{15}){(30)}|([0-9]{2}[(01-12)][(01-31)][0-9][MF<][0-9]{2}[(01-12)][(01-31)][0-9][A-Z<]{3}[A-Z0-9<]{11}[0-9]){(30)}|([A-Z<]{0,26}[A-Z]{1,3}[(<<)][A-Z]{1,3}[A-Z<]{0,26}<{0,26}){(30)}|([ACIV][A-Z<][A-Z<]{3}([A-Z<]{0,27}[A-Z]{1,3}[(<<)][A-Z]{1,3}[A-Z<]{0,27}){(31)}){(36)}|([A-Z0-9<]{9}[0-9][A-Z<]{3}[0-9]{2}[(01-12)][(01-31)][0-9][MF<][0-9]{2}[(01-12)][(01-31)][0-9][A-Z0-9<]{8}){(36)}|([PV][A-Z<][A-Z<]{3}([A-Z<]{0,35}[A-Z]{1,3}[(<<)][A-Z]{1,3}[A-Z<]{0,35}<{0,35}){(39)}){(44)}|([A-Z0-9<]{9}[0-9][A-Z<]{3}[0-9]{2}[(01-12)][(01-31)][0-9][MF<][0-9]{2}[(01-12)][(01-31)][0-9][A-Z0-9<]{14}[A-Z0-9<]{2}){(44)}\",\"MaxLineCharacterSpacing\":130,\"TextureDetectionModes\":[{\"Mode\":\"TDM_GENERAL_WIDTH_CONCENTRATION\",\"Sensitivity\":8}],\"Timeout\":9999}],\"LineSpecificationArray\":[{\"BinarizationModes\":[{\"BlockSizeX\":30,\"BlockSizeY\":30,\"Mode\":\"BM_LOCAL_BLOCK\",\"MorphOperation\":\"Close\"}],\"LineNumber\":\"\",\"Name\":\"defaultTextArea->L0\"}],\"ReferenceRegionArray\":[{\"Localization\":{\"FirstPoint\":[0,0],\"SecondPoint\":[100,0],\"ThirdPoint\":[100,100],\"FourthPoint\":[0,100],\"MeasuredByPercentage\":1,\"SourceType\":\"LST_MANUAL_SPECIFICATION\"},\"Name\":\"defaultReferenceRegion\",\"TextAreaNameArray\":[\"defaultTextArea\"]}],\"TextAreaArray\":[{\"Name\":\"defaultTextArea\",\"LineSpecificationNameArray\":[\"defaultTextArea->L0\"]}]}");
        loadModel();
    }
    
  5. We also need to load the model files for MRZ. This is the loadModel method called above.

    private void loadModel() {
        String modelFolder = "MRZ";
        String modelFileName = "MRZ";
        try {
            AssetManager manager = context.getAssets();
            InputStream isPrototxt = manager.open(modelFolder+"/"+modelFileName+".prototxt");
            byte[] prototxt = new byte[isPrototxt.available()];
            isPrototxt.read(prototxt);
            isPrototxt.close();
            InputStream isCharacterModel = manager.open(modelFolder+"/"+modelFileName+".caffemodel");
            byte[] characterModel = new byte[isCharacterModel.available()];
            isCharacterModel.read(characterModel);
            isCharacterModel.close();
            InputStream isTxt = manager.open(modelFolder+"/"+modelFileName+".txt");
            byte[] txt = new byte[isTxt.available()];
            isTxt.read(txt);
            isTxt.close();
            labelRecognizer.appendCharacterModelBuffer(modelFileName, prototxt, txt, characterModel);
        } catch (Exception e) {
            Log.d("DLR","Failed to load model");
            e.printStackTrace();
        }
    }
    
  6. Add a public method to recognize text from a bitmap.

    public DLRResult[] recognizeBitmap(Bitmap bitmap) throws LabelRecognizerException {
        return labelRecognizer.recognizeImage(bitmap);
    }
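A note on the model loading in step 5: InputStream.available() only returns an estimate, and a single read() call is not guaranteed to fill the whole array. For asset files this usually works, but reading in a loop is more defensive. Below is a minimal sketch of that idea. The class StreamUtils and its readFully method are hypothetical helpers introduced here for illustration and are not part of the Dynamsoft SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not part of the Dynamsoft SDK. */
final class StreamUtils {
    private StreamUtils() {}

    /** Reads every remaining byte of the stream, looping until end of stream. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() may return fewer bytes than requested; -1 signals end of stream.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an asset stream such as manager.open("MRZ/MRZ.prototxt").
        byte[] data = readFully(new ByteArrayInputStream(new byte[20000]));
        System.out.println(data.length);
    }
}
```

With such a helper, each of the three reads in loadModel could be written as, for example, byte[] prototxt = StreamUtils.readFully(isPrototxt);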
    

Create an Image Analyzer for CameraX to Read MRZ from Camera Frames

CameraX needs an image analyzer instance to process camera frames. Here, we create a new MRZAnalyzer class that implements CameraX's ImageAnalysis.Analyzer interface.

import android.content.Context
import androidx.annotation.OptIn
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.dynamsoft.dlr.DLRResult

class MRZAnalyzer(
    private val onMRZScanned: (Array<DLRResult>) -> Unit,
    private val context: Context
): ImageAnalysis.Analyzer {
    private var mrzRecognizer:MRZRecognizer = MRZRecognizer(context)

    @OptIn(ExperimentalGetImage::class)
    override fun analyze(image: ImageProxy) {
        try {
            val bitmap = BitmapUtils.getBitmap(image)
            if (bitmap != null) {
                val results = mrzRecognizer.recognizeBitmap(bitmap)
                if (results.isNotEmpty()) {
                    onMRZScanned(results)
                }
            }
        } catch(e: Exception) {
            e.printStackTrace()
        } finally {
            image.close()
        }
    }
}

The recognizer takes a Bitmap, so each camera frame, which arrives as an ImageProxy in YUV_420_888 format, has to be converted first.

The analyzer uses the following BitmapUtils class for the conversion:

package com.tonyxlh.mrzscanner;

import android.annotation.TargetApi;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.graphics.ImageFormat;
import android.graphics.Matrix;
import android.graphics.Rect;
import android.graphics.YuvImage;
import android.media.Image;
import android.media.Image.Plane;
import android.os.Build.VERSION_CODES;
import androidx.annotation.Nullable;
import android.util.Log;
import androidx.annotation.RequiresApi;
import androidx.camera.core.ExperimentalGetImage;
import androidx.camera.core.ImageProxy;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Utils functions for bitmap conversions. */
public class BitmapUtils {
    private static final String TAG = "BitmapUtils";

    /** Converts NV21 format byte buffer to bitmap. */
    @Nullable
    public static Bitmap getBitmap(ByteBuffer data, FrameMetadata metadata) {
        data.rewind();
        byte[] imageInBuffer = new byte[data.limit()];
        data.get(imageInBuffer, 0, imageInBuffer.length);
        try {
            YuvImage image =
                    new YuvImage(
                            imageInBuffer, ImageFormat.NV21, metadata.getWidth(), metadata.getHeight(), null);
            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            image.compressToJpeg(new Rect(0, 0, metadata.getWidth(), metadata.getHeight()), 80, stream);

            Bitmap bmp = BitmapFactory.decodeByteArray(stream.toByteArray(), 0, stream.size());

            stream.close();
            return rotateBitmap(bmp, metadata.getRotation(), false, false);
        } catch (Exception e) {
            Log.e(TAG, "Error: " + e.getMessage());
        }
        return null;
    }

    /** Converts a YUV_420_888 image from CameraX API to a bitmap. */
    @RequiresApi(VERSION_CODES.LOLLIPOP)
    @Nullable
    @ExperimentalGetImage
    public static Bitmap getBitmap(ImageProxy image) {
        FrameMetadata frameMetadata =
                new FrameMetadata.Builder()
                        .setWidth(image.getWidth())
                        .setHeight(image.getHeight())
                        .setRotation(image.getImageInfo().getRotationDegrees())
                        .build();

        ByteBuffer nv21Buffer =
                yuv420ThreePlanesToNV21(image.getImage().getPlanes(), image.getWidth(), image.getHeight());
        return getBitmap(nv21Buffer, frameMetadata);
    }

    /** Rotates a bitmap if it is converted from a bytebuffer. */
    private static Bitmap rotateBitmap(
            Bitmap bitmap, int rotationDegrees, boolean flipX, boolean flipY) {
        Matrix matrix = new Matrix();

        // Rotate the image back to straight.
        matrix.postRotate(rotationDegrees);

        // Mirror the image along the X or Y axis.
        matrix.postScale(flipX ? -1.0f : 1.0f, flipY ? -1.0f : 1.0f);
        Bitmap rotatedBitmap =
                Bitmap.createBitmap(bitmap, 0, 0, bitmap.getWidth(), bitmap.getHeight(), matrix, true);

        // Recycle the old bitmap if it has changed.
        if (rotatedBitmap != bitmap) {
            bitmap.recycle();
        }
        return rotatedBitmap;
    }

    /**
     * Converts YUV_420_888 to NV21 bytebuffer.
     *
     * <p>The NV21 format consists of a single byte array containing the Y, U and V values. For an
     * image of size S, the first S positions of the array contain all the Y values. The remaining
     * positions contain interleaved V and U values. U and V are subsampled by a factor of 2 in both
     * dimensions, so there are S/4 U values and S/4 V values. In summary, the NV21 array will contain
     * S Y values followed by S/4 VU values: YYYYYYYYYYYYYY(...)YVUVUVUVU(...)VU
     *
     * <p>YUV_420_888 is a generic format that can describe any YUV image where U and V are subsampled
     * by a factor of 2 in both dimensions. {@link Image#getPlanes} returns an array with the Y, U and
     * V planes. The Y plane is guaranteed not to be interleaved, so we can just copy its values into
     * the first part of the NV21 array. The U and V planes may already have the representation in the
     * NV21 format. This happens if the planes share the same buffer, the V buffer is one position
     * before the U buffer and the planes have a pixelStride of 2. If this is the case, we can just copy
     * them to the NV21 array.
     */
    @RequiresApi(VERSION_CODES.KITKAT)
    private static ByteBuffer yuv420ThreePlanesToNV21(
            Plane[] yuv420888planes, int width, int height) {
        int imageSize = width * height;
        byte[] out = new byte[imageSize + 2 * (imageSize / 4)];

        if (areUVPlanesNV21(yuv420888planes, width, height)) {
            // Copy the Y values.
            yuv420888planes[0].getBuffer().get(out, 0, imageSize);

            ByteBuffer uBuffer = yuv420888planes[1].getBuffer();
            ByteBuffer vBuffer = yuv420888planes[2].getBuffer();
            // Get the first V value from the V buffer, since the U buffer does not contain it.
            vBuffer.get(out, imageSize, 1);
            // Copy the first U value and the remaining VU values from the U buffer.
            uBuffer.get(out, imageSize + 1, 2 * imageSize / 4 - 1);
        } else {
            // Fallback to copying the UV values one by one, which is slower but also works.
            // Unpack Y.
            unpackPlane(yuv420888planes[0], width, height, out, 0, 1);
            // Unpack U.
            unpackPlane(yuv420888planes[1], width, height, out, imageSize + 1, 2);
            // Unpack V.
            unpackPlane(yuv420888planes[2], width, height, out, imageSize, 2);
        }

        return ByteBuffer.wrap(out);
    }

    /** Checks if the UV plane buffers of a YUV_420_888 image are in the NV21 format. */
    @RequiresApi(VERSION_CODES.KITKAT)
    private static boolean areUVPlanesNV21(Plane[] planes, int width, int height) {
        int imageSize = width * height;

        ByteBuffer uBuffer = planes[1].getBuffer();
        ByteBuffer vBuffer = planes[2].getBuffer();

        // Backup buffer properties.
        int vBufferPosition = vBuffer.position();
        int uBufferLimit = uBuffer.limit();

        // Advance the V buffer by 1 byte, since the U buffer will not contain the first V value.
        vBuffer.position(vBufferPosition + 1);
        // Chop off the last byte of the U buffer, since the V buffer will not contain the last U value.
        uBuffer.limit(uBufferLimit - 1);

        // Check that the buffers are equal and have the expected number of elements.
        boolean areNV21 =
                (vBuffer.remaining() == (2 * imageSize / 4 - 2)) && (vBuffer.compareTo(uBuffer) == 0);

        // Restore buffers to their initial state.
        vBuffer.position(vBufferPosition);
        uBuffer.limit(uBufferLimit);

        return areNV21;
    }

    /**
     * Unpack an image plane into a byte array.
     *
     * <p>The input plane data will be copied in 'out', starting at 'offset' and every pixel will be
     * spaced by 'pixelStride'. Note that there is no row padding on the output.
     */
    @TargetApi(VERSION_CODES.KITKAT)
    private static void unpackPlane(
            Plane plane, int width, int height, byte[] out, int offset, int pixelStride) {
        ByteBuffer buffer = plane.getBuffer();
        buffer.rewind();

        // Compute the size of the current plane.
        // We assume that it has the same aspect ratio as the original image.
        int numRow = (buffer.limit() + plane.getRowStride() - 1) / plane.getRowStride();
        if (numRow == 0) {
            return;
        }
        int scaleFactor = height / numRow;
        int numCol = width / scaleFactor;

        // Extract the data in the output buffer.
        int outputPos = offset;
        int rowStart = 0;
        for (int row = 0; row < numRow; row++) {
            int inputPos = rowStart;
            for (int col = 0; col < numCol; col++) {
                out[outputPos] = buffer.get(inputPos);
                outputPos += pixelStride;
                inputPos += plane.getPixelStride();
            }
            rowStart += plane.getRowStride();
        }
    }
}
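The NV21 layout described in the comments of yuv420ThreePlanesToNV21 can be illustrated with plain arrays. The sketch below is not the code path used by BitmapUtils. It assumes tightly packed planes (no row padding, pixel stride of 1) and even image dimensions, and the class name Nv21Layout is made up for this illustration.

```java
import java.util.Arrays;

/** Illustrative only: packs separate Y, U and V planes into NV21 order. */
final class Nv21Layout {
    private Nv21Layout() {}

    /**
     * Assumes tightly packed planes: y has width*height bytes,
     * u and v each have width*height/4 bytes.
     */
    static byte[] pack(byte[] y, byte[] u, byte[] v, int width, int height) {
        int imageSize = width * height;
        int chromaSize = imageSize / 4; // U and V are subsampled by 2 in both dimensions
        if (y.length != imageSize || u.length != chromaSize || v.length != chromaSize) {
            throw new IllegalArgumentException("plane sizes do not match a 4:2:0 image");
        }
        byte[] out = new byte[imageSize + 2 * chromaSize];
        // All Y values come first.
        System.arraycopy(y, 0, out, 0, imageSize);
        // Then the chroma samples, interleaved as V, U, V, U, ...
        for (int i = 0; i < chromaSize; i++) {
            out[imageSize + 2 * i] = v[i];
            out[imageSize + 2 * i + 1] = u[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // A 4x2 image: 8 Y samples, 2 U samples, 2 V samples.
        byte[] nv21 = pack(
                new byte[] {1, 2, 3, 4, 5, 6, 7, 8},
                new byte[] {20, 21},
                new byte[] {30, 31},
                4, 2);
        System.out.println(Arrays.toString(nv21));
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 30, 20, 31, 21]
    }
}
```

This also shows why unpackPlane above writes V at offset imageSize and U at imageSize + 1, each with an output stride of 2.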

BitmapUtils depends on the following FrameMetadata class (FrameMetadata.java):

package com.tonyxlh.mrzscanner;

/** Describing a frame info. */
public class FrameMetadata {

    private final int width;
    private final int height;
    private final int rotation;

    public int getWidth() {
        return width;
    }

    public int getHeight() {
        return height;
    }

    public int getRotation() {
        return rotation;
    }

    private FrameMetadata(int width, int height, int rotation) {
        this.width = width;
        this.height = height;
        this.rotation = rotation;
    }

    /** Builder of {@link FrameMetadata}. */
    public static class Builder {

        private int width;
        private int height;
        private int rotation;

        public Builder setWidth(int width) {
            this.width = width;
            return this;
        }

        public Builder setHeight(int height) {
            this.height = height;
            return this;
        }

        public Builder setRotation(int rotation) {
            this.rotation = rotation;
            return this;
        }

        public FrameMetadata build() {
            return new FrameMetadata(width, height, rotation);
        }
    }
}

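Then in MainActivity, create an ImageAnalysis use case, attach the analyzer to it, and bind it to the lifecycle together with the preview. This bindToLifecycle call replaces the preview-only one in the AndroidView factory shown earlier.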

val imageAnalysis = ImageAnalysis.Builder()
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .build()
imageAnalysis.setAnalyzer(
    ContextCompat.getMainExecutor(context),
    MRZAnalyzer({ results ->
        val sb = StringBuilder()
        // Accept only a single result that contains at least two text lines.
        if (results.size == 1) {
            val result = results[0]
            if (result.lineResults.size >= 2) {
                for (lineResult in result.lineResults) {
                    sb.append(lineResult.text)
                    sb.append("\n")
                }
            }
        }
        code = sb.toString()
    }, context)
)
try {
    cameraProviderFuture.get().bindToLifecycle(
        lifecycleOwner,
        selector,
        preview,
        imageAnalysis
    )
} catch (e: Exception) {
    e.printStackTrace()
}
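The callback above accepts any single result with two or more lines. The line lengths in the JSON template (30, 36 and 44) correspond to the ICAO 9303 document sizes: TD1 uses three lines of 30 characters, TD2 two lines of 36, and TD3 (passports) two lines of 44. A stricter filter could check the line count and length before accepting a result. The following is a minimal sketch of such a check. MrzFormat and classify are hypothetical names, not part of Dynamsoft Label Recognizer, and visas that share the TD2 and TD3 line sizes are reported under those labels.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: guesses the MRZ layout from line count and line length. */
final class MrzFormat {
    private MrzFormat() {}

    static String classify(List<String> lines) {
        if (lines.size() == 3 && allHaveLength(lines, 30)) return "TD1";
        if (lines.size() == 2 && allHaveLength(lines, 36)) return "TD2";
        if (lines.size() == 2 && allHaveLength(lines, 44)) return "TD3";
        return "UNKNOWN";
    }

    private static boolean allHaveLength(List<String> lines, int length) {
        for (String line : lines) {
            if (line.length() != length) return false;
        }
        return true;
    }

    /** Builds a line made only of the MRZ filler character, for the demo below. */
    static String filler(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, '<');
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList(filler(44), filler(44)))); // prints TD3
        System.out.println(classify(Arrays.asList(filler(44), filler(30)))); // prints UNKNOWN
    }
}
```

In the Kotlin callback, the text of each lineResult could be collected into a list and passed to such a function, with the state updated only when the result is not "UNKNOWN".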

The scanned result is stored in a code state, declared before the analyzer is created, and displayed in a Text composable.

var code by remember {
    mutableStateOf("")
}
//......
Text(
    text = code,
    fontSize = 20.sp,
    fontWeight = FontWeight.Bold,
    modifier = Modifier
        .fillMaxWidth()
        .padding(32.dp)
)
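OCR can misread individual characters, so an app may want to verify the text before trusting it. MRZ fields such as the document number and the dates are each followed by a check digit defined in ICAO 9303: digits keep their value, letters A to Z count as 10 to 35, the filler < counts as 0, the values are multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below shows that calculation only. The class name MrzCheckDigit is made up for this example, and extracting the right field positions from each line is left out.

```java
/** Illustrative ICAO 9303 check digit calculation; the class name is hypothetical. */
final class MrzCheckDigit {
    private static final int[] WEIGHTS = {7, 3, 1};

    private MrzCheckDigit() {}

    /** Returns the check digit (0-9) for an MRZ field. */
    static int compute(String field) {
        int sum = 0;
        for (int i = 0; i < field.length(); i++) {
            sum += valueOf(field.charAt(i)) * WEIGHTS[i % 3];
        }
        return sum % 10;
    }

    /** Returns true if the recognized check digit character matches the field. */
    static boolean matches(String field, char checkDigit) {
        return checkDigit >= '0' && checkDigit <= '9'
                && compute(field) == checkDigit - '0';
    }

    private static int valueOf(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new IllegalArgumentException("not an MRZ character: " + c);
    }

    public static void main(String[] args) {
        // Document number and birth date from the ICAO 9303 specimen passport.
        System.out.println(compute("L898902C3")); // prints 6
        System.out.println(compute("740812"));    // prints 2
        System.out.println(matches("740812", '3')); // prints false
    }
}
```

A result whose check digits do not match could simply be ignored, letting the analyzer try again on the next frame.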

Source Code

Check out the source code of the demo to have a try: https://github.com/tony-xlh/MRZ-Text-Scanner-Jetpack-Compose