Build a Label Recognition Frame Processor Plugin for React Native Vision Camera (Android)

Dynamsoft Label Recognizer (DLR) is an SDK that makes it easy to add text recognition to our apps. We can use it to recognize text on labels, ID cards, and similar documents. In this article, we are going to create a React Native Vision Camera frame processor plugin for DLR so that a React Native application can recognize text from the camera.

Build the Label Recognition Frame Processor Plugin for React Native Vision Camera (Android)

Let’s do this in steps. This article covers the Android platform; a second article covers iOS.

New Project

First, create a plugin project using bob (react-native-builder-bob):

npx create-react-native-library vision-camera-dynamsoft-label-recognizer

You can test the project using the following command:

cd example
npx react-native run-android

Add Camera Permission

Add the following permission to the example/android/app/src/main/AndroidManifest.xml file so that the app can access the camera.

<uses-permission android:name="android.permission.CAMERA" />

Add Vision Camera

Install react-native-vision-camera for the example project:

npm i react-native-vision-camera

Update the example/src/App.tsx file to use VisionCamera:

import * as React from 'react';
import { SafeAreaView, StyleSheet } from 'react-native';
import { Camera, useCameraDevices } from 'react-native-vision-camera';

export default function App() {
  const [hasPermission, setHasPermission] = React.useState(false);
  const devices = useCameraDevices();
  const device = devices.back;

  React.useEffect(() => {
    (async () => {
      const status = await Camera.requestCameraPermission();
      setHasPermission(status === 'authorized');
    })();
  }, []);

  return (
      <SafeAreaView style={styles.container}>
        {device != null &&
        hasPermission && (
        <>
            <Camera
            style={StyleSheet.absoluteFill}
            device={device}
            isActive={true}
            />
        </>)}
      </SafeAreaView>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1
  },
});

Enable Frame Processor

React Native Reanimated (REA) is needed to enable Frame Processor.

We can follow its installation guide to install it.

Implement the Plugin

Define the Wrapper Function in JavaScript

To make the Frame Processor Plugin available to the Frame Processor Worklet Runtime, create the following wrapper function in JS/TS:

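import type { Frame } from 'react-native-vision-camera'
import type { ScanConfig, ScanResult } from './Definitions'

/**
 * Recognize text.
 */
export function recognize(frame: Frame, config: ScanConfig): ScanResult {
  'worklet'
  // __recognize is the global injected for the native plugin registered as "recognize".
  // @ts-ignore
  return __recognize(frame, config)
}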

We also need to create a file named Definitions.tsx to define relevant interfaces.

Here are the interfaces for the recognition result, which mirror those defined in the original SDK (see the API docs).

export interface ScanResult {
  results: DLRResult[];
  imageBase64?: string;
}

export interface DLRResult {
  referenceRegionName: string;
  textAreaName: string;
  confidence: number;
  pageNumber: number;
  location: Quadrilateral;
  lineResults: DLRLineResult[];
}

export interface Quadrilateral{
  points:Point[];
}

export interface Point {
  x:number;
  y:number;
}

export interface DLRLineResult {
  text: string;
  confidence: number;
  characterModelName: string;
  characterResults: DLRCharacherResult[];
  lineSpecificationName: string;
  location: Quadrilateral;
}

export interface DLRCharacherResult {
  characterH: string;
  characterM: string;
  characterL: string;
  characterHConfidence: number;
  characterMConfidence: number;
  characterLConfidence: number;
  location: Quadrilateral;
}
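
As an illustration of how these nested types might be consumed, the following minimal sketch flattens the line results of every text area and keeps only the lines at or above a confidence threshold. The helper name collectConfidentLines, the trimmed-down local types, the sample data, and the assumption that confidence is an integer from 0 to 100 are all illustrative and not part of the plugin.

```typescript
// Trimmed-down local copies of the result types (only the fields used here).
interface LineResult {
  text: string;
  confidence: number; // assumed to range from 0 to 100
}

interface AreaResult {
  lineResults: LineResult[];
}

// Hypothetical helper: flatten the lines of every text area and drop
// the ones whose confidence is below minConfidence.
function collectConfidentLines(results: AreaResult[], minConfidence: number): LineResult[] {
  const lines: LineResult[] = [];
  for (const area of results) {
    for (const line of area.lineResults) {
      if (line.confidence >= minConfidence) {
        lines.push(line);
      }
    }
  }
  return lines;
}

// Made-up sample data for demonstration only.
const sampleResults: AreaResult[] = [
  { lineResults: [{ text: 'ABC123', confidence: 92 }, { text: 'X?Z', confidence: 40 }] },
  { lineResults: [{ text: 'HELLO', confidence: 75 }] },
];

const confidentLines = collectConfidentLines(sampleResults, 60);
console.log(confidentLines.map((line) => line.text).join('\n'));
```

Doing the filtering in JavaScript keeps the native side simple, at the cost of passing every line across the bridge.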

Here are the interfaces related to the configuration of the scanning process.

/**
 * template: JSON template to set the runtime settings of DLR
 * templateName: specify which template to use
 * license: specify your own license
 * scanRegion: set up a scan region
 * customModelConfig: load custom models in a local folder
 * includeImageBase64: enable to return the bitmap in base64
 */
export interface ScanConfig{
  template?: string;
  templateName?: string;
  license?: string;
  scanRegion?: ScanRegion;
  customModelConfig?: CustomModelConfig;
  includeImageBase64?: boolean;
}

/**
 * Set up a scan region so that the plugin will crop the image before recognizing text from it. The value is in percent.
 */
export interface ScanRegion{
  left: number;
  top: number;
  width: number;
  height: number;
}


/**
 * We can load custom models from a local folder.
 */
export interface CustomModelConfig {
  customModelFolder: string;
  customModelFileNames: string[];
}
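
Since the scan region is expressed in percent, it can be worth validating it before handing the config to the plugin. The sketch below is one possible way to do that; isValidScanRegion and buildScanConfig are hypothetical helpers, and the whole-number check is an assumption based on the native code reading the values as integers.

```typescript
// Local copies of the two interfaces used in this sketch.
interface ScanRegion {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface ScanConfig {
  license?: string;
  scanRegion?: ScanRegion;
  includeImageBase64?: boolean;
}

// Hypothetical helper: a region is accepted when every value is a whole
// percentage between 0 and 100 and the rectangle stays inside the image.
function isValidScanRegion(region: ScanRegion): boolean {
  const values = [region.left, region.top, region.width, region.height];
  const allWholePercents = values.every((v) => v >= 0 && v <= 100 && Math.floor(v) === v);
  return (
    allWholePercents &&
    region.width > 0 &&
    region.height > 0 &&
    region.left + region.width <= 100 &&
    region.top + region.height <= 100
  );
}

// Hypothetical helper: only attach the scan region when it is valid,
// so an invalid region falls back to scanning the full frame.
function buildScanConfig(region: ScanRegion, license?: string): ScanConfig {
  const built: ScanConfig = { includeImageBase64: false };
  if (license !== undefined) {
    built.license = license;
  }
  if (isValidScanRegion(region)) {
    built.scanRegion = region;
  }
  return built;
}

const demoConfig = buildScanConfig({ left: 5, top: 40, width: 90, height: 10 });
console.log(demoConfig);
```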

Write the Native Code for the Plugin

Next, we are going to implement the plugin on the Android side based on the above definitions.

  1. Create a new file named VisionCameraDLRPlugin.java with the following template content:

    import androidx.camera.core.ImageProxy;
    import com.facebook.react.bridge.ReactApplicationContext;
    import com.mrousavy.camera.frameprocessor.FrameProcessorPlugin;
    
    public class VisionCameraDLRPlugin extends FrameProcessorPlugin {
    
        private ReactApplicationContext context;
        public void setContext(ReactApplicationContext reactContext){
            context = reactContext;
        }
        @Override
        public Object callback(ImageProxy image, Object[] params) {
            // code goes here
            return null;
        }
    
        // The name passed to super() is exposed to JavaScript as __recognize.
        VisionCameraDLRPlugin() {
          super("recognize");
        }
    }
    
  2. Register the plugin in VisionCameraDynamsoftLabelRecognizerPackage.java:

    @Override
    public List<NativeModule> createNativeModules(@NonNull ReactApplicationContext reactContext) {
        List<NativeModule> modules = new ArrayList<>();
        modules.add(new VisionCameraDynamsoftLabelRecognizerModule(reactContext));
    +   VisionCameraDLRPlugin plugin = new VisionCameraDLRPlugin();
    +   plugin.setContext(reactContext);
    +   FrameProcessorPlugin.register(plugin);
        return modules;
    }
    
  3. Add the following content to the android/build.gradle file to include Dynamsoft Label Recognizer and CameraX.

    rootProject.allprojects {
      repositories {
        maven {
          url "https://download2.dynamsoft.com/maven/dc/aar"
        }
        maven {
          url "https://download2.dynamsoft.com/maven/dlr/aar"
        }
      }
    }
    
    dependencies {
      implementation 'androidx.camera:camera-core:1.0.2'
      // From node_modules
      implementation project(path: ':react-native-vision-camera')
      // DLR
      implementation 'com.dynamsoft:dynamsoftcore:1.0.0@aar'
      implementation 'com.dynamsoft:dynamsoftlabelrecognizer:2.0.0@aar'
    }
    
  4. Convert the frame to bitmap and rotate it.

    OCR is sensitive to the rotation of text. On Android, the camera sensor’s default orientation is landscape, so we need to rotate the frame when the phone is held in portrait.

    We can use the getBitmap method provided by Google to do this.

    Here is how to use it in the plugin:

    @Override
    public Object callback(ImageProxy image, Object[] params) {
        Bitmap bm = BitmapUtils.getBitmap(image);
    }
    
  5. Create an instance of Dynamsoft Label Recognizer the first time the plugin is called.

    Here, we create a class named LabelRecognizerManager to manage the initialization and runtime settings of Dynamsoft Label Recognizer.

    The manager:

    public class LabelRecognizerManager {
        private LabelRecognizer recognizer = null;
        private ReactApplicationContext mContext;
        private String mLicense;
        public LabelRecognizerManager(ReactApplicationContext context, String license){
            mContext = context;
            mLicense = license;
            initDLR(license);
        }
           
        public LabelRecognizer getRecognizer(){
            if (recognizer == null) {
                initDLR(mLicense);
            }
            return recognizer;
        }
    
        private void initDLR(String license) {
            LabelRecognizer.initLicense(license, new DLRLicenseVerificationListener() {
                @Override
                public void DLRLicenseVerificationCallback(boolean isSuccess, Exception error) {
                    if(!isSuccess){
                        error.printStackTrace();
                    }
                }
            });
            try {
                recognizer = new LabelRecognizer();
            } catch (LabelRecognizerException e) {
                e.printStackTrace();
            }
        }
    }
    

    Then create the instance in the plugin class.

    private ReactApplicationContext context;
    private LabelRecognizer recognizer = null;
    private LabelRecognizerManager manager = null;
    @Override
    public Object callback(ImageProxy image, Object[] params) {
        Bitmap bm = BitmapUtils.getBitmap(image);
        ReadableNativeMap config = getConfig(params);
        if (manager == null) {
            String license = "DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ=="; //default 1-day public trial. Apply for a trial license here: https://www.dynamsoft.com/customer/license/trialLicense/?product=dlr
            if (config != null && config.hasKey("license")) {
                license = config.getString("license");
            }
            manager = new LabelRecognizerManager(context,license);
            recognizer = manager.getRecognizer();
        }
    }
       
    private ReadableNativeMap getConfig(Object[] params){
        if (params.length>0) {
            if (params[0] instanceof ReadableNativeMap) {
                ReadableNativeMap config = (ReadableNativeMap) params[0];
                return config;
            }
        }
        return null;
    }
       
    
  6. Update the runtime settings of the label recognizer.

    Add the following fields and methods to the manager for updating the recognizer’s template and model.

    // Remember what is already loaded so that repeated calls do not reload it.
    private String currentTemplate = "";
    private String currentModelFolder = "";
    
    private void loadCustomModel(String modelFolder, ReadableArray fileNames) {
        try {
            AssetManager assets = mContext.getAssets();
            for(int i = 0;i<fileNames.size();i++) {
                InputStream isPrototxt = assets.open(modelFolder+"/"+fileNames.getString(i)+".prototxt");
                byte[] prototxt = new byte[isPrototxt.available()];
                isPrototxt.read(prototxt);
                isPrototxt.close();
                InputStream isCharacterModel = assets.open(modelFolder+"/"+fileNames.getString(i)+".caffemodel");
                byte[] characterModel = new byte[isCharacterModel.available()];
                isCharacterModel.read(characterModel);
                isCharacterModel.close();
                InputStream isTxt = assets.open(modelFolder+"/"+fileNames.getString(i)+".txt");
                byte[] txt = new byte[isTxt.available()];
                isTxt.read(txt);
                isTxt.close();
                recognizer.appendCharacterModelBuffer(fileNames.getString(i), prototxt, txt, characterModel);
            }
            Log.d("DLR","custom model loaded");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    public void updateTemplate(String template){
        if (!template.equals(currentTemplate)) {
            try {
                recognizer.clearAppendedSettings();
                recognizer.appendSettingsFromString(template);
                Log.d("DLR","append template: "+template);
            } catch (LabelRecognizerException e) {
                e.printStackTrace();
            }
            currentTemplate = template;
        }
    }
    
    public void useCustomModel(String modelFolder, ReadableArray modelFileNames){
        if (!modelFolder.equals(currentModelFolder)) {
            loadCustomModel(modelFolder, modelFileNames);
            currentModelFolder = modelFolder;
        }
    }
    
    public void destroy(){
        recognizer.destroy();
        recognizer = null;
    }
    

    In the plugin class, update the template and load models if they are in the config.

    String templateName = "";
    
    // config is null when the frame processor passes no parameters.
    if (config != null) {
        if (config.hasKey("templateName")) {
            templateName = config.getString("templateName");
        }
    
        if (config.hasKey("customModelConfig")) {
            ReadableNativeMap customModelConfig = config.getMap("customModelConfig");
            String modelFolder = customModelConfig.getString("customModelFolder");
            ReadableArray modelFileNames = customModelConfig.getArray("customModelFileNames");
            manager.useCustomModel(modelFolder,modelFileNames);
        }
    
        if (config.hasKey("template")) {
            String template = config.getString("template");
            manager.updateTemplate(template);
        }
    }
    
  7. Crop the image if a scan region is set.

    if (config != null && config.hasKey("scanRegion")) {
        ReadableNativeMap scanRegion = config.getMap("scanRegion");
        double left = scanRegion.getInt("left") / 100.0 * bm.getWidth();
        double top = scanRegion.getInt("top") / 100.0 * bm.getHeight();
        double width = scanRegion.getInt("width") / 100.0 * bm.getWidth();
        double height = scanRegion.getInt("height") / 100.0 * bm.getHeight();
        bm = Bitmap.createBitmap(bm, (int) left, (int) top, (int) width, (int) height, null, false);
    }
    
  8. Recognize text from the image and wrap the result.

    WritableNativeMap scanResult = new WritableNativeMap();
    WritableNativeArray array = new WritableNativeArray();
    try {
        DLRResult[] results = recognizer.recognizeByImage(bm,templateName);
        for (DLRResult result:results) {
            array.pushMap(Utils.getMapFromDLRResult(result));
        }
    } catch (LabelRecognizerException e) {
        e.printStackTrace();
    }
    scanResult.putArray("results",array);
    if (config != null && config.hasKey("includeImageBase64")) {
        if (config.getBoolean("includeImageBase64") == true) {
            scanResult.putString("imageBase64",Utils.bitmap2Base64(bm));
        }
    }
    return scanResult;
    

    A Utils.java file is created to store helper methods.

    public class Utils {
    
        public static Bitmap base642Bitmap(String base64) {
            byte[] decode = Base64.decode(base64,Base64.DEFAULT);
            Bitmap bitmap = BitmapFactory.decodeByteArray(decode,0,decode.length);
            return bitmap;
        }
    
        public static String bitmap2Base64(Bitmap bitmap) {
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            bitmap.compress(Bitmap.CompressFormat.JPEG, 100, outputStream);
            return Base64.encodeToString(outputStream.toByteArray(), Base64.DEFAULT);
        }
    
        public static WritableNativeMap getMapFromDLRResult(DLRResult result){
            WritableNativeMap map = new WritableNativeMap();
            map.putString("referenceRegionName",result.refereneceRegionName);
            map.putString("textAreaName",result.textAreaName);
            map.putInt("confidence",result.confidence);
            map.putInt("pageNumber",result.pageNumber);
            WritableNativeArray lineResults = new WritableNativeArray();
            for (DLRLineResult lineResult:result.lineResults) {
                lineResults.pushMap(getMapFromDLRLineResult(lineResult));
            }
            map.putArray("lineResults",lineResults);
            map.putMap("location",getMapFromLocation(result.location));
            return map;
        }
    
        private static WritableNativeMap getMapFromDLRLineResult(DLRLineResult result){
            WritableNativeMap map = new WritableNativeMap();
            map.putString("lineSpecificationName",result.lineSpecificationName);
            map.putString("text",result.text);
            map.putString("characterModelName",result.characterModelName);
            map.putMap("location",getMapFromLocation(result.location));
            map.putInt("confidence",result.confidence);
            WritableNativeArray characterResults = new WritableNativeArray();
            for (DLRCharacterResult characterResult:result.characterResults) {
                characterResults.pushMap(getMapFromDLRCharacterResult(characterResult));
            }
            map.putArray("characterResults",characterResults);
            return map;
        }
    
        private static WritableNativeMap getMapFromDLRCharacterResult(DLRCharacterResult result){
            WritableNativeMap map = new WritableNativeMap();
            map.putString("characterH",String.valueOf(result.characterH));
            map.putString("characterM",String.valueOf(result.characterM));
            map.putString("characterL",String.valueOf(result.characterL));
            map.putInt("characterHConfidence",result.characterHConfidence);
            map.putInt("characterMConfidence",result.characterMConfidence);
            map.putInt("characterLConfidence",result.characterLConfidence);
            map.putMap("location",getMapFromLocation(result.location));
            return map;
        }
    
        private static WritableNativeMap getMapFromLocation(Quadrilateral location){
            WritableNativeMap map = new WritableNativeMap();
            WritableNativeArray points = new WritableNativeArray();
            for (Point point: location.points) {
                WritableNativeMap pointAsMap = new WritableNativeMap();
                pointAsMap.putInt("x",point.x);
                pointAsMap.putInt("y",point.y);
                points.pushMap(pointAsMap);
            }
            map.putArray("points",points);
            return map;
        }
    }
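
The percent-to-pixel arithmetic used for cropping in step 7 can be checked in isolation. The following is a minimal TypeScript sketch of the same calculation, not the plugin's code: toPixelRect and PixelRect are hypothetical names, and the clamping is an added safeguard, since Bitmap.createBitmap throws an IllegalArgumentException when the requested rectangle extends past the source bitmap.

```typescript
interface ScanRegion {
  left: number;   // percent of the image width
  top: number;    // percent of the image height
  width: number;  // percent of the image width
  height: number; // percent of the image height
}

interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: convert a percent-based region to a pixel rectangle,
// truncating to whole pixels and clamping the size so that the rectangle
// never extends past the right or bottom edge of the image.
function toPixelRect(region: ScanRegion, imageWidth: number, imageHeight: number): PixelRect {
  const x = Math.trunc((region.left * imageWidth) / 100);
  const y = Math.trunc((region.top * imageHeight) / 100);
  const width = Math.min(Math.trunc((region.width * imageWidth) / 100), imageWidth - x);
  const height = Math.min(Math.trunc((region.height * imageHeight) / 100), imageHeight - y);
  return { x, y, width, height };
}

// A 1080x1920 portrait frame with a region like the one used in the example app.
const cropRect = toPixelRect({ left: 5, top: 40, width: 90, height: 10 }, 1080, 1920);
console.log(cropRect);

// A region that overflows the right edge is clamped instead of failing.
const clampedRect = toPixelRect({ left: 50, top: 0, width: 60, height: 100 }, 100, 100);
console.log(clampedRect);
```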
    

All right, we’ve now finished writing the plugin.
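
On the JavaScript side, the imageBase64 string returned by the plugin needs a little preparation before it can be displayed. The bitmap2Base64 helper encodes a JPEG with Base64.DEFAULT, which inserts line breaks into the output. A minimal sketch, assuming we want a data URI for an image component (toJpegDataUri is a hypothetical helper, not part of the plugin):

```typescript
// Hypothetical helper: turn the plugin's imageBase64 value into a data URI.
// Whitespace is stripped because Android's Base64.DEFAULT wraps its output
// into multiple lines.
function toJpegDataUri(imageBase64: string): string {
  const compact = imageBase64.replace(/\s+/g, '');
  return 'data:image/jpeg;base64,' + compact;
}

// Made-up, truncated sample input for demonstration only.
const dataUri = toJpegDataUri('/9j/4AAQ\nSkZJ\n');
console.log(dataUri);
```

The resulting string could then be passed as the uri of an Image source in the example app.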

Use the Plugin in the Example Project

Now, we can use the plugin to do some label recognition.

  1. Update example/babel.config.js so that the Reanimated Babel plugin knows about the __recognize global that backs the recognize function.

    const path = require('path');
    const pak = require('../package.json');
    
    module.exports = {
      presets: ['module:metro-react-native-babel-preset'],
      plugins: [
        [
          'module-resolver',
          {
            extensions: ['.tsx', '.ts', '.js', '.json'],
            alias: {
              [pak.name]: path.join(__dirname, '..', pak.source),
            },
          },
        ],
    +   [
    +     'react-native-reanimated/plugin',
    +     {
    +       globals: ['__recognize'],
    +     },
    +   ],
      ],
    };
    
  2. In App.tsx, add the frameProcessor prop to the Camera component.

    Define the frame processor:

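    import * as REA from 'react-native-reanimated';
    const [recognitionResults, setRecognitionResults] = React.useState([] as DLRLineResult[]);
    const frameProcessor = useFrameProcessor((frame) => {
      'worklet'
      const config:ScanConfig = {};
      const scanResult = recognize(frame,config);
      console.log(scanResult);
      // Collect the line results of every recognized text area.
      const lineResults:DLRLineResult[] = [];
      for (const result of scanResult.results) {
        for (const line of result.lineResults) {
          lineResults.push(line);
        }
      }
      REA.runOnJS(setRecognitionResults)(lineResults);
    }, [])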
    

    Add it to the vision camera:

      <Camera
        style={StyleSheet.absoluteFill}
        device={device}
        isActive={isActive}
        format={format}
    +   frameProcessor={frameProcessor}
    +   frameProcessorFps={1}
      >
      </Camera>
    
  3. We can define a scan region. The viewfinder is drawn using react-native-svg.

    Set the scan region config:

    import * as REA from 'react-native-reanimated';
    const scanRegion:ScanRegion = {
      left: 5,
      top: 40,
      width: 90,
      height: 10
    }
       
    const [recognitionResults, setRecognitionResults] = React.useState([] as DLRLineResult[]);
    const frameProcessor = useFrameProcessor((frame) => {
      'worklet'
      const config:ScanConfig = {};
      config.scanRegion = scanRegion;
      const scanResult = recognize(frame,config);
      console.log(scanResult);
      // Collect the line results of every recognized text area.
      const lineResults:DLRLineResult[] = [];
      for (const result of scanResult.results) {
        for (const line of result.lineResults) {
          lineResults.push(line);
        }
      }
      REA.runOnJS(setRecognitionResults)(lineResults);
    }, [])
    

    Draw the viewfinder:

    <Svg preserveAspectRatio='xMidYMid slice' style={StyleSheet.absoluteFill} viewBox={getViewBox()}>
      <Rect 
        x={scanRegion.left/100*getFrameSize().width}
        y={scanRegion.top/100*getFrameSize().height}
        width={scanRegion.width/100*getFrameSize().width}
        height={scanRegion.height/100*getFrameSize().height}
        strokeWidth="2"
        stroke="red"
      />
    </Svg>
    

    We can get the frame width and height from the frame parameter in the frame processor. Here are the relevant helper methods:

    const getViewBox = () => {
      const frameSize = getFrameSize();
      const viewBox = "0 0 "+frameSize.width+" "+frameSize.height;
      return viewBox;
    }
       
    const getFrameSize = ():{width:number,height:number} => {
      let width:number, height:number;
      if (HasRotation()){ //check whether the original frame is landscape. If so, switch height and width.
        width = frameHeight;
        height = frameWidth;
      }else {
        width = frameWidth;
        height = frameHeight;
      }
      return {width:width,height:height};
    }
    
    const HasRotation = () => {
      let value = false
      if (Platform.OS === 'android') {
        if (!(frameWidth>frameHeight && Dimensions.get('window').width>Dimensions.get('window').height)){
          value = true;
        }
      }
      return value;
    }
    
  4. Then we can use a modal to display the result.

     <Modal
      animationType="slide"
      transparent={true}
      visible={modalVisible}
      onRequestClose={() => {
        Alert.alert("Modal has been closed.");
        modalVisibleShared.value = !modalVisible;
        setModalVisible(!modalVisible);
        setRecognitionResults([]);
      }}
    >
      <View style={styles.centeredView}>
        <View style={styles.modalView}>
          {renderImage()}
          {recognitionResults.map((result, idx) => (
            <Text key={"line-"+idx}>
              {result.characterResults.map((char, idx) => (
                <RecognizedCharacter key={"char-"+idx} char={char}/>
              ))}  
            </Text>
               
          ))}
          <View style={styles.buttonView}>
              <Pressable
                style={[styles.button, styles.buttonClose]}
                onPress={() => {
                  Alert.alert("","Copied");
                  Clipboard.setString(getText());
                }}
              >
                <Text style={styles.textStyle}>Copy</Text>
              </Pressable>
              <Pressable
                style={[styles.button, styles.buttonClose]}
                onPress={() => {
                  modalVisibleShared.value = !modalVisible;
                  setModalVisible(!modalVisible)
                  setRecognitionResults([]);
                }}
              >
                <Text style={styles.textStyle}>Rescan</Text>
              </Pressable>
          </View>
    
        </View>
      </View>
    </Modal>
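
The frame-size helpers from step 3 depend on component state and on the Platform and Dimensions modules, which makes them awkward to test. As a sketch only, the same logic can be written as pure functions; displayedFrameSize, viewBoxFor, and the Size type are hypothetical names, and the platform and window size are passed in as parameters instead of being read from React Native.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical pure version of getFrameSize/HasRotation: on Android the frame
// is treated as rotated unless both the frame and the window are landscape,
// in which case width and height are left as they are.
function displayedFrameSize(frame: Size, windowSize: Size, isAndroid: boolean): Size {
  const bothLandscape = frame.width > frame.height && windowSize.width > windowSize.height;
  const rotated = isAndroid && !bothLandscape;
  return rotated
    ? { width: frame.height, height: frame.width }
    : { width: frame.width, height: frame.height };
}

// Hypothetical pure version of getViewBox.
function viewBoxFor(size: Size): string {
  return `0 0 ${size.width} ${size.height}`;
}

// A landscape 1920x1080 frame shown in a portrait window on Android.
const portraitSize = displayedFrameSize({ width: 1920, height: 1080 }, { width: 1080, height: 2340 }, true);
console.log(viewBoxFor(portraitSize));

// The same frame shown in a landscape window keeps its dimensions.
const landscapeSize = displayedFrameSize({ width: 1920, height: 1080 }, { width: 2340, height: 1080 }, true);
console.log(viewBoxFor(landscapeSize));
```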
    

Here is a video of the final result:

Source Code

https://github.com/xulihang/vision-camera-dynamsoft-label-recognizer