How to Build a Text Recognition Capacitor Plugin
Capacitor is an open-source native runtime created by the Ionic team for building Web Native apps. We can use it to create cross-platform iOS, Android, and Progressive Web Apps with JavaScript, HTML, and CSS.
Using Capacitor plugins, we can use JavaScript to interface directly with native APIs. In this article, we are going to build a text recognition Capacitor plugin using Dynamsoft Label Recognizer, which can recognize text such as the MRZ (machine-readable zone) on passports. Dynamsoft Label Recognizer has JavaScript, Android and iOS editions, so the plugin can work on all three platforms.
Build a Text Recognition Capacitor Plugin
Let’s do this in steps.
New Plugin Project
In a new terminal, run the following:
npm init @capacitor/plugin
We will be prompted to input relevant project info.
√ What should be the npm package of your plugin?
... capacitor-plugin-dynamsoft-label-recognizer
√ What directory should be used for your plugin?
... capacitor-plugin-dynamsoft-label-recognizer
√ What should be the Package ID for your plugin?
Package IDs are unique identifiers used in apps and plugins. For plugins,
they're used as a Java namespace. They must be in reverse domain name
notation, generally representing a domain name that you or your company owns.
... com.dynamsoft.capacitor.dlr
√ What should be the class name for your plugin?
... LabelRecognizer
√ What is the repository URL for your plugin?
... https://github.com/tony-xlh/capacitor-plugin-dynamsoft-label-recognizer
Create an Example Project
In order to test the plugin, we can create an example project.
Under the root of the plugin, create an example folder and start a webpack project.
git clone https://github.com/wbkd/webpack-starter
mv webpack-starter example # rename webpack-starter to example
Install the Capacitor plugin to the example project:
cd example
npm install ..
Then, we can run npm start to test the example project.
Next, we are going to implement the Web, Android and iOS parts of the plugin.
Web Implementation
Add Dynamsoft Label Recognizer as a Dependency
npm install dynamsoft-label-recognizer
Write Definitions
Define the interfaces in src/definitions.ts. The LabelRecognizerPlugin interface provides methods to initialize Dynamsoft Label Recognizer, update its settings, and use it to recognize text from a base64-encoded image.
import { PluginListenerHandle } from "@capacitor/core";
import { DLRResult } from "dynamsoft-label-recognizer";

export interface LabelRecognizerPlugin {
  initialize(): Promise<void>;
  initLicense(options: { license: string }): Promise<void>;
  recognizeBase64String(options: { base64: string }): Promise<{results:DLRResult[]}>;
  updateRuntimeSettings(options: {settings:RuntimeSettings}): Promise<void>;
  resetRuntimeSettings(): Promise<void>;
  setEngineResourcesPath(options: { path: string }): Promise<void>;
  addListener(
    eventName: 'onResourcesLoadStarted',
    listenerFunc: onResourcesLoadStartedListener,
  ): Promise<PluginListenerHandle> & PluginListenerHandle;
  addListener(
    eventName: 'onResourcesLoaded',
    listenerFunc: onResourcesLoadedListener,
  ): Promise<PluginListenerHandle> & PluginListenerHandle;
  removeAllListeners(): Promise<void>;
}

export type onResourcesLoadStartedListener = (resourcePath:string) => void;
export type onResourcesLoadedListener = (resourcePath:string) => void;

export interface RuntimeSettings {
  template: string;
  customModelConfig?: CustomModelConfig;
}

export interface CustomModelConfig {
  customModelFolder: string;
  customModelFileNames: string[];
}
Implement the Interfaces
- Initialization.
  We need to set up the license and engine resource path and then initialize Dynamsoft Label Recognizer.

    private recognizer: LabelRecognizer | null = null;
    private engineResourcesPath: string = "https://cdn.jsdelivr.net/npm/dynamsoft-label-recognizer@2.2.11/dist/";
    private license: string = "DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ==";

    async initialize(): Promise<void> {
      try {
        LabelRecognizer.license = this.license;
        LabelRecognizer.engineResourcePath = this.engineResourcesPath;
        this.setupEvents();
        this.recognizer = await LabelRecognizer.createInstance();
      } catch (error) {
        throw error;
      }
    }
The license and engine resource path can be set using the following methods. If they are not set, default values will be used.
    async setEngineResourcesPath(options: { path: string; }): Promise<void> {
      this.engineResourcesPath = options.path;
    }

    async initLicense(options: { license: string; }): Promise<void> {
      this.license = options.license;
    }
- Runtime setup.
  We can update the runtime settings for different use cases. For the JavaScript version, we can pass a JSON template or a predefined template name like MRZ to update the runtime settings. You can learn more about it in the docs.

    async updateRuntimeSettings(options:{settings:RuntimeSettings}): Promise<void> {
      if (this.recognizer) {
        // Await the update so callers continue only after the settings are applied.
        await this.recognizer.updateRuntimeSettingsFromString(options.settings.template);
      }
    }

    async resetRuntimeSettings(): Promise<void> {
      if (this.recognizer) {
        await this.recognizer.resetRuntimeSettings();
      }
    }
When the runtime settings are updated using a predefined template, the JavaScript version of Dynamsoft Label Recognizer will try to load the model from the resource path. We can monitor the loading progress using the following events.
    setupEvents() {
      LabelRecognizer.onResourcesLoadStarted = (resourcePath) => {
        // In this event handler, you can display a visual cue to show that the model file is being downloaded.
        console.log("Loading " + resourcePath);
        this.notifyListeners("onResourcesLoadStarted",resourcePath);
      };
      LabelRecognizer.onResourcesLoaded = (resourcePath) => {
        // In this event handler, you can close the visual cue if it was displayed.
        console.log("Finished loading " + resourcePath);
        this.notifyListeners("onResourcesLoaded", resourcePath);
      };
    }
- Recognition.
  The native implementations receive image data from JavaScript as a base64 string, so the plugin provides a single method that recognizes text from base64 on every platform.

    async recognizeBase64String(options: { base64: string; }): Promise<{results:DLRResult[]}> {
      if (this.recognizer) {
        const results = await this.recognizer.recognize(options.base64);
        if (results) {
          return {results:results};
        }else{
          return {results:[]};
        }
      }else{
        throw new Error("Not initialized");
      }
    }
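Once recognizeBase64String resolves, the caller usually wants the recognized text rather than the full result tree. The following is a minimal sketch of that step, not part of the plugin. The `ResultLike` and `LineResultLike` interfaces are reduced, assumed shapes containing only the `lineResults`, `text` and `confidence` fields, and `extractLines` is a hypothetical helper name.

```typescript
// Assumed, reduced result shape: only the fields used below.
interface LineResultLike {
  text: string;
  confidence: number;
}

interface ResultLike {
  lineResults: LineResultLike[];
}

// Hypothetical helper: collect the text of every line whose confidence
// is at or above minConfidence, preserving the original order.
function extractLines(results: ResultLike[], minConfidence = 0): string[] {
  const lines: string[] = [];
  for (const result of results) {
    for (const line of result.lineResults) {
      if (line.confidence >= minConfidence) {
        lines.push(line.text);
      }
    }
  }
  return lines;
}
```

A caller could pass `response.results` to `extractLines` and join the returned array with newlines for display.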
Update the Example to Work as a Text Scanner
In the example project, we are going to use the plugin to recognize text from a local image as well as from a captured camera frame. The user can select a use case; when the selection changes, we update the runtime settings of Dynamsoft Label Recognizer accordingly.
Here are the screenshots of the final result.
Key parts of the example’s code:
- Initialization

    let license = "DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ==";
    await LabelRecognizer.initLicense({license:license});
    await LabelRecognizer.initialize();
    if (Capacitor.isNativePlatform() === false) {
      LabelRecognizer.addListener('onResourcesLoadStarted', () => {
        document.getElementById("status").innerText = "Loading resources...";
      });
      LabelRecognizer.addListener('onResourcesLoaded', () => {
        document.getElementById("status").innerText = "";
      });
    }
- Update runtime settings based on use case selection

    async function changeUseCase(event){
      const index = event.target.selectedIndex;
      if (index === 0) { //general
        await LabelRecognizer.resetRuntimeSettings();
      }else{ //MRZ
        await LabelRecognizer.updateRuntimeSettings({settings:{template:"MRZ"}});
      }
    }
- Load an image into base64

  HTML:

    <input type="file" class="decode-image-file" accept=".jpg,.jpeg,.png,.bmp" />

  JS:

    function decodeImage(){
      let files = document.getElementsByClassName("decode-image-file")[0].files;
      if (files.length == 0) {
        return;
      }
      let img = document.getElementsByClassName("img")[0];
      img.src = "";
      let file = files[0];
      let fileReader = new FileReader();
      fileReader.onload = function(e){
        let dataURL = e.target.result;
        //we can get the base64 from the data url
      };
      fileReader.onerror = function () {
        console.warn('oops, something went wrong.');
      };
      fileReader.readAsDataURL(file);
    }
- Start the camera and capture a camera frame

  To keep the plugin simple, we are not adding camera functionality to it. Here, we use capacitor-plugin-dynamsoft-camera-preview to open the camera.

  - Install the camera-preview plugin.

      npm install capacitor-plugin-dynamsoft-camera-preview

  - Open the camera.

      await CameraPreview.startCamera();

  - Capture a frame of the camera preview as base64.

      const result = await CameraPreview.takeSnapshot({quality:50});

  - Set up a scan region. The image will be cropped according to the region of interest.

      await CameraPreview.setScanRegion({
        region: {
          left: leftPercent,
          top: topPercent,
          right: leftPercent + widthPercent,
          bottom: topPercent + heightPercent,
          measuredByPercentage:1
        }
      });

- Recognize text from base64.

    async function recognizeBase64String(base64){
      document.getElementById("status").innerText = "decoding...";
      let response = await LabelRecognizer.recognizeBase64String({base64:base64});
      document.getElementById("status").innerText = "";
      let results = response.results;
      console.log(response);
    }
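The scan-region step above uses leftPercent, topPercent, widthPercent and heightPercent without showing where they come from. As a rough sketch, the region object could be built by a small helper that also keeps every edge within 0 to 100. `buildScanRegion`, `clampPercent` and the `ScanRegion` interface are hypothetical names introduced here for illustration; only the field names come from the setScanRegion call shown above.

```typescript
// Field names follow the object passed to setScanRegion above;
// the interface itself is an assumption for illustration.
interface ScanRegion {
  left: number;
  top: number;
  right: number;
  bottom: number;
  measuredByPercentage: number;
}

// Keep a percentage inside the 0..100 range.
function clampPercent(value: number): number {
  return Math.min(100, Math.max(0, value));
}

// Hypothetical helper: turn a left/top/width/height description
// (all in percent) into the left/top/right/bottom form.
function buildScanRegion(
  leftPercent: number,
  topPercent: number,
  widthPercent: number,
  heightPercent: number,
): ScanRegion {
  const left = clampPercent(leftPercent);
  const top = clampPercent(topPercent);
  return {
    left,
    top,
    right: clampPercent(left + widthPercent),
    bottom: clampPercent(top + heightPercent),
    measuredByPercentage: 1,
  };
}
```

For an MRZ scanner, a wide and short region such as `buildScanRegion(10, 20, 80, 30)` may be a reasonable starting point, since the MRZ spans the width of the document but only a few lines of height.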
The example can now work as a text scanner in the browser. Next, we are going to implement the Android and iOS parts of the plugin and make the example work on Android and iOS as native apps as well.
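The image-loading snippet leaves the step of getting the base64 out of the data URL as a comment. One way to do it, sketched here under the assumption that the input is either a data URL or already raw base64, is to strip the `data:...;base64,` head with a regular expression. `stripDataURLHead` is a hypothetical helper name.

```typescript
// Hypothetical helper: remove a leading "data:<mime>;base64," head if present.
// Input that is already raw base64 is returned unchanged.
function stripDataURLHead(dataURL: string): string {
  return dataURL.replace(/^data:.*?;base64,/i, "");
}
```

The native plugin classes in this article strip the same head on their side, so calling this in the web layer is optional and mainly useful when the raw base64 is needed for something else.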
Design Pattern
We are going to follow the bridge design pattern that the Capacitor team recommends for writing plugins. It is a design mechanism that encapsulates an implementation class inside of an interface class: the plugin class exposes the methods to JavaScript and delegates the actual work to the implementation. We can get an idea of it by checking out the following example.
@objc func getLanguageCode(_ call: CAPPluginCall) {
let code = implementation.getLanguageCode()
call.resolve([ "value": code ])
}
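The same separation can be sketched in TypeScript to make the two roles explicit. This is only an illustration of the pattern, not Capacitor's API: `CallLike` is a hypothetical stand-in for the call object the bridge passes to a plugin method, and both class names are made up.

```typescript
// Implementation class: plain logic with no knowledge of the bridge.
class LanguageImplementation {
  getLanguageCode(): string {
    return "en";
  }
}

// Hypothetical stand-in for the call object handed to a plugin method.
interface CallLike {
  resolve(data: Record<string, unknown>): void;
}

// Plugin (interface) class: delegates to the implementation
// and packs the return value into the response.
class LanguagePlugin {
  private implementation = new LanguageImplementation();

  getLanguageCode(call: CallLike): void {
    const code = this.implementation.getLanguageCode();
    call.resolve({ value: code });
  }
}
```

Keeping the logic in the implementation class means it can be tested without the bridge, while the plugin class stays a thin layer that only translates between calls and method arguments.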
Android Implementation
Add Dynamsoft Label Recognizer as a Dependency
Open android/build.gradle to add the Dynamsoft Label Recognizer dependency:
rootProject.allprojects {
    repositories {
        maven {
            url "https://download2.dynamsoft.com/maven/aar"
        }
    }
}
dependencies {
    implementation 'com.dynamsoft:dynamsoftlabelrecognizer:2.2.20'
}
Implementation
Let’s implement the code in the Java files with the following steps.
- In the LabelRecognizer class, add the following implementation. The initDLR method, which the plugin class calls from initialize, creates the recognizer instance.

    public class LabelRecognizer {
        private com.dynamsoft.dlr.LabelRecognizer recognizer;

        // Creates the recognizer instance; called by the plugin class's initialize().
        public void initDLR() throws LabelRecognizerException {
            recognizer = new com.dynamsoft.dlr.LabelRecognizer();
        }

        public void initLicense(String license, Context context) {
            Log.d("DLR",license);
            LicenseManager.initLicense(license, context, new LicenseVerificationListener() {
                @Override
                public void licenseVerificationCallback(boolean isSuccess, CoreException error) {
                    if(!isSuccess){
                        error.printStackTrace();
                    }
                }
            });
        }

        public JSArray recognizeBase64String(String base64) throws LabelRecognizerException {
            Bitmap bm = Utils.base642Bitmap(base64);
            DLRResult[] results = recognizer.recognizeImage(bm);
            JSArray array = new JSArray();
            for (DLRResult result:results) {
                array.put(Utils.getMapFromDLRResult(result));
            }
            return array;
        }

        public void updateRuntimeSettings(String template) throws LabelRecognizerException {
            recognizer.initRuntimeSettings(template);
        }

        public void resetRuntimeSettings() throws LabelRecognizerException {
            recognizer.resetRuntimeSettings();
        }

        // Loads the .prototxt, .caffemodel and .txt files of each model from the assets folder.
        public void loadCustomModel(Context ctx, String modelFolder, JSONArray fileNames) throws LabelRecognizerException {
            Log.d("DLR","model folder: "+modelFolder);
            try {
                for(int i = 0;i<fileNames.length();i++) {
                    Log.d("DLR","filename: "+fileNames.get(i));
                    AssetManager manager = ctx.getAssets();
                    InputStream isPrototxt = manager.open(modelFolder+"/"+fileNames.getString(i)+".prototxt");
                    byte[] prototxt = new byte[isPrototxt.available()];
                    isPrototxt.read(prototxt);
                    isPrototxt.close();
                    InputStream isCharacterModel = manager.open(modelFolder+"/"+fileNames.getString(i)+".caffemodel");
                    byte[] characterModel = new byte[isCharacterModel.available()];
                    isCharacterModel.read(characterModel);
                    isCharacterModel.close();
                    InputStream isTxt = manager.open(modelFolder+"/"+fileNames.getString(i)+".txt");
                    byte[] txt = new byte[isTxt.available()];
                    isTxt.read(txt);
                    isTxt.close();
                    recognizer.appendCharacterModelBuffer(fileNames.getString(i), prototxt, txt, characterModel);
                }
                Log.d("DLR","custom model loaded");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
- Use the implementation in the plugin class.

    @CapacitorPlugin(name = "LabelRecognizer")
    public class LabelRecognizerPlugin extends Plugin {

        private LabelRecognizer implementation = new LabelRecognizer();

        @PluginMethod
        public void initLicense(PluginCall call) {
            String license = call.getString("license");
            implementation.initLicense(license,getContext());
            call.resolve();
        }

        @PluginMethod
        public void initialize(PluginCall call) {
            try {
                implementation.initDLR();
                call.resolve();
            } catch (LabelRecognizerException e) {
                e.printStackTrace();
                call.reject(e.getMessage());
            }
        }

        @PluginMethod
        public void recognizeBase64String(PluginCall call) {
            String base64 = call.getString("base64");
            base64 = base64.replaceFirst("data:.*?;base64,","");
            try {
                JSArray results = implementation.recognizeBase64String(base64);
                JSObject response = new JSObject();
                response.put("results",results);
                call.resolve(response);
            } catch (LabelRecognizerException e) {
                e.printStackTrace();
                call.reject(e.getMessage());
            }
        }

        @PluginMethod
        public void updateRuntimeSettings(PluginCall call) {
            JSObject settings = call.getObject("settings");
            String template = settings.getString("template");
            try {
                implementation.updateRuntimeSettings(template);
            } catch (LabelRecognizerException e) {
                e.printStackTrace();
            }
            if (settings.has("customModelConfig")) {
                JSObject config = settings.getJSObject("customModelConfig");
                String modelFolder = config.getString("customModelFolder");
                try {
                    JSONArray fileNames = config.getJSONArray("customModelFileNames");
                    implementation.loadCustomModel(getContext(), modelFolder, fileNames);
                } catch (JSONException | LabelRecognizerException e) {
                    e.printStackTrace();
                }
            }
            call.resolve();
        }

        @PluginMethod
        public void resetRuntimeSettings(PluginCall call) {
            try {
                implementation.resetRuntimeSettings();
                call.resolve();
            } catch (LabelRecognizerException e) {
                e.printStackTrace();
                call.reject(e.getMessage());
            }
        }
    }
- Create a Utils class for static methods which convert the base64 to Bitmap and wrap the result as JSObject.

    public class Utils {

        public static Bitmap base642Bitmap(String base64) {
            byte[] decode = Base64.decode(base64,Base64.DEFAULT);
            return BitmapFactory.decodeByteArray(decode,0,decode.length);
        }

        public static JSObject getMapFromDLRResult(DLRResult result){
            JSObject map = new JSObject();
            map.put("referenceRegionName",result.referenceRegionName);
            map.put("textAreaName",result.textAreaName);
            map.put("confidence",result.confidence);
            map.put("pageNumber",result.pageNumber);
            JSArray lineResults = new JSArray();
            for (DLRLineResult lineResult:result.lineResults) {
                lineResults.put(getMapFromDLRLineResult(lineResult));
            }
            map.put("lineResults",lineResults);
            map.put("location",getMapFromLocation(result.location));
            return map;
        }

        private static JSObject getMapFromDLRLineResult(DLRLineResult result){
            JSObject map = new JSObject();
            map.put("lineSpecificationName",result.lineSpecificationName);
            map.put("text",result.text);
            map.put("characterModelName",result.characterModelName);
            map.put("location",getMapFromLocation(result.location));
            map.put("confidence",result.confidence);
            JSArray characterResults = new JSArray();
            for (DLRCharacterResult characterResult:result.characterResults) {
                characterResults.put(getMapFromDLRCharacterResult(characterResult));
            }
            map.put("characterResults",characterResults);
            return map;
        }

        private static JSObject getMapFromDLRCharacterResult(DLRCharacterResult result){
            JSObject map = new JSObject();
            map.put("characterH",String.valueOf(result.characterH));
            map.put("characterM",String.valueOf(result.characterM));
            map.put("characterL",String.valueOf(result.characterL));
            map.put("characterHConfidence",result.characterHConfidence);
            map.put("characterMConfidence",result.characterMConfidence);
            map.put("characterLConfidence",result.characterLConfidence);
            map.put("location",getMapFromLocation(result.location));
            return map;
        }

        private static JSObject getMapFromLocation(Quadrilateral location){
            JSObject map = new JSObject();
            JSArray points = new JSArray();
            for (Point point: location.points) {
                JSObject pointAsMap = new JSObject();
                pointAsMap.put("x",point.x);
                pointAsMap.put("y",point.y);
                points.put(pointAsMap);
            }
            map.put("points",points);
            return map;
        }
    }
Unlike the JavaScript version, the native versions load custom models from local files. For Android, we can put the model files in a folder under assets and then, on the JS side, specify which folder and which files to use.
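On the JS side, the folder and file names travel in the customModelConfig field of the runtime settings. The native code then looks for three files per model name, with the extensions .prototxt, .caffemodel and .txt. The sketch below lists the asset paths a given config implies, which can help when checking that the right files were bundled. `expectedModelAssets` is a hypothetical helper, and the folder and model names in the usage note are placeholders.

```typescript
// Same shape as the CustomModelConfig interface defined in src/definitions.ts.
interface CustomModelConfig {
  customModelFolder: string;
  customModelFileNames: string[];
}

// The three files the native loaders read for every model name.
const MODEL_EXTENSIONS = ["prototxt", "caffemodel", "txt"] as const;

// Hypothetical helper: list the relative asset paths implied by a config.
function expectedModelAssets(config: CustomModelConfig): string[] {
  const paths: string[] = [];
  for (const name of config.customModelFileNames) {
    for (const ext of MODEL_EXTENSIONS) {
      paths.push(`${config.customModelFolder}/${name}.${ext}`);
    }
  }
  return paths;
}
```

For example, a config with the placeholder folder `MRZ` and the single placeholder model name `MRZ` implies `MRZ/MRZ.prototxt`, `MRZ/MRZ.caffemodel` and `MRZ/MRZ.txt`.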
After the implementation, we can update the example to run as an Android app with the following steps:
- Add the Android project.

    npm install @capacitor/android
    npx cap add android

- Build the web app and sync it to the Android project.

    npm run build
    npx cap sync

- Add camera permission by adding the following to AndroidManifest.xml.

    <uses-permission android:name="android.permission.CAMERA" />

- Run the app.

    npx cap run android
iOS Implementation
The iOS implementation is similar to the Android implementation.
Add Dynamsoft Label Recognizer as a Dependency
Open CapacitorPluginDynamsoftLabelRecognizer.podspec to add the Dynamsoft Label Recognizer dependency:
s.libraries = 'c++'
s.dependency "DynamsoftLabelRecognizer", '= 2.2.20'
Write Definitions
Open LabelRecognizerPlugin.m to define the methods:
CAP_PLUGIN(LabelRecognizerPlugin, "LabelRecognizer",
CAP_PLUGIN_METHOD(echo, CAPPluginReturnPromise);
CAP_PLUGIN_METHOD(initialize, CAPPluginReturnPromise);
CAP_PLUGIN_METHOD(initLicense, CAPPluginReturnPromise);
CAP_PLUGIN_METHOD(recognizeBase64String, CAPPluginReturnPromise);
CAP_PLUGIN_METHOD(updateRuntimeSettings, CAPPluginReturnPromise);
CAP_PLUGIN_METHOD(resetRuntimeSettings, CAPPluginReturnPromise);
)
Implementation
Let’s implement the code in the Swift files with the following steps.
- In the LabelRecognizer class, add the following implementation.

    @objc public class LabelRecognizer: NSObject, LicenseVerificationListener {
        private var recognizer: DynamsoftLabelRecognizer!

        @objc public func echo(_ value: String) -> String {
            print(value)
            return value
        }

        @objc public func initialize() {
            recognizer = DynamsoftLabelRecognizer.init()
        }

        @objc public func initLicense(_ license: String) {
            DynamsoftLicenseManager.initLicense(license,verificationDelegate:self)
        }

        public func licenseVerificationCallback(_ isSuccess: Bool, error: Error?) {
        }

        @objc public func recognizeBase64String(_ base64: String) -> [Any] {
            var returned_results: [Any] = []
            // Return an empty list instead of crashing when decoding or recognition fails.
            if let image = Utils.convertBase64ToImage(base64),
               let results = try? recognizer.recognizeImage(image) {
                for result in results {
                    returned_results.append(Utils.wrapDLRResult(result:result))
                }
            }
            return returned_results
        }

        @objc public func updateRuntimeSettings(_ template: String) {
            try? recognizer.initRuntimeSettings(template)
        }

        // Loads the prototxt, txt and caffemodel files of each model from the app bundle.
        public func loadCustomModel(modelFolder:String,modelFileNames: [String]){
            for model in modelFileNames {
                guard let prototxt = Bundle.main.url(
                    forResource: model,
                    withExtension: "prototxt",
                    subdirectory: modelFolder
                ) else {
                    print("model not exist")
                    return
                }
                let datapro = try! Data.init(contentsOf: prototxt)
                let txt = Bundle.main.url(forResource: model, withExtension: "txt", subdirectory: modelFolder)
                let datatxt = try! Data.init(contentsOf: txt!)
                let caffemodel = Bundle.main.url(forResource: model, withExtension: "caffemodel", subdirectory: modelFolder)
                let datacaf = try! Data.init(contentsOf: caffemodel!)
                DynamsoftLabelRecognizer.appendCharacterModel(model, prototxtBuffer: datapro, txtBuffer: datatxt, characterModelBuffer: datacaf)
                print("load model \(model)")
            }
        }

        @objc public func resetRuntimeSettings() {
            try? recognizer.resetRuntimeSettings()
        }
    }
- In the plugin class, use the implementation.

    @objc(LabelRecognizerPlugin)
    public class LabelRecognizerPlugin: CAPPlugin {
        private let implementation = LabelRecognizer()

        @objc func initialize(_ call: CAPPluginCall) {
            implementation.initialize()
            call.resolve()
        }

        @objc func initLicense(_ call: CAPPluginCall) {
            let license = call.getString("license") ?? "DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ=="
            implementation.initLicense(license)
            call.resolve()
        }

        @objc func recognizeBase64String(_ call: CAPPluginCall) {
            var base64 = call.getString("base64") ?? ""
            base64 = removeDataURLHead(base64)
            call.resolve(["results":implementation.recognizeBase64String(base64)])
        }

        func removeDataURLHead(_ str: String) -> String {
            var finalStr = str
            do {
                let pattern = "data:.*?;base64,"
                let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpression.Options.caseInsensitive)
                // NSRange counts UTF-16 code units, so use utf16.count rather than count.
                let range = NSRange(location: 0, length: str.utf16.count)
                finalStr = regex.stringByReplacingMatches(in: str, options: NSRegularExpression.MatchingOptions(rawValue: 0), range: range, withTemplate: "")
            } catch {
                print(error)
            }
            return finalStr
        }

        @objc func updateRuntimeSettings(_ call: CAPPluginCall) {
            // Reject instead of crashing when the settings or the template are missing.
            guard let settings = call.getAny("settings") as? [String:Any],
                  let template = settings["template"] as? String else {
                call.reject("Invalid settings")
                return
            }
            implementation.updateRuntimeSettings(template)
            guard let customModelConfig = settings["customModelConfig"] as? [String:Any] else {
                call.resolve()
                return
            }
            let modelFolder = customModelConfig["customModelFolder"] as! String
            let modelFileNames = customModelConfig["customModelFileNames"] as! [String]
            implementation.loadCustomModel(modelFolder: modelFolder, modelFileNames: modelFileNames)
            call.resolve()
        }

        @objc func resetRuntimeSettings(_ call: CAPPluginCall) {
            implementation.resetRuntimeSettings()
            call.resolve()
        }
    }
- Create a Utils class for static methods which convert the base64 to UIImage and wrap the result.

    class Utils {

        static public func convertBase64ToImage(_ imageStr:String) ->UIImage?{
            if let data: NSData = NSData(base64Encoded: imageStr, options:NSData.Base64DecodingOptions.ignoreUnknownCharacters) {
                if let image: UIImage = UIImage(data: data as Data) {
                    return image
                }
            }
            return nil
        }

        static func wrapDLRResult (result:iDLRResult) -> [String: Any] {
            var dict: [String: Any] = [:]
            dict["confidence"] = result.confidence
            dict["pageNumber"] = result.pageNumber
            dict["referenceRegionName"] = result.referenceRegionName
            dict["textAreaName"] = result.textAreaName
            dict["location"] = wrapLocation(location:result.location)
            var lineResults: [[String:Any]] = []
            for lineResult in result.lineResults! {
                let lineResultDict: [String: Any] = wrapDLRLineResult(result: lineResult)
                lineResults.append(lineResultDict)
            }
            dict["lineResults"] = lineResults
            return dict
        }

        static private func wrapDLRLineResult (result:iDLRLineResult) -> [String: Any] {
            var dict: [String: Any] = [:]
            dict["confidence"] = result.confidence
            dict["text"] = result.text
            dict["characterModelName"] = result.characterModelName
            dict["lineSpecificationName"] = result.lineSpecificationName
            dict["location"] = wrapLocation(location:result.location)
            var characterResults: [[String:Any]] = []
            for characterResult in result.characterResults! {
                let characterResultDict: [String: Any] = wrapDLRCharacterResult(result: characterResult)
                characterResults.append(characterResultDict)
            }
            dict["characterResults"] = characterResults
            return dict
        }

        static private func wrapDLRCharacterResult (result:iDLRCharacterResult) -> [String: Any] {
            var dict: [String: Any] = [:]
            dict["characterH"] = result.characterH
            dict["characterHConfidence"] = result.characterHConfidence
            dict["characterM"] = result.characterM
            dict["characterMConfidence"] = result.characterMConfidence
            dict["characterL"] = result.characterL
            dict["characterLConfidence"] = result.characterLConfidence
            dict["location"] = wrapLocation(location:result.location)
            return dict
        }

        static private func wrapLocation (location:iQuadrilateral?) -> [String: Any] {
            var dict: [String: Any] = [:]
            var points: [[String:CGFloat]] = []
            let CGPoints = location!.points as! [CGPoint]
            for point in CGPoints {
                var pointDict: [String:CGFloat] = [:]
                pointDict["x"] = point.x
                pointDict["y"] = point.y
                points.append(pointDict)
            }
            dict["points"] = points
            return dict
        }
    }
After the implementation, we can update the example to run as an iOS app with the following steps:
- Add the iOS project.

    npm install @capacitor/ios
    npx cap add ios

- Build the web app and sync it to the iOS project.

    npm run build
    npx cap sync

- Add camera permission by adding the following to Info.plist.

    <key>NSCameraUsageDescription</key>
    <string>For text scanning</string>
    <key>NSMicrophoneUsageDescription</key>
    <string>For camera</string>

- Run the app.

    npx cap run ios
Source Code
https://github.com/tony-xlh/capacitor-plugin-dynamsoft-label-recognizer