How to Build a Flet Chat App with Barcode and Gemini APIs

Jan 07, 2024

Gemini is Google’s latest AI model. At the time of writing, it can be used for free with a limit of 60 queries per minute, and it can recognize text in images. 1D barcodes are generally accompanied by human-readable text, which can be used to verify barcode recognition results. In this article, we use the Flet Python API to build a desktop chat app that integrates both barcode and Gemini APIs. The app reads barcodes from images with Dynamsoft Barcode Reader and runs OCR on the same images with Gemini’s text recognition.

Installation

pip install -U google-generativeai dbr flet

Prerequisites

Flet Python API for Desktop Applications

Flet empowers developers to create desktop applications using Python. It offers a crash course for constructing a real-time chat application, which serves as an excellent starting point.

Our application features a list view for displaying chat messages, a text input field, a button for uploading images, a button for sending messages, and a button to clear the chat history.

Flet chat app UI

Chat messages:

    
  chat = ft.ListView(
      expand=True,
      spacing=10,
      auto_scroll=True,
  )

Text input field:

  new_message = ft.TextField(
      hint_text="Write a message...",
      autofocus=True,
      shift_enter=True,
      min_lines=1,
      max_lines=5,
      filled=True,
      expand=True,
      on_submit=send_message_click,
  )

Button to load an image:

    
  def pick_files_result(e: ft.FilePickerResultEvent):
      global image_path
      image_path = None
      if e.files is not None:
          image_path = e.files[0].path
          # TODO

  def pick_file(e):
      pick_files_dialog.pick_files()

  pick_files_dialog = ft.FilePicker(on_result=pick_files_result)
  page.overlay.append(pick_files_dialog)

  ft.IconButton(
      icon=ft.icons.UPLOAD_FILE,
      tooltip="Pick an image",
      on_click=pick_file,
  )

Button to send a message:

  def on_message(message: Message):
      if message.message_type == "chat_message":
          m = ChatMessage(message)

          chat.controls.append(m)
          page.update()

  page.pubsub.subscribe(on_message)

  def send_message_click(e):
      global image_path
      if new_message.value != "":
          page.pubsub.send_all(
              Message("Me", new_message.value, message_type="chat_message"))

          question = new_message.value

          new_message.value = ""
          new_message.focus()
          page.update()

          page.pubsub.send_all(
              Message("Gemini", "Thinking...", message_type="chat_message"))

          # TODO

  ft.IconButton(
      icon=ft.icons.SEND_ROUNDED,
      tooltip="Send message",
      on_click=send_message_click,
  )

PubSub facilitates asynchronous communication across page sessions. The subscribe method enables the receipt of broadcast messages from other sessions, while the send_all method allows for sending messages to all active sessions. Whenever a new message is received, the list view is automatically updated to display this new message.
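The snippets above pass Message objects through PubSub without showing the class. The sketch below is one possible shape for it, with field names inferred from the call sites in this article (they are assumptions, not the original definition). ToyPubSub is a hypothetical in-memory stand-in that only illustrates the subscribe and send_all pattern; it is not Flet's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    # Field names are guesses based on how Message is called in this article.
    user_name: str
    text: str
    message_type: str
    is_image: bool = False


class ToyPubSub:
    """In-memory stand-in for the broadcast pattern; not Flet's implementation."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Message], None]] = []

    def subscribe(self, handler: Callable[[Message], None]) -> None:
        # Each session registers one callback.
        self._handlers.append(handler)

    def send_all(self, message: Message) -> None:
        # Deliver the message to every subscriber, including the sender.
        for handler in self._handlers:
            handler(message)


if __name__ == "__main__":
    pubsub = ToyPubSub()
    chat_log: List[str] = []
    pubsub.subscribe(lambda m: chat_log.append(f"{m.user_name}: {m.text}"))
    pubsub.send_all(Message("Me", "hello", message_type="chat_message"))
    print(chat_log)
```

Because the sender is also a subscriber, its own message comes back through the callback, which is why the app appends to the list view only inside on_message.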

Button to clear the chat history:

  def clear_message(e):
      global image_path
      image_path = None
      chat.controls.clear()
      page.update()

  ft.IconButton(
      icon=ft.icons.CLEAR_ALL,
      tooltip="Clear all messages",
      on_click=clear_message,
  )

Integrating the Dynamsoft Barcode Reader

The Dynamsoft Barcode Reader is a library for barcode scanning. Integrating it into the app takes two steps:

Import the Dynamsoft Barcode Reader library and initialize a barcode reader instance using your license key.

 from dbr import BarcodeReader, BarcodeReaderError
 license_key = "LICENSE-KEY"
 BarcodeReader.init_license(license_key)
 reader = BarcodeReader()

Decode the barcode from the uploaded image and send the result to the chat.

 def pick_files_result(e: ft.FilePickerResultEvent):
     global image_path, barcode_text
     barcode_text = None
     image_path = None
     if e.files is not None:
         image_path = e.files[0].path
         page.pubsub.send_all(
             Message("Me", image_path, message_type="chat_message", is_image=True))

         text_results = None
         try:
             text_results = reader.decode_file(image_path)
         except BarcodeReaderError as bre:
             print(bre)

         if text_results:  # None or an empty list means no barcode was found
             barcode_text = text_results[0].barcode_text
             page.pubsub.send_all(
                 Message("DBR", barcode_text, message_type="chat_message"))
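The result handling above can be pulled into a small helper so that the "no barcode found" case is handled in one place. This is a minimal sketch: first_barcode_text is a hypothetical name, and it assumes only that each result exposes a barcode_text attribute, as in the snippet above. The SimpleNamespace objects are stand-ins so the example runs without the dbr package.

```python
from types import SimpleNamespace
from typing import Optional, Sequence


def first_barcode_text(results: Optional[Sequence]) -> Optional[str]:
    """Return the text of the first decoded barcode, or None if nothing was found."""
    if not results:
        # Covers both None and an empty sequence.
        return None
    return results[0].barcode_text


if __name__ == "__main__":
    # Stand-in for a decoder result; real results come from reader.decode_file().
    fake_results = [SimpleNamespace(barcode_text="012345678905")]
    print(first_barcode_text(fake_results))
    print(first_barcode_text(None))
```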

Utilizing Google’s Gemini AI for Text Recognition

Gemini can extract text from images. Once you’ve decoded a barcode, you can employ Gemini to verify the accuracy of the text decoded from the barcode. Here are the steps to use Gemini:

Set up the API key for Gemini.

 import pathlib

 import google.generativeai as genai
 import google.ai.generativelanguage as glm

 genai.configure(api_key='API-KEY')
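Hard-coding the key is fine for a quick test, but it is easy to commit by accident. One alternative is to read it from an environment variable. The sketch below assumes a variable named GEMINI_API_KEY, which is a name chosen here for illustration; load_api_key is likewise a hypothetical helper.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage in the app (requires google-generativeai):
#   genai.configure(api_key=load_api_key())
```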

Initialize the text and vision models. The vision model takes both text and images as input.

 model_text = genai.GenerativeModel('gemini-pro')
 chat_text = model_text.start_chat(history=[])
 model_vision = genai.GenerativeModel('gemini-pro-vision')
 chat_vision = model_vision.start_chat(history=[])

Add a custom :verify command that sends the uploaded image to the vision model, together with a prompt asking it to recognize the text around the barcode.

 def send_message_click(e):
     global image_path
     if new_message.value != "":
         ...

         if question == ":verify":
             question = "recognize text around the barcode"
             response = model_vision.generate_content(
                 glm.Content(
                     parts=[
                         glm.Part(
                             text=question),
                         glm.Part(
                             inline_data=glm.Blob(
                                 mime_type='image/jpeg',
                                 data=pathlib.Path(
                                     image_path).read_bytes()
                             )
                         ),
                     ],
                 ))

             text = response.text
             page.pubsub.send_all(
                 Message("Gemini", text, message_type="chat_message"))
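The snippet above always sends mime_type='image/jpeg', even when the picked file is a PNG. A possible refinement is to guess the type from the file extension with the standard library. This is a sketch only: guess_image_mime_type is a hypothetical helper, and which image types the model accepts is determined by the Gemini API, not by this code.

```python
import mimetypes


def guess_image_mime_type(image_path: str, default: str = "image/jpeg") -> str:
    """Guess an image MIME type from the file extension, falling back to a default."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        # Unknown or non-image extension: keep the article's original value.
        return default
    return mime_type


# Usage in the request above:
#   glm.Blob(mime_type=guess_image_mime_type(image_path), data=...)
```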

Verifying the Barcode Decoding Results with the Accompanying Text

Now, we can check whether the text read from the barcode exists in the text recognized from the image. Since the text extracted by Gemini might include spaces, it’s essential to eliminate these spaces prior to comparison.

if barcode_text is None:
    return

# Gemini may insert spaces between digit groups, so strip them before comparing.
text = text.replace(" ", "")
if barcode_text in text:
    page.pubsub.send_all(
        Message("Gemini", barcode_text + " is correct ✓", message_type="chat_message"))
else:
    page.pubsub.send_all(
        Message("Gemini", barcode_text + " may not be correct", message_type="chat_message"))
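The comparison can also be written as a standalone function, which makes it easy to test without the UI. The sketch below is a variation on the check above rather than the original code: barcode_text_matches is a hypothetical name, and it removes all whitespace (including line breaks), on the assumption that the OCR output may be split across lines.

```python
import re


def barcode_text_matches(ocr_text: str, barcode_text: str) -> bool:
    """Return True if the decoded barcode text appears in the OCR text."""
    if not barcode_text:
        # An empty string is a substring of everything, so treat it as no match.
        return False
    # Drop every whitespace character so "0 12345 67890 5" matches "012345678905".
    normalized = re.sub(r"\s+", "", ocr_text)
    return barcode_text in normalized


if __name__ == "__main__":
    print(barcode_text_matches("0 12345 67890 5", "012345678905"))
```

A substring match is deliberately lenient: it confirms that the decoded digits appear somewhere in the recognized text, not that the two strings are identical.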

Launch the desktop application and test it with some images that contain 1D barcodes:

flet run chatbot.py

Flet chat app with barcode and gemini APIs

Source Code

https://github.com/yushulx/flet-chat-app-gemini-barcode
