OpenAI Realtime API Console for ELECROW CrowPanel Advance 5.0-HMI

This application demonstrates OpenAI Realtime API usage on an ESP32-S3 device with a 5-inch HMI LCD panel. It provides a graphical user interface (GUI) for configuring WiFi settings and entering your OpenAI API key, then establishes a WebRTC connection to the OpenAI Realtime API. Audio input is streamed to the model, which returns text responses along with a transcription of the audio.

Features

  • Embedded Device Focus: Designed for ELECROW CrowPanel Advance 5.0-HMI. For detailed device hardware information, see Device Hardware Documentation.
  • Real-time Communication: Establishes a WebRTC connection with OpenAI Realtime API.
  • Voice Interaction: Transcribes audio input and displays the model’s text responses.
  • OpenAI Responses API: When the mic is toggled off, the accumulated transcription is sent to the OpenAI Responses API for final processing.
  • User-friendly GUI: Built using LVGL 8.4.
  • Session Persistence: WiFi settings and session configuration are saved in non-volatile storage (see the sketch after this list).
  • Easy Build & Flash: Build from source using ESP-IDF v5.4 or flash prebuilt images.
  • LLM Function Calling: Maps natural-language requests to robot control functions (movement, speed, headlights, music) via the OpenAI API's function calling.
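
The persistence feature above is what ESP-IDF's NVS (non-volatile storage) API provides. A minimal sketch of saving WiFi credentials follows; the namespace and key names are hypothetical, not taken from this repository:

    #include "nvs_flash.h"
    #include "nvs.h"

    // Persist WiFi credentials across reboots. nvs_flash_init() must have
    // been called once at startup. The namespace and key names ("wifi",
    // "ssid", "pass") are illustrative; the firmware may use different ones.
    static esp_err_t save_wifi_credentials(const char *ssid, const char *pass)
    {
        nvs_handle_t handle;
        esp_err_t err = nvs_open("wifi", NVS_READWRITE, &handle);
        if (err != ESP_OK) {
            return err;
        }
        err = nvs_set_str(handle, "ssid", ssid);
        if (err == ESP_OK) {
            err = nvs_set_str(handle, "pass", pass);
        }
        if (err == ESP_OK) {
            err = nvs_commit(handle);  // flush the writes to flash
        }
        nvs_close(handle);
        return err;
    }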

Installation

Building from Source

  1. Install the ESP-IDF framework, version 5.4.
  2. Clone the repository.
  3. Dependencies are installed automatically by the ESP-IDF component manager (see idf_component.yml).
  4. Set the chip target if it is not already configured, then build and flash (replace PORT with your serial port):
    idf.py set-target esp32s3
    idf.py build
    idf.py -p PORT flash
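
  To watch the device log after flashing, ESP-IDF's serial monitor can be chained onto the same command:
    idf.py -p PORT flash monitor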

Flashing Prebuilt Images

  • Use flash_tool.exe to flash the prebuilt images.

Usage

  1. WiFi Setup: Navigate to the WiFi tab and enter your SSID and password.

  2. Authentication: Go to the Auth tab and enter your OpenAI API key (a non-free-tier account is required).
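
    For context, OpenAI's WebRTC flow exchanges an SDP offer for an SDP answer over HTTPS, authenticated with this API key as a bearer token. A rough sketch using ESP-IDF's esp_http_client; the endpoint and model name follow OpenAI's public documentation, and the request this firmware actually makes may differ:

      #include <stdio.h>
      #include <string.h>
      #include "esp_http_client.h"

      // Post a local SDP offer to the OpenAI Realtime endpoint; the SDP
      // answer arrives in the response body (read via an event handler,
      // omitted here). Endpoint/model per OpenAI's public WebRTC docs.
      static esp_err_t post_sdp_offer(const char *api_key, const char *sdp_offer)
      {
          esp_http_client_config_t cfg = {
              .url = "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
              .method = HTTP_METHOD_POST,
          };
          esp_http_client_handle_t client = esp_http_client_init(&cfg);

          char auth[160];
          snprintf(auth, sizeof(auth), "Bearer %s", api_key);
          esp_http_client_set_header(client, "Authorization", auth);
          esp_http_client_set_header(client, "Content-Type", "application/sdp");
          esp_http_client_set_post_field(client, sdp_offer, strlen(sdp_offer));

          esp_err_t err = esp_http_client_perform(client);
          esp_http_client_cleanup(client);
          return err;
      }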

  3. Mic Control & Realtime Communication:

    • Tap the on-screen mic button to start and stop audio capture.
    • While the mic is on, audio is streamed to the OpenAI Realtime API for live transcription.
    • When you tap the mic off, the accumulated transcription is sent to the OpenAI Responses API for final processing.
    • Transcriptions, final responses, and any invoked functions are displayed in the terminal (see the sketch below).
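
    A minimal sketch of how such a mic toggle can be wired up with LVGL 8.4; everything except the LVGL calls (the state flag and the two audio hooks) is hypothetical:

      #include "lvgl.h"

      static bool s_mic_on = false;

      // Hypothetical hooks into the audio pipeline.
      extern void start_audio_capture(void);             // begin streaming to the Realtime API
      extern void stop_audio_capture_and_finalize(void); // hand transcription to the Responses API

      static void mic_btn_event_cb(lv_event_t *e)
      {
          (void)e;
          s_mic_on = !s_mic_on;
          if (s_mic_on) {
              start_audio_capture();
          } else {
              stop_audio_capture_and_finalize();
          }
      }

      void create_mic_button(lv_obj_t *parent)
      {
          lv_obj_t *btn = lv_btn_create(parent);
          lv_obj_add_event_cb(btn, mic_btn_event_cb, LV_EVENT_CLICKED, NULL);
      }
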
  4. Function Calling & Supported Commands: Users can speak naturally (exact phrasing isn't required) and the model maps each intent to a robot action. Example requests include:

    • Movement:
      • Direct: “move forward”, “turn right”
      • Indirect: “go ahead a bit”, “spin to the left”
    • Speed Adjustment:
      • Direct: “go faster”, “go slower”
      • Indirect: “speed up”, “take it easy on the throttle”
    • Headlights:
      • Direct: “turn headlights on”, “turn headlights off”
      • Indirect: “it’s too dark here”, “lights, please”
    • Audio:
      • Direct: “play music”
      • Indirect: “start some tunes”, “let’s have some background music”

    Internally, these map to the functions control_robot_movement(direction), change_robot_speed(speed), robot_headlights(headlights_state), and play_music(). Any unsupported or invalid request triggers reject_request(); a dispatch sketch follows.
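
    A sketch of how a function call returned by the model could be dispatched, using cJSON (bundled with ESP-IDF). The JSON shape is simplified (the real API delivers arguments as a JSON-encoded string) and the fallback argument values are illustrative:

      #include <stdbool.h>
      #include <string.h>
      #include "cJSON.h"

      // Robot-control functions named above; implementations live elsewhere.
      extern void control_robot_movement(const char *direction);
      extern void change_robot_speed(const char *speed);
      extern void robot_headlights(bool on);
      extern void play_music(void);
      extern void reject_request(void);

      // Dispatch one call shaped like {"name": "...", "arguments": {...}}.
      static void dispatch_function_call(const char *json)
      {
          cJSON *root = cJSON_Parse(json);
          if (!root) { reject_request(); return; }

          const cJSON *name = cJSON_GetObjectItem(root, "name");
          const cJSON *args = cJSON_GetObjectItem(root, "arguments");
          const char *fn = cJSON_IsString(name) ? name->valuestring : "";

          if (strcmp(fn, "control_robot_movement") == 0) {
              const cJSON *dir = cJSON_GetObjectItem(args, "direction");
              control_robot_movement(cJSON_IsString(dir) ? dir->valuestring : "stop");
          } else if (strcmp(fn, "change_robot_speed") == 0) {
              const cJSON *speed = cJSON_GetObjectItem(args, "speed");
              change_robot_speed(cJSON_IsString(speed) ? speed->valuestring : "normal");
          } else if (strcmp(fn, "robot_headlights") == 0) {
              const cJSON *st = cJSON_GetObjectItem(args, "headlights_state");
              robot_headlights(cJSON_IsString(st) && strcmp(st->valuestring, "on") == 0);
          } else if (strcmp(fn, "play_music") == 0) {
              play_music();
          } else {
              reject_request();  // unsupported or invalid request
          }
          cJSON_Delete(root);
      }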

  5. Wireless Module: The panel's wireless module sends the resulting control commands to the robot (an illustrative sketch follows).
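
    The README does not specify the wireless transport, so the following is purely illustrative: assuming ESP-NOW as the link, with a made-up packet layout and peer MAC address, sending a command could look like this:

      #include <stdint.h>
      #include "esp_now.h"

      // Entirely hypothetical command packet; not the repository's format.
      typedef struct {
          uint8_t cmd;    // e.g. 1 = move, 2 = speed, 3 = headlights, 4 = music
          uint8_t value;  // command-specific parameter
      } robot_cmd_t;

      static const uint8_t s_robot_mac[6] = { 0x24, 0x6F, 0x28, 0x00, 0x00, 0x01 };

      static esp_err_t send_robot_command(uint8_t cmd, uint8_t value)
      {
          robot_cmd_t pkt = { .cmd = cmd, .value = value };
          // esp_now_init() and esp_now_add_peer() must have run beforehand.
          return esp_now_send(s_robot_mac, (const uint8_t *)&pkt, sizeof(pkt));
      }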

  6. Session Controls: Use the terminal to clear the screen or disconnect and stop communication.

Dependencies

  • ESP-IDF Components: All dependencies are listed in the idf_component.yml file and are downloaded automatically.
  • LVGL 8.4: Used for the user interface.
  • ESP WebRTC Examples: Heavily inspired by Espressif's WebRTC Solution.
