Voice2Text AI


A cross-platform Python application that transcribes voice input using GPU-accelerated Whisper, sends text to local Ollama AI models for intelligent responses, and reads the output aloud with text-to-speech.

Features

  • 🎙️ Real-time voice transcription with OpenAI Whisper (GPU-accelerated)
  • 🤖 AI chat integration with local Ollama models
  • 🔊 Text-to-speech output with pause/resume
  • 🎨 Modern dark gradient UI
  • 📦 Cross-platform executables (Windows/macOS/Linux)
  • 🖥️ Flatpak support
  • ⚡ Optimized performance with silence detection
  • 🔄 Automatic retry logic for network requests
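The retry logic mentioned above isn't documented in detail here, but a generic exponential-backoff decorator is one common way to implement it. This is a sketch, not the app's actual code; the parameter names and defaults are illustrative:

```python
import time

def retry(attempts=3, delay=0.5, backoff=2.0, exceptions=(Exception,)):
    """Retry a function with exponential backoff, e.g. for flaky Ollama requests."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator
```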

Requirements

  • Python 3.8+ (only needed when running from source; the pre-built executables bundle their own runtime)
  • Ollama running locally at http://localhost:11434 (only needed for AI responses)
  • Microphone access (required for dictation)
  • CUDA-compatible GPU (optional, but highly recommended for faster Whisper processing)
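Because the GPU is optional, the app needs a CPU fallback when loading Whisper. A minimal device check might look like this (a sketch, assuming PyTorch is the CUDA backend, as it is for OpenAI Whisper):

```python
def pick_whisper_device():
    """Return "cuda" when a CUDA-capable GPU is visible to PyTorch, else "cpu"."""
    try:
        import torch  # optional dependency; openai-whisper pulls it in
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```

With openai-whisper this would be used as `whisper.load_model("base", device=pick_whisper_device())`.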

Installation

Pre-built Executables

Download the latest release from GitHub Releases:

Linux (Flatpak) (ToDo)

# Install from Flathub (when available)
flatpak install flathub com.voice2text.app

# Or download from GitHub Releases
# Download: voice2text.flatpak from latest release
# Install: flatpak install --user voice2text.flatpak
# Run: flatpak run com.voice2text.app

From Source (This might actually work)

git clone https://github.com/crhy/Voice2Text-AI.git
cd Voice2Text-AI
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Development

# Run tests
python test_voice.py
python test_mic.py
python test_whisper.py

# Build executable
pyinstaller voice_app.spec

# Build Flatpak (not working yet)
flatpak-builder --force-clean build com.voice2text.app.yml
flatpak-builder --user --install build com.voice2text.app.yml

# Export Flatpak for distribution
flatpak build-bundle build voice2text.flatpak com.voice2text.app

Setup Ollama

# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve

# Pull a model (in another terminal)
ollama pull llama3.2  # or any preferred model
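Once Ollama is running, the app talks to it over the local REST API. How this project structures its calls isn't shown here, but a minimal stdlib-only client for Ollama's `/api/generate` endpoint looks like this:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model, prompt):
    # stream=False asks Ollama for one complete JSON reply instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def query_ollama(model, prompt, url=OLLAMA_URL, timeout=120):
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Example: `query_ollama("llama3.2", "Summarize this transcript: ...")` returns the model's text reply.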

Usage

Running the App

  • Executable: Right-click the downloaded file, select "Allow Executing File as Program", then double-click to launch. Startup is slow, but once loaded the app is fast.
  • From Source: python voice_app.py
  • Flatpak: flatpak run com.voice2text.app (experimental; see the Flatpak build note above)

Quick Start

  1. Select your Ollama model from the dropdown
  2. Click "🎙️ Start Dictation"
  3. Speak clearly into your microphone
  4. Click "⏹️ Stop Dictation" when finished
  5. Click "🤖 Query AI" to get AI responses
  6. Responses are displayed and spoken aloud

The app automatically saves your settings and provides real-time status updates.
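The silence detection listed under Features typically works by gating incoming audio chunks on their RMS energy. A sketch of that idea (the threshold value is illustrative, not taken from the app):

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a chunk of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silent(samples, threshold=500):
    """Treat a chunk as silence when its energy falls below the threshold."""
    return rms(samples) < threshold
```

A recorder would skip transcribing chunks where `is_silent()` is true, which is where the performance win comes from.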

Notes

  • Whisper models run locally (internet required for initial download)
  • Edge TTS provides speech synthesis
  • GPU acceleration speeds up transcription significantly
  • Settings persist between sessions
  • App works offline after initial model downloads
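Settings persistence noted above can be as simple as a JSON round-trip. The filename matches `voice_config.json` from the repo, but the keys and defaults below are illustrative, not the app's actual schema:

```python
import json
from pathlib import Path

CONFIG_PATH = Path("voice_config.json")
DEFAULTS = {"model": "llama3.2", "tts_enabled": True}  # hypothetical keys

def load_settings(path=CONFIG_PATH):
    """Merge saved settings over defaults; missing file means defaults."""
    p = Path(path)
    if p.exists():
        return {**DEFAULTS, **json.loads(p.read_text())}
    return dict(DEFAULTS)

def save_settings(settings, path=CONFIG_PATH):
    Path(path).write_text(json.dumps(settings, indent=2))
```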

Troubleshooting

  • Microphone: Run python test_mic.py
  • Ollama: Ensure it's running at http://localhost:11434
  • GPU: Check with nvidia-smi (optional)
  • Audio: Grant microphone permissions
  • Models: First run downloads Whisper models (~2GB)

Project Structure

Voice2Text-AI/
├── voice_app.py          # Main GUI application
├── main.py              # Alternative Tkinter version
├── voice_app_kivy.py    # Kivy mobile version
├── voice_to_opencode.py  # CLI version
├── requirements.txt      # Python dependencies
├── test_*.py            # Test scripts
├── *.spec               # PyInstaller configs
├── *.desktop            # Linux desktop files
├── com.voice2text.app.*  # Flatpak manifests
├── config.json          # App configuration
├── voice_config.json    # Voice app settings
├── logo.png             # Application logo
├── install.sh           # Linux installer
└── README.md

Distribution

Building Releases

# Create GitHub release with all platform binaries
# Upload these files to GitHub Releases:
# - Voice2Text.exe (Windows)
# - Voice2Text (macOS)
# - voice2text.flatpak (Linux)

Flathub Submission

To submit to Flathub for official distribution:

# Fork the Flathub repository
# Add your manifest to: https://github.com/flathub/flathub
# Submit pull request with com.voice2text.app.yml

Contributing

Contributions welcome! Please submit issues and pull requests on GitHub.

License

This project is open source. See individual files for license information.
