TTS/STT: Experimentation with voice samples

**Is your feature request related to a problem? Please describe.**
Glific currently supports TTS (Text-to-Speech) and STT (Speech-to-Text) in its WhatsApp flows, but it uses Bhashini as the provider, which has proven unreliable and inaccurate in production. Users experience failed conversions, poor transcription quality, and service outages. We need to migrate this functionality to Kaapi so that Glific can call TTS/STT providers through our platform, but we should not simply replicate the existing Bhashini integration without first identifying better alternatives.

**Describe the solution you'd like**

1. Conduct thorough testing of the TTS/STT providers and models available on the market, including:

- OpenAI (Whisper for STT, TTS API)
- Google Cloud Speech-to-Text/Text-to-Speech
- AI4Bharat (Indic language models)
- Bhashini (baseline comparison)

2. Use real voice samples from actual Glific usage covering different languages (primarily Hindi and English, testing others as needed).

3. Evaluate providers on:

- Accuracy/quality (word error rate (WER) for STT, mean opinion score (MOS) for TTS)
- Reliability/uptime
- Latency
- Cost
- Language support
- API ease of use
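To keep the accuracy comparison consistent across providers, the two headline metrics could be computed the same way for every provider. The sketch below is illustrative only: the function names `word_error_rate` and `mean_opinion_score` are hypothetical, WER tokenization here is just lowercasing plus whitespace splitting (no punctuation or Unicode normalization, which Hindi transcripts may need), and MOS assumes listeners rate each clip on a 1 to 5 scale.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    Computed as a word-level Levenshtein distance. Assumption: tokens are
    produced by lowercasing and splitting on whitespace only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def mean_opinion_score(ratings: list[int]) -> float:
    """Average of listener ratings, assuming a 1 (bad) to 5 (excellent) scale."""
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of scores from 1 to 5")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # One substituted word out of four reference words gives WER 0.25.
    print(word_error_rate("mera naam ravi hai", "mera nam ravi hai"))
    print(mean_opinion_score([4, 5, 3, 4]))
```

Because WER counts insertions, it can exceed 1.0 when a provider hallucinates extra words, so it is worth reporting per-language averages alongside the raw error counts.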

Finally, we must propose an integration plan based on whichever provider performs best in these experiments.

For more details, refer to [this document](https://docs.google.com/document/d/1-uzpSjRTbmlaO5ifycf4xyvrso_PeuofPVXz2j_keeI/edit?tab=t.bwfrxpm41egd).

TTS/STT : Experimentation with voice samples #506

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TTS/STT : Experimentation with voice samples #506

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions