
TTS/STT: Experimentation with voice samples #506

@nishika26

Description


Is your feature request related to a problem? Please describe.
Glific currently supports TTS (Text-to-Speech) and STT (Speech-to-Text) in its WhatsApp flows, using Bhashini as the provider, which has proven unreliable and inaccurate in production. Users experience failed conversions, poor transcription quality, and service outages. We need to migrate this functionality to Kaapi so that Glific can call TTS/STT providers through our platform, but we cannot simply replicate the existing Bhashini integration without first finding better alternatives.

Describe the solution you'd like

  1. Conduct thorough testing of all available TTS/STT providers and models on the market, including:
     • OpenAI (Whisper for STT, TTS API)
     • Google Cloud Speech-to-Text/Text-to-Speech
     • AI4Bharat (Indic language models)
     • Bhashini (baseline comparison)
  2. Use real voice samples from actual Glific usage covering different languages (primarily Hindi and English, but test others as needed).
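To compare providers fairly on the same voice samples, each provider can be wrapped behind one common interface and called through a single harness that records the transcript, latency, and any failure per sample. The sketch below is a minimal, hypothetical harness: `STTProvider`, `SampleResult`, `run_stt`, and `success_rate` are illustrative names, not part of any provider SDK, and real adapters for OpenAI, Google Cloud, AI4Bharat, and Bhashini would have to be written against each provider's own client library. The two stub providers exist only to exercise the harness offline; a TTS harness would follow the same pattern.

```python
import time
from dataclasses import dataclass
from typing import Optional, Protocol


class STTProvider(Protocol):
    """Hypothetical common interface that each provider adapter would implement."""
    name: str

    def transcribe(self, audio: bytes, language: str) -> str:
        ...


@dataclass
class SampleResult:
    provider: str
    sample_id: str
    transcript: Optional[str]  # None when the call failed
    latency_s: float
    error: Optional[str] = None


def run_stt(provider: STTProvider, sample_id: str, audio: bytes, language: str) -> SampleResult:
    """Call one provider on one voice sample, recording latency and any failure."""
    start = time.perf_counter()
    try:
        text = provider.transcribe(audio, language)
    except Exception as exc:  # record the failure instead of aborting the whole run
        elapsed = time.perf_counter() - start
        return SampleResult(provider.name, sample_id, None, elapsed, error=repr(exc))
    elapsed = time.perf_counter() - start
    return SampleResult(provider.name, sample_id, text, elapsed)


def success_rate(results: list[SampleResult]) -> float:
    """Share of calls that returned without raising (a rough reliability proxy)."""
    if not results:
        return 0.0
    return sum(r.error is None for r in results) / len(results)


class EchoProvider:
    """Stub provider for offline testing: 'transcribes' by decoding the bytes."""
    name = "echo"

    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")


class FailingProvider:
    """Stub provider that simulates an outage."""
    name = "failing"

    def transcribe(self, audio: bytes, language: str) -> str:
        raise TimeoutError("simulated outage")
```

Catching every exception is deliberate here: outages and timeouts are exactly what the reliability comparison needs to count, so they are recorded as data rather than allowed to stop the experiment.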

Evaluate providers on:

  • Accuracy/quality: word error rate (WER) for STT, mean opinion score (MOS) for TTS
  • Reliability/uptime
  • Latency
  • Cost
  • Language support
  • API ease of use
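Word error rate is the word-level edit distance between a reference transcript and the provider's output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. MOS is the average of listener ratings, conventionally on a 1 to 5 scale. The sketch below computes both, assuming transcripts are normalized the same way (case, punctuation, Unicode form) before scoring, since this version only splits on whitespace. The function names are hypothetical, and the MOS confidence interval uses a simple normal approximation.

```python
import math
from statistics import mean, stdev


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes both strings were normalized the same way beforehand;
    tokens are split on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words (starts as the empty-reference row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Return (MOS, approximate 95% CI half-width) for 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    # Normal approximation; with few raters a t-distribution would be more appropriate.
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mean(ratings), half_width
```

Note that WER is not capped at 1.0: a hypothesis with many inserted words can score above 100%. For Hindi in particular, agreeing on a normalization step (for example, how the danda "।" and mixed-script words are handled) before scoring matters as much as the metric itself.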

Finally, we must produce an integration plan based on whichever provider performs best in our experiments.
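Once each provider has numbers for the measurable criteria, one way to turn them into a recommendation is a weighted score. The sketch below assumes min-max normalization across the providers being compared and uses made-up weights; the criterion keys, the weights, and `rank_providers` are hypothetical, not something this issue specifies. Language support and API ease of use are qualitative, so they are probably better applied as pass/fail filters before scoring than as weighted numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    weight: float
    higher_is_better: bool


# Hypothetical weights (summing to 1.0); the real weighting is a team decision.
CRITERIA = {
    "wer": Criterion(0.35, higher_is_better=False),
    "success_rate": Criterion(0.25, higher_is_better=True),
    "p95_latency_s": Criterion(0.20, higher_is_better=False),
    "cost_per_min_usd": Criterion(0.20, higher_is_better=False),
}


def rank_providers(
    metrics: dict[str, dict[str, float]],
    criteria: dict[str, Criterion] = CRITERIA,
) -> list[tuple[str, float]]:
    """Min-max normalize each criterion across providers and sort by weighted score, best first."""
    scores = {name: 0.0 for name in metrics}
    for key, crit in criteria.items():
        values = [m[key] for m in metrics.values()]
        lo, hi = min(values), max(values)
        for name, m in metrics.items():
            if hi == lo:
                norm = 1.0  # every provider ties on this criterion, so all get full credit
            else:
                norm = (m[key] - lo) / (hi - lo)
                if not crit.higher_is_better:
                    norm = 1.0 - norm  # flip so that 1.0 always means "best"
            scores[name] += crit.weight * norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Min-max scores are relative to the set being compared, so adding or removing a provider can shift everyone's score; the ranking is a starting point for the integration discussion, not a substitute for it.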

For more details, refer to the linked document.

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: Status In Progress
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
