Description
Feature Type
New Feature
Enhancement Area
- Listicles
- Music Cover Video
Feature Description
Set up a new flow that covers:
- Creating/editing a script (length unknown; maybe a "YouTube Short" type or a "YouTube documentary" type)
- Using a TTS service (ElevenLabs?) to generate a voice for the script
- [Optional, maybe for the next iteration] Generating a video (with images) relevant to the topic.
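A rough sketch of how the flow could hang together, assuming the TTS step sits behind a small interface so the provider can be swapped later. Every name here (`Script`, `TTSBackend`, `FakeTTS`, `FlowResult`, `run_flow`) is hypothetical and not taken from the current codebase; `FakeTTS` is a stand-in so the example runs offline and does not represent any real provider's API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Script:
    title: str
    body: str
    kind: str = "short"  # hypothetical labels, e.g. "short" or "documentary"


class TTSBackend(Protocol):
    """Hypothetical interface; a real backend would wrap its own client."""

    def synthesize(self, text: str) -> bytes: ...


class FakeTTS:
    """Stand-in backend that returns the text as bytes so the flow runs offline."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class FlowResult:
    script: Script
    audio: bytes
    video_path: Optional[str] = None  # optional step, left for a later iteration


def run_flow(script: Script, tts: TTSBackend) -> FlowResult:
    # Step 1 (creating/editing the script) is assumed to have happened upstream.
    if not script.body.strip():
        raise ValueError("script body is empty")
    # Step 2: generate the voice track for the script.
    audio = tts.synthesize(script.body)
    # Step 3 (video generation) is skipped in this sketch.
    return FlowResult(script=script, audio=audio)


result = run_flow(Script(title="Demo", body="Hello there."), FakeTTS())
print(len(result.audio), "bytes of placeholder audio")
```

Keeping the backend behind a protocol means the ElevenLabs question can stay open while the rest of the flow is built and tested.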
Motivation
Just a fun idea I had while looking into voice agents. (Those tools are primarily for talking to voice agents, whereas this is just the LLM + TTS portion.)
Possible Implementation
I honestly can't remember how this project is currently structured, but I'm pretty sure I hard-coded a single agent.
While having a pre-defined agent is neat, I think I want to generalize it (at least for this feature, but preferably everywhere) to a setup where a user can configure multiple agents. The reason is that, for this helper, it makes sense to have a general "script cleaner" agent, but also for users to be able to define their own "writers" (agents) that specialize in different categories like "true crime", "horror", etc.
I already have it set up to run in Docker. I'm not sure about k8s, but if I set this up as a k8s deployment I could easily write a wrapper over kagent. If I keep this as a simple dockerized app, it should still be possible to keep some state to track writer styles, but it would be neater for those writers to be agents.
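For the simple dockerized option, tracking writer styles could be as small as a JSON file (on a mounted volume, say). This is only a sketch under that assumption; `WriterStyle`, `save_styles`, `load_styles`, and the `writers.json` file name are all made up for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class WriterStyle:
    name: str
    instructions: str


def save_styles(path: Path, styles: list[WriterStyle]) -> None:
    # Serialize each style as a plain dict so the file stays hand-editable.
    payload = [asdict(style) for style in styles]
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_styles(path: Path) -> list[WriterStyle]:
    # A missing file just means no writers have been configured yet.
    if not path.exists():
        return []
    raw = json.loads(path.read_text(encoding="utf-8"))
    return [WriterStyle(**item) for item in raw]


# Round trip through a temporary directory standing in for the mounted volume.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "writers.json"
    save_styles(store, [WriterStyle("true crime", "Measured, factual narration.")])
    loaded = load_styles(store)

print(loaded[0].name)
```

If the writers later become real agents (e.g. via kagent), the same records could serve as the source for their instructions.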
Writers:
- Configurable agents with specific purposes (instructions) on themes and/or whatever the user wants to follow.
Editors:
- Unsure if these should be configurable (does it make sense to have different styles of editors? Maybe, e.g. one that formats for a YouTube script, etc.)
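One possible shape for the writer/editor split, assuming both are just named sets of instructions passed to whatever model call the app already makes. `Agent`, `LLM`, `write_script`, and `echo_llm` are hypothetical names; `echo_llm` is a stub so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model signature: (system instructions, user text) -> reply.
LLM = Callable[[str, str], str]


@dataclass(frozen=True)
class Agent:
    name: str
    instructions: str

    def run(self, llm: LLM, text: str) -> str:
        # The agent is only its instructions; the model call is injected.
        return llm(self.instructions, text)


def write_script(topic: str, writer: Agent, editor: Agent, llm: LLM) -> str:
    draft = writer.run(llm, topic)  # user-configured writer produces a draft
    return editor.run(llm, draft)   # general editor cleans it up afterwards


def echo_llm(instructions: str, text: str) -> str:
    """Stub model for offline runs; it tags the text with the instructions."""
    return f"[{instructions}] {text}"


horror = Agent("horror", "Write in a horror style")
cleaner = Agent("script cleaner", "Clean up the script")
final = write_script("an abandoned lighthouse", horror, cleaner, echo_llm)
print(final)
```

With this shape, making editors configurable costs nothing extra: an editor is the same record type as a writer, so the open question becomes a UI decision instead of a structural one.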
Alternatives Considered
No response
Additional Context
Would it be best to move away from Gradio if I go for the configurable writers (so it can be more like a local web app)? Moving off Gradio may be a hurdle, but it could allow for a nicer UI and branding.
I would still have this be a locally running thing, because I don't want to go to the trouble of finding a domain, hosting a web app, and dealing with accounts, security, and whatnot. Just let any techy user use this as-is.
Acceptance Criteria (Definition of Done)
TODO