What is it?
A tool that accepts an audio file of dictated notes, transcribes the file into text, and uses an LLM to create a summary.
How would it work?
- User uploads an audio file
- Chunking function cuts the file into 30-second chunks (the Whisper model itself processes audio in 30-second windows) and saves them to the filesystem
- Transcription function passes the chunks to Whisper ASR one at a time and writes the combined transcript to a text file.
- The finished transcript is passed to the summarisation function, which runs it through an LLM prompted with something like "Summarise these dictated notes in markdown format."
- The finished transcript and summary are saved to the filesystem.
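The steps above could be sketched roughly as follows. This is a minimal sketch, not the proof of concept from the Gist: it assumes the upload is an uncompressed WAV file (so the standard-library `wave` module can split it), and the names `chunk_wav`, `transcribe_chunks`, `summarise`, `transcribe_fn` and `llm_fn` are hypothetical. The transcription and LLM calls are passed in as plain functions, since the choice of LLM is still open.

```python
import wave
from pathlib import Path
from typing import Callable, List

CHUNK_SECONDS = 30  # chunk length taken from the plan above


def chunk_wav(src: Path, out_dir: Path, seconds: int = CHUNK_SECONDS) -> List[Path]:
    """Split a WAV file into fixed-length chunks and save them to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    with wave.open(str(src), "rb") as wav:
        params = wav.getparams()
        frames_per_chunk = wav.getframerate() * seconds
        index = 0
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break  # end of file reached
            path = out_dir / f"chunk_{index:04d}.wav"
            with wave.open(str(path), "wb") as out:
                out.setparams(params)  # frame count in the header is fixed up on close
                out.writeframes(frames)
            paths.append(path)
            index += 1
    return paths


def transcribe_chunks(
    chunks: List[Path],
    transcribe_fn: Callable[[Path], str],  # assumption: wraps the ASR model
    transcript_path: Path,
) -> str:
    """Transcribe chunks one at a time and write the joined text to a file."""
    parts = [transcribe_fn(chunk).strip() for chunk in chunks]
    transcript = " ".join(part for part in parts if part)
    transcript_path.write_text(transcript, encoding="utf-8")
    return transcript


def summarise(transcript: str, llm_fn: Callable[[str], str]) -> str:
    """Build the summary prompt and hand it to whichever LLM is chosen."""
    prompt = "Summarise these dictated notes in markdown format.\n\n" + transcript
    return llm_fn(prompt)
```

With the `openai-whisper` Python package, `transcribe_fn` could be something like `lambda p: model.transcribe(str(p))["text"]` after `model = whisper.load_model("base")`. Other audio formats would need a different splitter (for example ffmpeg), which this sketch does not cover.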
Tech stack
- Python (and Flask?)
- Whisper ASR (model run locally)
- LLM for text summarisation (ChatGPT? I'd prefer to do this for free...)
Issues
- I'm using Python because there is a Python package for running Whisper ASR locally, and I couldn't find any evidence that this is possible with Node.
- I don't know if there's an LLM I can use for free to do the summarisation.
Enhancements
- Generate tags from the summarised notes, and save the tagged summary to an Obsidian vault for future reference
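One way this enhancement might look, as a rough sketch: it assumes the tags come back from the LLM as a comma-separated string, and that the vault is an ordinary folder of markdown files where tags live in YAML front matter. The function names `parse_tags` and `save_to_vault` are hypothetical.

```python
from pathlib import Path
from typing import List


def parse_tags(raw: str) -> List[str]:
    """Turn a comma-separated LLM reply into clean, de-duplicated tags."""
    tags: List[str] = []
    for part in raw.split(","):
        # Lowercase, drop a leading '#', and replace spaces with hyphens,
        # since Obsidian tags cannot contain spaces.
        tag = "-".join(part.strip().lstrip("#").lower().split())
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def save_to_vault(vault: Path, title: str, summary: str, tags: List[str]) -> Path:
    """Write the summary as a markdown note with tags in YAML front matter."""
    vault.mkdir(parents=True, exist_ok=True)
    front_matter = ["---", "tags:"] + [f"  - {tag}" for tag in tags] + ["---", ""]
    note = vault / f"{title}.md"
    note.write_text("\n".join(front_matter) + summary + "\n", encoding="utf-8")
    return note
```

For example, `save_to_vault(Path("vault"), "2024-01-01 notes", summary, parse_tags(reply))` would write `vault/2024-01-01 notes.md`, where `summary` and `reply` are the LLM outputs.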
Proof of concept
There's a proof of concept of the file chunking and transcription parts of the programme in this Gist.