Theory as well as implementation of various types of RAG structures
Updated Dec 21, 2025 - Jupyter Notebook
A high-performance Speculative RAG pipeline designed to reduce latency by pairing fast draft generation with accurate verification. It uses Groq Llama models, local Hugging Face embeddings, ChromaDB vector search, and Langfuse for end-to-end observability.
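As a rough illustration of the draft-then-verify idea, the sketch below shows one common reading of speculative RAG: retrieve documents, split them into subsets, produce one draft per subset in parallel, then let a verifier score the drafts and keep the best. It is not taken from the repository. Every name here (`retrieve`, `partition`, `speculative_rag`, `toy_drafter`, `toy_verifier`) is hypothetical, word overlap stands in for real embeddings and vector search, and the two toy functions stand in for the small drafting model and the larger verifying model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a stand-in for a real embedding (assumption)."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Rank documents by word overlap with the query (toy vector search)."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def partition(docs: List[str], n_subsets: int) -> List[List[str]]:
    """Deal documents round-robin so each draft sees a different subset."""
    subsets = [docs[i::n_subsets] for i in range(n_subsets)]
    return [s for s in subsets if s]


def speculative_rag(
    query: str,
    corpus: List[str],
    drafter: Callable[[str, List[str]], str],
    verifier: Callable[[str, str], float],
    k: int = 4,
    n_drafts: int = 2,
) -> Tuple[str, float]:
    """Draft one answer per document subset in parallel, then keep the best-scored draft."""
    docs = retrieve(query, corpus, k)
    subsets = partition(docs, n_drafts)
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda s: drafter(query, s), subsets))
    scored = [(draft, verifier(query, draft)) for draft in drafts]
    return max(scored, key=lambda pair: pair[1])


def toy_drafter(query: str, docs: List[str]) -> str:
    # Placeholder for a small, fast LLM call: echo the top document of the subset.
    return docs[0]


def toy_verifier(query: str, draft: str) -> float:
    # Placeholder for a larger LLM scoring call: fraction of query words in the draft.
    q = tokens(query)
    return len(q & tokens(draft)) / len(q)


corpus = [
    "ChromaDB stores embeddings for vector search.",
    "Langfuse records traces for observability.",
    "Groq serves Llama models with low latency.",
    "Bananas are yellow.",
]
best, score = speculative_rag(
    "Which models does Groq serve?", corpus, toy_drafter, toy_verifier
)
print(best, score)
```

In a real pipeline the drafter and verifier would be model calls and `retrieve` would query a vector store. The latency benefit comes from the drafts being cheap and generated concurrently, so the expensive model only has to score them.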