This project provides a minimal Retrieval-Augmented Generation (RAG) search system using:
- LangChain for document loading, splitting, embeddings, and retrieval
- Chroma as a local vector database for similarity search
- LangGraph-inspired orchestration (simple 2-node pipeline: retrieve -> synthesize)
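For orientation, the retrieve -> synthesize flow amounts to two small steps; the sketch below is illustrative only (the function names are hypothetical, and the real pipeline lives in src/rag_system/graph.py):

```python
# Illustrative sketch of the two-node pipeline (retrieve -> synthesize).
# Function names are hypothetical; the real graph is defined in src/rag_system/graph.py.
from langchain_core.documents import Document

def retrieve(question: str, retriever, k: int = 4) -> list[Document]:
    # Node 1: similarity search against the vector store (Chroma behind a LangChain retriever).
    return retriever.invoke(question)[:k]

def synthesize(question: str, docs: list[Document], llm) -> str:
    # Node 2: ask the LLM to answer strictly from the retrieved context.
    context = "\n\n".join(d.page_content for d in docs)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content

def answer(question: str, retriever, llm) -> str:
    return synthesize(question, retrieve(question, retriever), llm)
```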
Features
- Local-first: uses HuggingFace sentence-transformers by default; no API key required
- Local LLM via Ollama by default (pull a model like llama3.1:8b). If Ollama is not available, it can fall back to OpenAI (if OPENAI_API_KEY is set) or to an extractive response.
- Persistent vector store using Chroma
- Simple CLI for ingestion and queries
Prerequisites
- Python 3.10+
Setup
1. Create and activate a virtual environment (optional):
   - `python -m venv .venv`
   - `source .venv/bin/activate` (Windows: `.venv\Scripts\activate`)
2. Install dependencies:
   - `pip install -r requirements.txt`
3. Add some .txt files into data/ (some sample content is already in data/sample/).
Environment variables (optional)
- EMBED_MODEL: HuggingFace embeddings model (default: sentence-transformers/all-MiniLM-L6-v2)
- LLM_PROVIDER: choose 'ollama' (default) or 'openai'
- OLLAMA_MODEL: Ollama model to use (default: llama3.1:8b)
- OLLAMA_BASE_URL (or OLLAMA_HOST): e.g., http://localhost:11434
- OPENAI_API_KEY: if set and provider=openai, the system will use OpenAI Chat Completions
- OPENAI_MODEL: OpenAI model to use (default: gpt-4o-mini)
- CHROMA_URL: if set, the app connects to a Chroma Server (e.g., http://localhost:8000) instead of local .chroma
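For reference, here is a hedged sketch of how CHROMA_URL and EMBED_MODEL can drive the choice between a local embedded store and a Chroma Server; it uses the standard chromadb and langchain-chroma APIs, and the project's ingest.py may wire this up differently:

```python
# Sketch: pick a local embedded Chroma store or a Chroma Server based on CHROMA_URL.
# Illustrative only; the project's own code may use different packages or defaults.
import os
import chromadb
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name=os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
)

chroma_url = os.getenv("CHROMA_URL")  # e.g. http://localhost:8000
if chroma_url:
    host, port = chroma_url.split("//")[-1].rsplit(":", 1)
    client = chromadb.HttpClient(host=host, port=int(port))
    store = Chroma(client=client, collection_name="corpus", embedding_function=embeddings)
else:
    # Local embedded mode, persisted under .chroma
    store = Chroma(collection_name="corpus", embedding_function=embeddings,
                   persist_directory=".chroma")
```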
Usage

0. Extract OCR text from XML into .txt (if your data is XML with OCR); a Python sketch of this step appears after this list:
   - `python -m src.rag_system.cli extract_ocr --input data_sample --output data --glob "**/*.xml"`
   - `python -m src.rag_system.cli extract_ocr --input data_sample --output data --xpath ".//OCR"`
1. Ingest documents into Chroma (local embedded):
   - `python -m src.rag_system.cli ingest --source data`

   Or, with Chroma Server in Docker (recommended for shared access):
   - `docker compose up -d chroma`
   - `export CHROMA_URL=http://localhost:8000`
   - `python -m src.rag_system.cli ingest --source data --chroma_url $CHROMA_URL --collection corpus`
2. Run a query (Ollama by default). Make sure Ollama is installed and running: https://ollama.com/
   - Local Chroma: `python -m src.rag_system.cli query --provider ollama --ollama_model llama3.1:8b "What is this repository about?"`
   - Chroma Server: `python -m src.rag_system.cli query --provider ollama --ollama_model llama3.1:8b --chroma_url $CHROMA_URL "What is this repository about?"`
   - OpenAI instead of Ollama: `export OPENAI_API_KEY=sk-...`, then `python -m src.rag_system.cli query --provider openai --model gpt-4o-mini --chroma_url $CHROMA_URL "What is this repository about?"`
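The OCR-extraction step referenced in item 0 above can be pictured as a small XML-to-text pass. The sketch below assumes the OCR text sits in elements matched by an ElementTree-style XPath such as .//OCR; the actual extract_ocr command may behave differently:

```python
# Sketch of OCR extraction from XML files into plain .txt files.
# Assumes OCR text lives in elements matched by an ElementTree-style XPath (e.g. ".//OCR");
# the real extract_ocr command may differ.
from pathlib import Path
import xml.etree.ElementTree as ET

def extract_ocr(input_dir: str, output_dir: str, xpath: str = ".//OCR",
                glob: str = "**/*.xml") -> None:
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for xml_path in Path(input_dir).glob(glob):
        root = ET.parse(xml_path).getroot()
        texts = [el.text.strip() for el in root.findall(xpath) if el.text and el.text.strip()]
        if texts:
            (out / f"{xml_path.stem}.txt").write_text("\n\n".join(texts), encoding="utf-8")

extract_ocr("data_sample", "data")
```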
Advanced options
- Choose a different embedding model: `python -m src.rag_system.cli ingest --embed_model sentence-transformers/all-mpnet-base-v2`
- Configure top-k and the model for queries:
  - `python -m src.rag_system.cli query --provider ollama --ollama_model llama3.1:8b --k 8 "Explain the stack used here"`
  - `python -m src.rag_system.cli query --provider openai --model gpt-4o-mini --k 8 "Explain the stack used here"`
Project structure
- src/rag_system/ingest.py -> Ingestion pipeline (load, split, embed, index)
- src/rag_system/graph.py -> Retrieval + synthesis pipeline
- src/rag_system/cli.py -> Command-line interface
- docker-compose.yml -> Chroma Server (Docker) for remote vector DB
- data/ -> Put your .txt files here
Notes
- If you do not set OPENAI_API_KEY (and Ollama is unavailable), answers are generated by a simple extractive fallback (a concatenation of the top retrieved documents) to keep everything offline; see the sketch below.
- If you set OPENAI_API_KEY, the system uses OpenAI's Chat model configured via OPENAI_MODEL.
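The extractive fallback mentioned above can be as simple as returning the top retrieved chunks verbatim; a minimal sketch (the real fallback in src/rag_system/graph.py may format its output differently):

```python
# Minimal sketch of an extractive "answer": no LLM, just the top retrieved chunks.
# Illustrative only; the project's actual fallback may differ.
from langchain_core.documents import Document

def extractive_answer(docs: list[Document], max_chars: int = 2000) -> str:
    parts = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        parts.append(f"[{source}] {doc.page_content.strip()}")
    return "\n\n".join(parts)[:max_chars]
```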
If you see an error like `RuntimeError: Numpy is not available` when running ingest or query, install NumPy explicitly before the other packages and make sure pip/setuptools are recent:

- Upgrade build tooling: `python -m pip install --upgrade pip setuptools wheel`
- Install NumPy first (compatible range): `python -m pip install "numpy>=1.26,<2.1"`
- Install the project requirements: `python -m pip install -r requirements.txt`
Notes:
- On Apple Silicon (M1/M2/M3), use Python 3.10+ from python.org or pyenv and a recent pip (>=23).
- If you still hit issues, try recreating the venv and installing NumPy first, then requirements.
We use LangChain’s RecursiveCharacterTextSplitter during ingestion to break large documents into smaller, partially-overlapping chunks before embedding and indexing in Chroma.
Where it’s used here
- File: src/rag_system/ingest.py
- Code: `splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap, add_start_index=True)`
- You control chunk_size and chunk_overlap via CLI: --chunk_size and --chunk_overlap
What it does (high level)
- It tries to split text using a prioritized list of separators to preserve natural boundaries: by default ["\n\n", "\n", " ", ""].
- It picks the first separator in that list that actually appears in your text. If a resulting segment is still too long (its length, as measured by the length function, exceeds chunk_size), it recursively retries that segment with the next, “finer” separator.
- It merges segments back into chunks no larger than chunk_size, with chunk_overlap characters of overlap between consecutive chunks to improve retrieval recall.
Key parameters you’ll care about
- chunk_size: Target maximum size (in characters by default) of each chunk.
- chunk_overlap: Number of characters to overlap between adjacent chunks, preserving context across boundaries.
- separators: Optional custom list of separators to try in order (e.g., section headings, paragraphs, sentences, words, characters). Defaults to ["\n\n", "\n", " ", ""].
- keep_separator: Whether to keep the separator in chunks; can be True, False, "start", or "end". Defaults to True.
- is_separator_regex: Treat separators as regex patterns (advanced). Defaults to False.
- add_start_index: When True, each output Document receives metadata["start_index"] with its starting character offset relative to the original text. We set this to True so you can trace chunks back to the source.
- length_function: Function used to measure length (defaults to Python len on characters). You can customize (e.g., token counting) if you subclass or construct differently.
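To make these parameters concrete, here is a small, self-contained example using LangChain's splitter on a made-up text; the numbers are deliberately tiny so the effect is visible:

```python
# Demonstrates chunk_size, chunk_overlap, and add_start_index on a tiny text.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = (
    "Paragraph one about the ingestion pipeline.\n\n"
    "Paragraph two describes retrieval and synthesis in more detail, "
    "and is long enough that it must be split across several chunks."
)

splitter = RecursiveCharacterTextSplitter(
    chunk_size=80,         # maximum characters per chunk
    chunk_overlap=20,      # characters shared between consecutive chunks
    add_start_index=True,  # record each chunk's offset in metadata["start_index"]
)
for doc in splitter.create_documents([text]):
    print(doc.metadata["start_index"], repr(doc.page_content))
```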
How the algorithm works (step-by-step)
- Choose a separator: scan the separators list in order and use the first one that occurs in the text. If none match, use the last (often the empty string), which falls back to character-level splitting.
- Split the text by that separator.
- For each resulting piece:
  - If the piece is shorter than chunk_size, add it to a temporary list of “good” splits.
  - If the piece is chunk_size or longer:
    - First, merge the current “good” splits into final chunks (respecting chunk_size and chunk_overlap).
    - Then recursively apply the same procedure to the long piece, using the remaining, finer separators.
- After processing all pieces, merge any remaining “good” splits into the final chunk list.
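The simplified sketch below mirrors that recursion; it deliberately skips keep_separator, regex handling, and the overlap-aware merge step that LangChain's real implementation performs:

```python
# Simplified recursive splitting: pick the coarsest separator present, split,
# and recurse with finer separators on any piece that is still too long.
# (No overlap merging or keep_separator handling; illustration only.)
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    separator, rest = separators[-1], []
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:          # "" always matches (character-level fallback)
            separator, rest = sep, separators[i + 1:]
            break

    pieces = list(text) if separator == "" else text.split(separator)

    chunks: list[str] = []
    for piece in pieces:
        if len(piece) < chunk_size:
            chunks.append(piece)              # small enough to keep as-is
        elif rest:
            chunks.extend(recursive_split(piece, rest, chunk_size))  # go finer
        else:
            chunks.append(piece)              # nothing finer left
    return chunks

print(recursive_split("alpha beta gamma\n\ndelta epsilon", ["\n\n", "\n", " ", ""], 12))
```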
Merging and overlap
- The splitter uses a sliding window over the accumulated splits to emit chunks whose size does not exceed chunk_size.
- It ensures consecutive chunks overlap by chunk_overlap characters, which helps retrieval models maintain context when a relevant sentence lies near a boundary.
Why it’s good for RAG
- Preserves semantic boundaries when possible (paragraphs, then lines, then words) while guaranteeing chunks are not too large for embedding/token limits.
- Overlap improves recall and robustness to query variations.
Practical tuning advice
- Start with chunk_size=800 and chunk_overlap=120 (our CLI defaults). Increase chunk_size for long-form technical docs; decrease for short notes.
- If your documents have strong structure (e.g., Markdown, headings), consider providing custom separators, e.g.:
- separators=["\n\n# ", "\n\n", "\n", " ", ""] with is_separator_regex=False
- If you have very long tokens/words (e.g., base64 or code blobs) and chunks exceed the limit, the recursion eventually falls back to character-level splitting via the final "" separator.
- keep_separator="end" can help keep phrase punctuation near the chunk end; "start" can help the next chunk’s beginning be self-contained.
- For token-aware sizing (e.g., tiktoken), consider a custom length_function or LangChain’s token-based splitters for more precise control.
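For token-aware sizing specifically, LangChain exposes a tiktoken-based constructor, so chunk_size and chunk_overlap are counted in tokens rather than characters (requires the tiktoken package):

```python
# Token-aware splitting: sizes are measured in tiktoken tokens, not characters.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # encoding used by many recent OpenAI models
    chunk_size=400,
    chunk_overlap=60,
)
chunks = splitter.split_text("Your long document text goes here...")
```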
Examples with this project
- Default ingestion: `python -m src.rag_system.cli ingest --source data --chunk_size 800 --chunk_overlap 120`
- Larger chunks for long technical reports: `python -m src.rag_system.cli ingest --source data --chunk_size 1200 --chunk_overlap 150`
- Smaller, tighter chunks for noisy OCR text: `python -m src.rag_system.cli ingest --source data --chunk_size 500 --chunk_overlap 100`
Customizing separators (code snippet)
If you want custom separators, edit src/rag_system/ingest.py and pass a separators list when constructing the splitter, for example:

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        add_start_index=True,
        separators=["\n\n## ", "\n\n", "\n", " ", ""],  # try headings, then paragraphs, lines, words, chars
        keep_separator=True,
    )
Edge cases
- Documents with no newlines: the splitter quickly falls back to splitting on spaces or characters.
- Regex separators: set is_separator_regex=True and pass patterns (make sure they actually match the text; otherwise the splitter simply moves on to finer separators).
- Extremely long single “words”: recursion will end up splitting at character level to respect chunk_size.
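A quick way to see the character-level fallback from the last edge case is to split a single long "word" with no separators at all:

```python
# One long "word" (e.g. base64) with no whitespace: the splitter falls back to
# character-level splitting and still respects chunk_size and chunk_overlap.
from langchain_text_splitters import RecursiveCharacterTextSplitter

blob = "A" * 50
splitter = RecursiveCharacterTextSplitter(chunk_size=20, chunk_overlap=5)
print([len(c) for c in splitter.split_text(blob)])  # every chunk is at most 20 characters
```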
You can enable runtime logs for the ingestion process to monitor progress and performance.
- Set the log level via the CLI: `python -m src.rag_system.cli ingest --source data --log_level DEBUG`
- Or via an environment variable (INFO is the default if not provided): `export LOG_LEVEL=DEBUG`, then `python -m src.rag_system.cli ingest --source data`
The logs include:
- Start parameters (source, glob, chunking, collection, target Chroma location)
- Number of documents loaded
- Chunking stats (number of chunks and average length)
- Embedding model used
- Where data is being written (local Chroma directory or Chroma Server)
- Total ingestion time
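For orientation, the logging described above can be wired up with the standard logging module; the snippet below is illustrative, and the project's actual setup in cli.py/ingest.py may differ:

```python
# Illustrative logging setup honoring LOG_LEVEL / --log_level.
import logging
import os
import time

def setup_logging(cli_level: str | None = None) -> logging.Logger:
    level = (cli_level or os.getenv("LOG_LEVEL", "INFO")).upper()
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    return logging.getLogger("rag_system.ingest")

logger = setup_logging()
start = time.perf_counter()
logger.info("Ingestion started: source=%s collection=%s", "data", "corpus")
# ... load, split, embed, index ...
logger.info("Ingestion finished in %.1fs", time.perf_counter() - start)
```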
You can run the Ollama server via Docker using the provided docker-compose.yml.
- Start Ollama (and optionally Chroma) in the background: `docker compose up -d ollama` or `docker compose up -d ollama chroma`
- Point the app to the Dockerized Ollama server: `export OLLAMA_BASE_URL=http://localhost:11434`
- Pull a model inside the container (one-time): `docker exec -it ollama ollama pull llama3.1:8b`
Then you can query with the CLI (provider=ollama), either against local Chroma or Chroma Server:
- Local Chroma: `python -m src.rag_system.cli query --provider ollama --ollama_model llama3.1:8b "What is this repository about?"`
- Chroma Server: `export CHROMA_URL=http://localhost:8000`, then `python -m src.rag_system.cli query --provider ollama --ollama_model llama3.1:8b --chroma_url $CHROMA_URL "What is this repository about?"`
You have three convenient options to download an Ollama model:
- If Ollama is installed on your host (macOS/Linux/WSL):
  - Ensure the Ollama daemon is running: `ollama serve` (usually started automatically)
  - Pull a model by name: `ollama pull llama3.1:8b`
- If you run Ollama via Docker Compose (this repo’s docker-compose.yml):
  - Start the service: `docker compose up -d ollama`
  - Pull the model inside the container: `docker exec -it ollama ollama pull llama3.1:8b`
  - Point the app to the Dockerized server: `export OLLAMA_BASE_URL=http://localhost:11434`
- Using this project’s CLI (talks to the Ollama HTTP API):
  - Host or Docker both work as long as the server is reachable.
  - Example (defaults to http://localhost:11434 if OLLAMA_BASE_URL is not set): `python -m src.rag_system.cli ollama_pull --model llama3.1:8b --base_url $OLLAMA_BASE_URL`
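Under the hood, pulling a model is a single HTTP call to the Ollama server's /api/pull endpoint; the sketch below shows what an ollama_pull-style command can do with requests (the project's actual implementation may differ):

```python
# Sketch: pull a model through the Ollama HTTP API (/api/pull) and stream progress.
# Illustrative only; the CLI's ollama_pull command may be implemented differently.
import json
import os
import requests

def ollama_pull(model: str, base_url: str | None = None) -> None:
    base_url = base_url or os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    with requests.post(f"{base_url}/api/pull", json={"model": model}, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                status = json.loads(line)
                print(status.get("status", ""), status.get("completed", ""), status.get("total", ""))

ollama_pull("llama3.1:8b")
```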
Notes
- Model names are listed on https://ollama.com/library (e.g., llama3.1, mistral, codellama). Tags like :8b/:70b choose parameter sizes.
- The first pull downloads the weights; subsequent pulls are fast.
- If the server is remote, set OLLAMA_BASE_URL to that host, e.g., http://your-server:11434.