NVIDIA-NeMo · miyoungc · Dec 2, 2025 · Dec 3, 2025
diff --git a/Makefile b/Makefile
@@ -1,4 +1,4 @@
-.PHONY: all test tests test_watch test_coverage test_profile docs pre_commit help
+.PHONY: all test tests test_watch test_coverage test_profile docs docs-serve docs-update-cards docs-check-cards docs-watch-cards pre_commit help
 
 # Default target executed when no specific target is provided to make.
 all: help
@@ -24,6 +24,18 @@ test_profile:
 docs:
 	poetry run sphinx-build -b html docs _build/docs
 
+docs-serve:
+	cd docs && poetry run sphinx-autobuild . _build/html --port 8000 --open-browser
+
+docs-update-cards:
+	cd docs && poetry run python scripts/update_cards/update_cards.py
+
+docs-check-cards:
+	cd docs && poetry run python scripts/update_cards/update_cards.py --dry-run
+
+docs-watch-cards:
+	cd docs && poetry run python scripts/update_cards/update_cards.py watch
+
 pre_commit:
 	pre-commit install
 	pre-commit run --all-files
@@ -39,4 +51,8 @@ help:
 	@echo 'test_watch                   - run unit tests in watch mode'
 	@echo 'test_coverage                - run unit tests with coverage'
 	@echo 'docs                         - build docs, if you installed the docs dependencies'
+	@echo 'docs-serve                   - serve docs locally with auto-rebuild on changes'
+	@echo 'docs-update-cards            - update grid cards in index files from linked pages'
+	@echo 'docs-check-cards             - check if grid cards are up to date (dry run)'
+	@echo 'docs-watch-cards             - watch for file changes and auto-update cards'
 	@echo 'pre_commit                   - run pre-commit hooks'
diff --git a/docs/LIVE_DOCS.md b/docs/LIVE_DOCS.md
@@ -0,0 +1,205 @@
+# Live Documentation Server - Quick Reference
+
+This guide shows you how to run a live documentation server that automatically rebuilds when you save changes.
+
+## Quick Start
+
+The easiest way to get started:
+
+```bash
+# From the repository root
+make docs-serve
+```
+
+Or from the `docs` directory:
+
+```bash
+# Using the shell script
+./serve.sh
+
+# Using the Python script
+python serve.py
+```
+
+## Prerequisites
+
+Install the documentation dependencies first:
+
+```bash
+poetry install --with docs
+```
+
+## Available Methods
+
+### Method 1: Makefile Target (Recommended)
+
+```bash
+# From repository root
+make docs-serve
+```
+
+- ✅ Simplest method
+- ✅ Automatically opens browser
+- ✅ Runs on port 8000
+
+### Method 2: Shell Script
+
+```bash
+cd docs
+./serve.sh [port]
+```
+
+**Features:**
+
+- Default port: 8000
+- Watches for changes in all documentation files
+- Ignores build artifacts and temporary files
+- Also watches Python source code for API docs
+
+**Custom port:**
+
+```bash
+./serve.sh 8080
+```
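The following minimal sketch shows how the optional `[port]` argument could be validated before it is passed to `sphinx-autobuild`. It is hypothetical: `serve.sh` itself is not shown in this diff, and the `resolve_port` function name is illustrative. It falls back to the documented default of 8000 and rejects anything that is not a TCP port number from 1 to 65535.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of how serve.sh might interpret its optional [port]
# argument; the real script may differ.

# Print the port to serve on: the first argument if given, else 8000.
# Returns non-zero for anything that is not an integer in 1..65535.
resolve_port() {
  local port="${1:-8000}"
  # Reject empty strings and anything containing a non-digit character.
  case "$port" in
    ''|*[!0-9]*)
      echo "invalid port: $port" >&2
      return 1
      ;;
  esac
  # Force base 10 so values with leading zeros are not parsed as octal.
  if (( 10#$port < 1 || 10#$port > 65535 )); then
    echo "port out of range: $port" >&2
    return 1
  fi
  echo "$(( 10#$port ))"
}
```

A script could call it as `port="$(resolve_port "$1")" || exit 1` and then pass `--port "$port"` to `sphinx-autobuild`.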
+
+### Method 3: Python Script
+
+```bash
+cd docs
+python serve.py [OPTIONS]
+```
+
+**Options:**
+
+- `--port PORT`: Port to serve on (default: 8000)
+- `--host HOST`: Host to bind to (default: 0.0.0.0)
+- `--open`: Automatically open browser
+
+**Examples:**
+
+```bash
+# Default settings
+python serve.py
+
+# Custom port with auto-open
+python serve.py --port 8080 --open
+
+# Localhost only
+python serve.py --host 127.0.0.1
+```
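The documented options map naturally onto `sphinx-autobuild` flags. The following shell sketch shows one such mapping; it is an assumption about what `serve.py` does, not its actual code, and `autobuild_args` is a hypothetical name. It only prints the resulting argument list, one argument per line, so it is safe to run without Sphinx installed.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from the documented serve.py options
# (--port, --host, --open) to sphinx-autobuild flags.
# Prints one argument per line instead of running anything.
autobuild_args() {
  local port=8000 host=0.0.0.0 open=0
  while (( $# > 0 )); do
    case "$1" in
      --port|--host)
        # Both options need a value; fail instead of looping forever.
        if (( $# < 2 )); then
          echo "missing value for $1" >&2
          return 1
        fi
        if [[ $1 == --port ]]; then port="$2"; else host="$2"; fi
        shift 2
        ;;
      --open) open=1; shift ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
  # Source and output directories match the Makefile target.
  printf '%s\n' . _build/html --port "$port" --host "$host"
  if (( open )); then
    printf '%s\n' --open-browser
  fi
}
```

The sketch passes `--host` explicitly because binding to the documented default of `0.0.0.0` makes the server reachable from other machines on your network, which is why the `--host 127.0.0.1` example exists.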
+
+### Method 4: Direct Command
+
+```bash
+cd docs
+poetry run sphinx-autobuild . _build/html --port 8000 --open-browser
+```
+
+## How It Works
+
+1. **Initial Build**: The server builds the documentation from scratch
+2. **Watch Mode**: Monitors all source files for changes (`.md`, `.rst`, `.py`, etc.)
+3. **Auto-Rebuild**: When you save a file, the server runs an incremental Sphinx build, so only the affected pages are rebuilt
+4. **Live Reload**: Your browser automatically refreshes to show the updates
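Sphinx performs the change detection in step 3 itself by comparing source modification times with those recorded during the previous build. The following shell sketch only illustrates the idea with a hypothetical stamp file and `find -newer`; `changed_since` is an illustrative name and is not part of Sphinx or `sphinx-autobuild`.

```bash
#!/usr/bin/env bash
# Illustrative only: list Markdown and reStructuredText sources under DIR
# whose modification time is later than that of STAMP, a file assumed to
# be touched at the end of the previous build.
changed_since() {
  local stamp="$1" dir="$2"
  find "$dir" -type f \( -name '*.md' -o -name '*.rst' \) \
    -newer "$stamp" -print | sort
}
```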
+
+## What Files Are Watched?
+
+The server watches:
+
+- ✅ All Markdown files (`.md`)
+- ✅ All reStructuredText files (`.rst`)
+- ✅ Configuration files (`conf.py`, `config.yml`)
+- ✅ Python source code in `nemoguardrails/` (for API docs; when started with `serve.sh` or `serve.py`)
+- ✅ Static assets (images, CSS, etc.)
+
+Files ignored:
+
+- ❌ Build output (`_build/`)
+- ❌ Temporary files (`.swp`, `*~`)
+- ❌ Python cache (`__pycache__/`, `*.pyc`)
+- ❌ Git files (`.git/`)
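The ignore list above can be expressed as shell `case` patterns. This is a sketch of the matching logic only: the `is_ignored` name is hypothetical, and the real scripts may instead pass equivalent patterns to `sphinx-autobuild --ignore`.

```bash
#!/usr/bin/env bash
# Hypothetical filter mirroring the documented ignore list.
# Returns 0 (true) if the given relative path should not trigger a rebuild.
# In case patterns, * also matches "/", so */_build/* covers nested paths.
is_ignored() {
  case "$1" in
    _build/*|*/_build/*) return 0 ;;                  # build output
    *.swp|*~) return 0 ;;                             # editor temporary files
    __pycache__/*|*/__pycache__/*|*.pyc) return 0 ;;  # Python cache
    .git/*|*/.git/*) return 0 ;;                      # Git metadata
    *) return 1 ;;
  esac
}
```

Splitting `_build/*` from `*/_build/*` keeps a directory such as `my_build/` from being ignored by accident.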
+
+## Accessing the Documentation
+
+Once the server starts, open your browser to:
+
+```text
+http://127.0.0.1:8000
+```
+
+Or if you used a custom port:
+
+```text
+http://127.0.0.1:<your-port>
+```
+
+## Stopping the Server
+
+Press `Ctrl+C` in the terminal to stop the server.
+
+## Troubleshooting
+
+### Port Already in Use
+
+If you see an error about the port being in use:
+
+```bash
+# Use a different port
+./serve.sh 8080
+# or
+python serve.py --port 8080
+```
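If you do not know which port is free, a small Bash helper can probe for one. This sketch assumes Bash, because it relies on Bash's `/dev/tcp` redirection, which shells such as `dash` do not support. It treats a port as busy when a TCP connection to `127.0.0.1` succeeds. The `port_in_use` and `find_free_port` names are hypothetical, and the check is inherently racy: another process can claim the port between the probe and the server start.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; they require Bash because of /dev/tcp.

# Succeeds if something accepts TCP connections on 127.0.0.1:PORT.
port_in_use() {
  # Run the connection attempt in a subshell so fd 3 is closed on exit.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port from START (default 8000) that is not in use,
# trying at most MAX_TRIES (default 50) consecutive ports.
find_free_port() {
  local start="${1:-8000}" max_tries="${2:-50}" port i
  port="$start"
  for (( i = 0; i < max_tries; i++ )); do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$(( port + 1 ))
  done
  echo "no free port found in $start..$(( start + max_tries - 1 ))" >&2
  return 1
}
```

For example, `./serve.sh "$(find_free_port 8000)"` starts the server on the first free port at or above 8000.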
+
+### Module Not Found: sphinx-autobuild
+
+Install the documentation dependencies:
+
+```bash
+poetry install --with docs
+```
+
+### Changes Not Reflecting
+
+1. Check the terminal for build errors
+2. Try a full rebuild from the repository root:
+
+   ```bash
+   rm -rf docs/_build
+   make docs-serve
+   ```
+
+### Browser Not Auto-Refreshing
+
+- Make sure you're viewing the page served by the local server (port 8000 by default)
+- Some browser extensions may block the live reload WebSocket
+- Try a different browser or incognito mode
+
+## Tips
+
+1. **Keep the terminal visible**: You'll see build progress and any errors
+2. **Check for errors**: Red text in the terminal indicates build warnings or errors
+3. **Multiple files**: The server batches changes, so you can save several files and then wait a moment
+4. **Clean builds**: If things look wrong, stop the server and delete the `docs/_build/` directory
+
+## Advanced Configuration
+
+The scripts automatically configure:
+
+- Ignore patterns for temporary files
+- Debounce delay (1 second) to batch rapid changes
+- Watch additional directories (Python source code)
+- Rebuild only changed files for speed
+
+To customize, edit:
+
+- `docs/serve.sh` (bash script)
+- `docs/serve.py` (Python script)
+
+Or run `sphinx-autobuild` directly with your own options:
+
+```bash
+sphinx-autobuild [SOURCE] [BUILD] [OPTIONS]
+```
+
+See `sphinx-autobuild --help` for all available options.
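For example, a command along the following lines approximates the script behavior described above using standard `sphinx-autobuild` options (`--watch`, `--ignore`, `--port`, `--open-browser`). It is a sketch under assumptions, not the contents of `serve.sh` or `serve.py`: it assumes you run it from the `docs` directory, so that `../nemoguardrails` points at the Python package, and the ignore patterns simply restate the documented ignore list. Pattern-matching details differ between `sphinx-autobuild` releases, so confirm them with `sphinx-autobuild --help`.

```bash
#!/usr/bin/env bash
# Sketch: run the live server from the docs directory with an extra
# watched directory and ignore patterns. Any extra arguments (for example
# --port 8080) are appended after the defaults.
run_docs_server() {
  poetry run sphinx-autobuild . _build/html \
    --watch ../nemoguardrails \
    --ignore '*.swp' \
    --ignore '*~' \
    --ignore '*/__pycache__/*' \
    --ignore '*.pyc' \
    --port 8000 \
    --open-browser \
    "$@"
}
```

Because `sphinx-autobuild` parses its options with Python's `argparse`, which keeps the last value given for a repeated option, `run_docs_server --port 8080` serves on port 8080 instead of 8000.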
diff --git a/docs/README.md b/docs/README.md
@@ -10,6 +10,10 @@ Product documentation for the toolkit is available at
 1. Make sure you installed the `docs` dependencies.
    Refer to [CONTRIBUTING.md](../CONTRIBUTING.md) for more information about Poetry and dependencies.
 
+   ```console
+   poetry install --with docs
+   ```
+
 1. Build the documentation:
 
    ```console
@@ -18,6 +22,61 @@ Product documentation for the toolkit is available at
 
    The HTML is created in the `_build/docs` directory.
 
+## Live Documentation Server
+
+For local development with automatic rebuilding on file changes, use one of the following methods:
+
+### Option 1: Using the Shell Script (Recommended for Unix/Mac)
+
+```bash
+cd docs
+./serve.sh [port]
+```
+
+The default port is 8000. The server automatically rebuilds the documentation when you save changes to any source file.
+
+### Option 2: Using the Python Script (Cross-Platform)
+
+```bash
+cd docs
+python serve.py [--port PORT] [--host HOST] [--open]
+```
+
+Options:
+
+- `--port PORT`: Port to serve on (default: 8000)
+- `--host HOST`: Host to bind to (default: 0.0.0.0)
+- `--open`: Automatically open browser
+
+Examples:
+
+```bash
+# Start server on default port (8000)
+python serve.py
+
+# Start server on custom port with auto-open browser
+python serve.py --port 8080 --open
+
+# Start server accessible only from localhost
+python serve.py --host 127.0.0.1
+```
+
+### Option 3: Direct sphinx-autobuild Command
+
+```bash
+cd docs
+poetry run sphinx-autobuild . _build/html --port 8000 --open-browser
+```
+
+Once the server is running:
+
+- Open your browser to `http://127.0.0.1:8000`
+- Edit any documentation source file (`.md`, `.rst`, or `conf.py`)
+- Save the file
+- The browser refreshes automatically with the updated content
+
+Press `Ctrl+C` to stop the server.
+
 ## Publishing the Documentation
 
 Tag the commit to publish with `docs-v<semver>`.

diff --git a/docs/architecture/README.md b/docs/about/architecture/README.md
diff --git a/docs/architecture/guardrails-server.png b/.../about/architecture/guardrails-server.png
diff --git a/docs/architecture/index.rst b/docs/about/architecture/index.rst
diff --git a/docs/architecture/overall-architecture.png b/...out/architecture/overall-architecture.png
diff --git a/...rchitecture/sequence-diagram-llmrails.png b/...rchitecture/sequence-diagram-llmrails.png
diff --git a/docs/user-guides/guardrails-process.md b/.../about/how-it-works/guardrails-process.md
@@ -1,35 +1,10 @@
-# Guardrails Process
+# Guardrails Sequence Diagrams
 
-This guide provides an overview of the main types of rails supported in NeMo Guardrails and the process of invoking them.
+This guide provides an overview of the process of invoking guardrails.
 
-## Overview
+The following diagram depicts the guardrails process in detail:
 
-NeMo Guardrails has support for five main categories of rails: input, dialog, output, retrieval, and execution. The diagram below provides an overview of the high-level flow through these categories of flows.
-
-```{image} ../_static/images/programmable_guardrails_flow.png
-:alt: "High-level flow through the five main categories of guardrails in NeMo Guardrails: input rails for validating user input, dialog rails for controlling conversation flow, output rails for validating bot responses, retrieval rails for handling retrieved information, and execution rails for managing custom actions."
-:align: center
-```
-
-## Categories of Rails
-
-There are five types of rails supported in NeMo Guardrails:
-
-1. **Input rails**: applied to the input from the user; an input rail can reject the input ( stopping any additional processing) or alter the input (e.g., to mask potentially sensitive data, to rephrase).
-
-2. **Dialog rails**: influence how the dialog evolves and how the LLM is prompted; dialog rails operate on canonical form messages (more details [here](colang-language-syntax-guide.md)) and determine if an action should be executed, if the LLM should be invoked to generate the next step or a response, if a predefined response should be used instead, etc.
-
-3. **Retrieval rails**: applied to the retrieved chunks in the case of a RAG (Retrieval Augmented Generation) scenario; a retrieval rail can reject a chunk, preventing it from being used to prompt the LLM, or alter the relevant chunks (e.g., to mask potentially sensitive data).
-
-4. **Execution rails**: applied to input/output of the custom actions (a.k.a. tools) that need to be called.
-
-5. **Output rails**: applied to the output generated by the LLM; an output rail can reject the output, preventing it from being returned to the user or alter it (e.g., removing sensitive data).
-
-## The Guardrails Process
-
-The diagram below depicts the guardrails process in detail:
-
-```{image} ../_static/puml/master_rails_flow.png
+```{image} ../../_static/puml/master_rails_flow.png
 :alt: "Sequence diagram showing the complete guardrails process in NeMo Guardrails: 1) Input Validation stage where user messages are processed by input rails that can use actions and LLM to validate or alter input, 2) Dialog stage where messages are processed by dialog rails that can interact with a knowledge base, use retrieval rails to filter retrieved information, and use execution rails to perform custom actions, 3) Output Validation stage where bot responses are processed by output rails that can use actions and LLM to validate or alter output. The diagram shows all optional components and their interactions, including knowledge base queries, custom actions, and LLM calls at various stages."
 :width: 720px
 :align: center
@@ -45,7 +20,7 @@ The guardrails process has multiple stages that a user message goes through:
 
 The diagram below depicts the dialog rails flow in detail:
 
-```{image} ../_static/puml/dialog_rails_flow.png
+```{image} ../../_static/puml/dialog_rails_flow.png
 :alt: "Sequence diagram showing the detailed dialog rails flow in NeMo Guardrails: 1) User Intent Generation stage where the system first searches for similar canonical form examples in a vector database, then either uses the closest match if embeddings_only is enabled, or asks the LLM to generate the user's intent. 2) Next Step Prediction stage where the system either uses a matching flow if one exists, or searches for similar flow examples and asks the LLM to generate the next step. 3) Bot Message Generation stage where the system either uses a predefined message if one exists, or searches for similar bot message examples and asks the LLM to generate an appropriate response. The diagram shows all the interactions between the application code, LLM Rails system, vector database, and LLM, with clear branching paths based on configuration options and available predefined content."
 :width: 500px
 :align: center
@@ -63,7 +38,7 @@ The dialog rails flow has multiple stages that a user message goes through:
 
 When `single_llm_call.enabled` is set to `True`, the dialog rails flow is simplified to a single LLM call that predicts all the steps at once. The diagram below depicts the simplified dialog rails flow:
 
-```{image} ../_static/puml/single_llm_call_flow.png
+```{image} ../../_static/puml/single_llm_call_flow.png
 :alt: "Sequence diagram showing the simplified dialog rails flow in NeMo Guardrails when single LLM call is enabled: 1) The system first searches for similar examples in the vector database for canonical forms, flows, and bot messages. 2) A single LLM call is made using the generate_intent_steps_message task prompt to predict the user's canonical form, next step, and bot message all at once. 3) The system then either uses the next step from a matching flow if one exists, or uses the LLM-generated next step. 4) Finally, the system either uses a predefined bot message if available, uses the LLM-generated message if the next step came from the LLM, or makes one additional LLM call to generate the bot message. This simplified flow reduces the number of LLM calls needed to process a user message."
 :width: 600px
 :align: center

diff --git a/docs/about/how-it-works/how-rails-work.md b/docs/about/how-it-works/how-rails-work.md
@@ -0,0 +1,22 @@
+---
+title: How Guardrails Work
+description: Learn how the NeMo Guardrails toolkit applies guardrails at multiple stages of the LLM interaction.
+---
+
+# How Guardrails Work
+
+The NeMo Guardrails toolkit applies guardrails at multiple stages of the LLM interaction.
+
+| Stage | Rail Type | Common Use Cases |
+|-------|-----------|------------------|
+| **Before LLM** | Input rails | Content safety, jailbreak detection, topic control, PII masking |
+| **After LLM** | Output rails | Response filtering, fact checking, sensitive data removal |
+| **RAG pipeline** | Retrieval rails | Document filtering, chunk validation |
+| **Tool calls** | Execution rails | Action input/output validation |
+| **Conversation** | Dialog rails | Flow control, guided conversations |
+
+```{image} ../../_static/images/programmable_guardrails_flow.png
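+:alt: "High-level flow through the five main categories of guardrails in NeMo Guardrails: input rails for validating user input, dialog rails for controlling conversation flow, output rails for validating bot responses, retrieval rails for handling retrieved information, and execution rails for managing custom actions."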
+:width: 800px
+:align: center
+```