Draft
39 commits
87c34e4
Initial plan
Copilot Nov 23, 2025
504e5bf
Initial plan for native OpenWebUI citations support
Copilot Nov 23, 2025
d817d1e
Add native OpenWebUI citations support to Azure AI Foundry pipeline
Copilot Nov 23, 2025
bdef4af
Add documentation for native OpenWebUI citations feature
Copilot Nov 23, 2025
bac7435
Fix code review issues: env var parsing and type annotations
Copilot Nov 23, 2025
21973ca
Address final code review feedback: type annotations and efficiency i…
Copilot Nov 23, 2025
4747dea
Refactor: Add helper function for boolean env parsing and simplify ti…
Copilot Nov 23, 2025
a5b8bf3
Merge branch 'main' into copilot/support-openwebui-citations
owndev Nov 25, 2025
d078d6b
Fix citation format to match OpenWebUI specification with distances f…
Copilot Nov 26, 2025
c4d7d5c
Fix code review comments on citation format
Copilot Nov 26, 2025
6c30f6a
Fix citation structure: emit separate events per source with unique n…
Copilot Nov 26, 2025
43a0e64
Fix comments per code review feedback
Copilot Nov 26, 2025
7360c4c
Add comprehensive debug logging and fix distances field for citation …
Copilot Nov 26, 2025
76fec62
Optimize logging to avoid JSON serialization when INFO logging is dis…
Copilot Nov 26, 2025
6888955
Fix NoneType error when citation fields are null
Copilot Nov 26, 2025
f67f7ea
Remove get_bool_env, change defaults, remove doc prefix from titles, …
Copilot Nov 26, 2025
0e1a58c
Fix: use stream_processor_with_citations when AZURE_AI_OPENWEBUI_CITA…
Copilot Nov 26, 2025
7283f1c
Remove _convert_doc_refs_to_markdown_links function as requested
Copilot Nov 26, 2025
f9407a7
Add detailed debug logging for citation events to help troubleshoot […
Copilot Nov 26, 2025
24d3ef9
Fix: use [docX] title in metadata.source field for proper OpenWebUI d…
Copilot Nov 26, 2025
52b007c
Fix: Filter OpenWebUI citations to only show documents referenced in …
Copilot Nov 26, 2025
a95bf01
fix(azure_ai_foundry.py): Update citation title
owndev Nov 26, 2025
9358ddb
Add support for Azure AI Search relevance scores (original_search_sco…
Copilot Dec 1, 2025
a09daee
Fix score matching: use title as primary key, add multiple matching s…
Copilot Dec 1, 2025
e7c3050
refactor: Improve readability of score and content key checks in Pipe…
owndev Dec 1, 2025
f1d84e8
Add [docX] to markdown link conversion and enhanced score debugging
Copilot Dec 3, 2025
da20a8b
Refactor: Extract DOC_REF_PATTERN constant, optimize conversion counting
Copilot Dec 3, 2025
8acb84e
Remove AZURE_AI_LINK_CITATIONS valve and _convert_doc_refs_to_links m…
Copilot Dec 3, 2025
3ecdbc7
Improve code comments for score normalization logic
Copilot Dec 3, 2025
5b030d1
Fix score selection to use filter_reason field per Azure documentation
Copilot Dec 3, 2025
0c39ca5
Fix code review issues: remove unused variable, add explicit score ha…
Copilot Dec 3, 2025
3442a3f
Convert [docX] references to <source> tags for OpenWebUI citation lin…
Copilot Dec 3, 2025
1be8cc7
Refactor: Extract _format_source_tag helper, optimize streaming conve…
Copilot Dec 3, 2025
5f106fa
Refactor: Extract _build_citation_names_map, move replace_ref outside…
Copilot Dec 3, 2025
d4c792a
Use html.escape for proper XSS protection in source tag names
Copilot Dec 3, 2025
2a9d802
Convert [docX] to markdown links with document URLs instead of <sourc…
Copilot Dec 3, 2025
7e88c19
Remove AZURE_AI_ENHANCE_CITATIONS and AZURE_AI_OPENWEBUI_CITATIONS va…
Copilot Dec 3, 2025
99920c5
Fix all_chunks undefined error and revert version to 2.6.0
Copilot Dec 3, 2025
b4b882b
Update Azure AI documentation to reflect citation changes
Copilot Dec 3, 2025
3 changes: 3 additions & 0 deletions README.md
@@ -99,6 +99,7 @@ The functions include a built-in encryption mechanism for sensitive information:

- Enables interaction with **Azure OpenAI** and other **Azure AI** models.
- Supports Azure Search integration for enhanced document retrieval.
- **Native OpenWebUI Citations Support** 🎯: Rich citation cards, source previews, and inline citation correlations for Azure AI Search responses (Azure OpenAI only).
- Supports multiple Azure AI models selection via the `AZURE_AI_MODEL` environment variable (e.g. `gpt-4o;gpt-4o-mini`).
- Customizable pipeline display with configurable prefix via `AZURE_AI_PIPELINE_PREFIX`.
- Azure AI Search / RAG integration with enhanced collapsible citation display (Azure OpenAI only).
@@ -112,6 +113,8 @@ The functions include a built-in encryption mechanism for sensitive information:

🔗 [Learn More About Azure AI](https://azure.microsoft.com/en-us/solutions/ai)

📖 [Azure AI Citations Documentation](./docs/azure-ai-citations.md)

### **2. [N8N Pipeline](./pipelines/n8n/n8n.py)**

> [!TIP]
202 changes: 202 additions & 0 deletions docs/azure-ai-citations.md
@@ -0,0 +1,202 @@
# Azure AI Foundry Pipeline - Native OpenWebUI Citations

This document describes the native OpenWebUI citation support in the Azure AI Foundry Pipeline, which enables rich citation cards and source previews in the OpenWebUI frontend.

## Overview

The Azure AI Foundry Pipeline supports **native OpenWebUI citations** for Azure AI Search (RAG) responses. This feature is **automatically enabled** when you configure Azure AI Search data sources (`AZURE_AI_DATA_SOURCES`). The OpenWebUI frontend will display:

- **Citation cards** with source information and relevance scores
- **Source previews** with content snippets
- **Relevance percentage** displayed on citation cards (requires `AZURE_AI_INCLUDE_SEARCH_SCORES=true`)
- **Clickable `[docX]` references** that link directly to document URLs
- **Interactive citation UI** with expandable source details

## Features

### Automatic Citation Support

When Azure AI Search is configured, the pipeline automatically:

1. Emits citation events via `__event_emitter__` for the OpenWebUI frontend
2. Converts `[docX]` references in the response to clickable markdown links
3. Filters citations to only show documents actually referenced in the response
4. Extracts relevance scores from Azure Search when available

### Configuration Options

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `AZURE_AI_DATA_SOURCES` | `""` | JSON configuration for Azure AI Search (required for citations) |
| `AZURE_AI_INCLUDE_SEARCH_SCORES` | `true` | Enable relevance score extraction from Azure Search |

### How It Works

#### Streaming Responses

When Azure AI Search returns citations in a streaming response:

1. The pipeline detects citations in the SSE (Server-Sent Events) stream
2. `[docX]` references in each chunk are converted to markdown links with document URLs
3. After the stream ends, citation events are emitted via `__event_emitter__`
4. Citations are filtered to only include documents referenced in the response

#### Non-Streaming Responses

When Azure AI Search returns citations in a non-streaming response:

1. The pipeline extracts citations from the response context
2. `[docX]` references in the content are converted to markdown links
3. Individual citation events are emitted via `__event_emitter__` for each referenced source
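
The emission step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `emit_citation_events` and the `EventEmitter` alias are hypothetical names, and the only assumption taken from OpenWebUI is that `__event_emitter__` is an awaitable callable that accepts one event dict.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

# Assumed shape of OpenWebUI's __event_emitter__: an async callable taking one event dict.
EventEmitter = Callable[[Dict[str, Any]], Awaitable[None]]


async def emit_citation_events(
    events: List[Dict[str, Any]],
    emitter: Optional[EventEmitter],
) -> int:
    """Emit each citation event separately and return how many were sent.

    Hypothetical helper: one event per source, so every source gets its own card.
    """
    if emitter is None:
        # No frontend attached (for example, a plain API call): nothing to emit.
        return 0
    emitted = 0
    for event in events:
        await emitter(event)
        emitted += 1
    return emitted


if __name__ == "__main__":
    received: List[Dict[str, Any]] = []

    async def collecting_emitter(event: Dict[str, Any]) -> None:
        # Stand-in for __event_emitter__ that just records events.
        received.append(event)

    sample = [{"type": "citation", "data": {"source": {"name": "[doc1] Example"}}}]
    asyncio.run(emit_citation_events(sample, collecting_emitter))
    print(len(received))
```

Emitting in a loop rather than batching mirrors the "one event per source" behavior described above.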

## Citation Format

### OpenWebUI Citation Event Structure

Each citation is emitted as a separate event to ensure all sources appear in the UI. Citation events follow the official OpenWebUI specification (see [OpenWebUI Events Documentation](https://docs.openwebui.com/features/plugin/development/events#source-or-citation-and-code-execution)):

```python
{
    "type": "citation",
    "data": {
        "document": ["Document content..."],  # Content from this citation
        "metadata": [{"source": "https://..."}],  # Metadata with source URL
        "source": {
            "name": "[doc1] Document Title",  # Unique name with index
            "url": "https://..."  # Source URL if available
        },
        "distances": [0.95]  # Relevance score (displayed as percentage)
    }
}
```

Key points:
- Each source document gets its own citation event
- The `source.name` includes the doc index (`[doc1]`, `[doc2]`, etc.) to prevent grouping
- The `distances` array contains relevance scores from Azure AI Search, which OpenWebUI displays as a percentage on the citation cards

### Azure Citation Format (Input)

Azure AI Search returns citations in this format:

```python
{
    "title": "Document Title",
    "content": "Full or partial content",
    "url": "https://...",
    "filepath": "/path/to/file",
    "chunk_id": "chunk-123",
    "score": 0.95,
    "metadata": {}
}
```

The pipeline automatically converts Azure citations to OpenWebUI format.
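
As a rough sketch of that conversion, the function below maps one Azure citation dict to the event structure shown earlier. The name `build_citation_event` is hypothetical, and the handling of missing fields (empty strings, omitting `distances` when no score exists) is an assumption rather than confirmed pipeline behavior.

```python
from typing import Any, Dict


def build_citation_event(citation: Dict[str, Any], index: int) -> Dict[str, Any]:
    """Map one Azure citation to an OpenWebUI-style citation event (illustrative only)."""
    # `or` guards against fields that are present but null in the Azure payload.
    title = citation.get("title") or "Unknown Document"
    url = citation.get("url") or ""
    content = citation.get("content") or ""
    score = citation.get("score")

    # The [docN] prefix keeps source names unique so cards are not grouped.
    source: Dict[str, Any] = {"name": f"[doc{index}] {title}"}
    if url:
        source["url"] = url

    data: Dict[str, Any] = {
        "document": [content],
        "metadata": [{"source": url}],
        "source": source,
    }
    if score is not None:
        # Assumption: leave out `distances` entirely when Azure supplied no score.
        data["distances"] = [float(score)]
    return {"type": "citation", "data": data}
```

Calling `build_citation_event(azure_citation, 1)` on the input example above would produce an event named `[doc1] Document Title` with `distances` set to `[0.95]`.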

## Usage

### Basic Setup

Configure Azure AI Search to enable citation support:

```bash
# Azure AI Search configuration (required for citations)
AZURE_AI_DATA_SOURCES='[{"type":"azure_search","parameters":{"endpoint":"https://YOUR-SEARCH-SERVICE.search.windows.net","index_name":"YOUR-INDEX-NAME","authentication":{"type":"api_key","key":"YOUR-SEARCH-API-KEY"}}}]'

# Enable relevance scores (default: true)
AZURE_AI_INCLUDE_SEARCH_SCORES=true
```
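
Because `AZURE_AI_DATA_SOURCES` holds JSON inside a shell string, quoting mistakes are easy to make. The following sketch shows one way the value could be checked before use. `parse_data_sources` is a hypothetical helper, and the specific checks (a JSON array whose entries have `type` and `parameters`) are assumptions based on the example above, not a description of the pipeline's own validation.

```python
import json
from typing import Any, Dict, List


def parse_data_sources(raw: str) -> List[Dict[str, Any]]:
    """Parse the AZURE_AI_DATA_SOURCES value into a list of data source dicts."""
    if not raw.strip():
        # Empty value: Azure AI Search (and therefore citations) is not configured.
        return []
    parsed = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(parsed, list):
        raise ValueError("AZURE_AI_DATA_SOURCES must be a JSON array")
    for entry in parsed:
        if not isinstance(entry, dict) or "type" not in entry or "parameters" not in entry:
            raise ValueError("each data source needs 'type' and 'parameters' keys")
    return parsed
```

In practice the input would come from something like `os.environ.get("AZURE_AI_DATA_SOURCES", "")`.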

### Clickable Document Links

The pipeline automatically converts `[docX]` references to clickable markdown links:

```markdown
# Input from Azure AI
The answer can be found in [doc1] and [doc2].

# Output (converted by pipeline)
The answer can be found in [[doc1]](https://example.com/doc1.pdf) and [[doc2]](https://example.com/doc2.pdf).
```

This works for both streaming and non-streaming responses.
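
A minimal sketch of this conversion is shown below. The regular expression and the function signature are assumptions made for illustration (the pipeline's own `_convert_doc_refs_to_links()` may differ), and `urls` stands in for a map from document index to URL.

```python
import re
from typing import Dict

# Assumed pattern: matches [doc1], [doc2], ... and captures the number.
DOC_REF_PATTERN = re.compile(r"\[doc(\d+)\]")


def convert_doc_refs_to_links(content: str, urls: Dict[int, str]) -> str:
    """Replace [docN] with [[docN]](url) when a URL is known for N."""

    def replace_ref(match: re.Match) -> str:
        index = int(match.group(1))
        url = urls.get(index)
        if not url:
            # No URL for this document: leave the reference untouched.
            return match.group(0)
        return f"[[doc{index}]]({url})"

    return DOC_REF_PATTERN.sub(replace_ref, content)
```

This sketch is not idempotent: running it twice on the same text would wrap already converted references again, so it should be applied once per piece of content.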

### Relevance Scores

When `AZURE_AI_INCLUDE_SEARCH_SCORES=true` (default), the pipeline:

1. Automatically adds `include_contexts: ["citations", "all_retrieved_documents"]` to Azure Search requests
2. Extracts scores based on the `filter_reason` field:
   - `filter_reason="rerank"` → uses `rerank_score`
   - `filter_reason="score"` or not present → uses `original_search_score`
3. Displays the score as a percentage on citation cards
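
The first two steps might look like the sketch below. Both function names are hypothetical. The field names (`include_contexts`, `filter_reason`, `rerank_score`, `original_search_score`) come from the description above; returning `None` when the selected score is missing is an assumption.

```python
import copy
from typing import Any, Dict, List, Optional


def with_score_contexts(data_sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the data sources that requests citations plus all retrieved documents."""
    updated = copy.deepcopy(data_sources)  # do not mutate the caller's configuration
    for source in updated:
        parameters = source.setdefault("parameters", {})
        parameters["include_contexts"] = ["citations", "all_retrieved_documents"]
    return updated


def select_score(document: Dict[str, Any]) -> Optional[float]:
    """Pick the score field indicated by `filter_reason` for one retrieved document."""
    if document.get("filter_reason") == "rerank":
        key = "rerank_score"
    else:
        # filter_reason == "score" or absent
        key = "original_search_score"
    value = document.get(key)
    return float(value) if value is not None else None
```

Keeping the selection in one small function makes it easy to log the chosen key when debugging unexpected percentages.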

## Implementation Details

### Helper Functions

The pipeline includes these helper functions for citation processing:

1. **`_extract_citations_from_response()`**: Extracts citations from Azure responses
2. **`_normalize_citation_for_openwebui()`**: Converts Azure citations to OpenWebUI format
3. **`_emit_openwebui_citation_events()`**: Emits citation events via `__event_emitter__`
4. **`_merge_score_data()`**: Matches citations with score data from `all_retrieved_documents`
5. **`_build_citation_urls_map()`**: Builds mapping of citation indices to URLs
6. **`_format_citation_link()`**: Creates markdown links for `[docX]` references
7. **`_convert_doc_refs_to_links()`**: Converts all `[docX]` references in content to markdown links

### Title Fallback Logic

The pipeline picks a display title for each citation in this order:

1. Use the `title` field if available
2. Fall back to the filename extracted from `filepath` or `url`
3. Fall back to `"Unknown Document"` if all are empty

This ensures every citation has a non-empty display name.
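
The fallback order can be sketched like this. `resolve_title` is a hypothetical name, and details such as trying `filepath` before `url`, URL-decoding the filename, and ignoring query strings are assumptions for illustration.

```python
import posixpath
from typing import Any, Dict
from urllib.parse import unquote, urlparse


def resolve_title(citation: Dict[str, Any]) -> str:
    """Return a display title using title -> filepath -> url -> placeholder."""
    title = (citation.get("title") or "").strip()
    if title:
        return title

    filepath = (citation.get("filepath") or "").strip()
    if filepath:
        # Normalize Windows separators so basename works on either style.
        name = posixpath.basename(filepath.replace("\\", "/").rstrip("/"))
        if name:
            return name

    url = (citation.get("url") or "").strip()
    if url:
        # Use only the path component so query strings (e.g. SAS tokens) are dropped.
        name = posixpath.basename(unquote(urlparse(url).path).rstrip("/"))
        if name:
            return name

    return "Unknown Document"
```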

### Citation Filtering

Citations are filtered to only show documents that are actually referenced in the response content. For example, if Azure returns 5 citations but the response only references `[doc1]` and `[doc3]`, only those 2 citations will appear in the UI.
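
A sketch of the filtering step follows. It assumes that `[docN]` is a 1-based index into the citation list returned by Azure; `filter_referenced_citations` is a hypothetical name.

```python
import re
from typing import Any, Dict, List, Tuple


def filter_referenced_citations(
    content: str,
    citations: List[Dict[str, Any]],
) -> List[Tuple[int, Dict[str, Any]]]:
    """Keep only citations whose [docN] marker appears in the response text.

    Returns (index, citation) pairs so the original doc number is preserved.
    """
    referenced = {int(number) for number in re.findall(r"\[doc(\d+)\]", content)}
    return [
        (index, citation)
        for index, citation in enumerate(citations, start=1)
        if index in referenced
    ]
```

Returning the original index alongside each citation keeps card names such as `[doc3]` aligned with the references in the text after unreferenced entries are dropped.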

## Troubleshooting

### Citations Not Appearing

**Problem**: Citations don't appear in the OpenWebUI frontend

**Solutions**:
1. Check that Azure AI Search is properly configured (`AZURE_AI_DATA_SOURCES`)
2. Ensure you're using an Azure OpenAI endpoint (not a generic Azure AI endpoint)
3. Verify the response contains `[docX]` references
4. Check browser console and server logs for errors

### Relevance Scores Showing 0%

**Problem**: All citation cards show 0% relevance

**Solutions**:
1. Verify `AZURE_AI_INCLUDE_SEARCH_SCORES=true` is set
2. Check that your Azure Search index supports scoring
3. Enable DEBUG logging to see the raw score values from Azure

### Links Not Working

**Problem**: `[docX]` references are not clickable

**Solutions**:
1. Ensure citations have valid `url` or `filepath` fields
2. Check that the document URL is accessible
3. Verify the markdown link format is being generated correctly

## References

- [OpenWebUI Pipelines Citation Feature Discussion](https://github.com/open-webui/pipelines/issues/229)
- [OpenWebUI Event Emitter Documentation](https://docs.openwebui.com/features/plugin/development/events)
- [Azure AI Search Documentation](https://learn.microsoft.com/en-us/azure/search/)
- [Azure On Your Data API Reference](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/references/on-your-data)

## Version History

- **v2.6.0**: Major refactor - removed `AZURE_AI_ENHANCE_CITATIONS` and `AZURE_AI_OPENWEBUI_CITATIONS` valves; citation support is now always enabled when `AZURE_AI_DATA_SOURCES` is configured; added clickable `[docX]` markdown links; improved score extraction using `filter_reason` field
- **v2.5.x**: Dual citation modes (OpenWebUI events + markdown/HTML)
84 changes: 26 additions & 58 deletions docs/azure-ai-integration.md
@@ -60,8 +60,9 @@ AZURE_AI_ENDPOINT="https://<deployment>.openai.azure.com/openai/deployments/<mod
# Complete JSON configuration for Azure Search - copy exactly and replace placeholder values
AZURE_AI_DATA_SOURCES='[{"type":"azure_search","parameters":{"endpoint":"https://<your-search-service>.search.windows.net","index_name":"<your-index-name>","authentication":{"type":"api_key","key":"<your-search-api-key>"}}}]'

# Enable enhanced citation display for better readability (default: true)
AZURE_AI_ENHANCE_CITATIONS=true
# Enable relevance score extraction from Azure Search (default: true)
# When enabled, automatically adds include_contexts to get original_search_score and rerank_score
AZURE_AI_INCLUDE_SEARCH_SCORES=true
```

### Azure AI Search / RAG Integration
@@ -155,73 +156,40 @@ For advanced use cases, you can include additional parameters:
- **Missing API key**: Ensure your Azure Search API key has proper permissions
- **Index not found**: Verify your index name matches exactly (case-sensitive)

#### Enhanced Citation Display
#### Native OpenWebUI Citation Support

The pipeline automatically enhances Azure AI Search responses to make citations and source documents more accessible and readable. When Azure AI Search is configured, the pipeline transforms the raw citation data into a user-friendly format.
The pipeline automatically provides native OpenWebUI citation support for Azure AI Search responses. When Azure AI Search is configured, the pipeline:

**Original Azure AI Response:**
1. **Emits citation events** via `__event_emitter__` for the OpenWebUI frontend to display interactive citation cards
2. **Converts `[docX]` references** to clickable markdown links that link directly to document URLs
3. **Extracts relevance scores** when `AZURE_AI_INCLUDE_SEARCH_SCORES=true`
4. **Filters citations** to only show documents actually referenced in the response

```json
{
"choices": [
{
"message": {
"content": "**Docker container actions** are a type of GitHub Actions [doc1]...",
"context": {
"citations": [
{
"content": "environment variable. The token can be used to authenticate...",
"title": "README.md",
"chunk_id": "0"
}
]
}
}
}
]
}
```

**Enhanced Response with Collapsible Citations:**
**Example: Clickable Document Links**

```html
```markdown
# Original Azure AI response
**Docker container actions** are a type of GitHub Actions [doc1]...

<details>
<summary>📚 Sources and References</summary>

<details>
<summary>[doc1] - README.md</summary>

📁 **File:** `README.md`
📄 **Chunk ID:** 0
**Content:**
> environment variable. The token can be used to authenticate the workflow when accessing GitHub resources...

</details>

<details>
<summary>[doc2] - Documentation.md</summary>
# Enhanced response (with clickable links)
**Docker container actions** are a type of GitHub Actions [[doc1]](https://example.com/README.md)...
```

📁 **File:** `Documentation.md`
📄 **Chunk ID:** 1
**Content:**
> Docker container actions contain all their dependencies in the container and are therefore very consistent...
**Citation Card Features:**

</details>
- **Source information** with `[docX]` prefix for easy identification
- **Relevance percentage** displayed on citation cards (requires `AZURE_AI_INCLUDE_SEARCH_SCORES=true`)
- **Document preview** with content snippets
- **Clickable links** to source documents when URLs are available
- **Streaming support** with links converted inline as content streams

</details>
```
**Relevance Score Selection:**

**Enhanced Citation Features:**
The pipeline uses the `filter_reason` field from Azure Search to select the appropriate score:
- `filter_reason="rerank"` → uses `rerank_score`
- `filter_reason="score"` or not present → uses `original_search_score`

- **Collapsible interface** with expandable sections for clean presentation
- **Two-level organization** - main sources section and individual document details
- **Complete content display** - full document content, not just previews
- **Document references** with clear [doc1], [doc2] labels for easy cross-referencing
- **Source metadata** including file paths, URLs, and chunk IDs for precise tracking
- **Streaming support** with citations properly formatted for both streaming and non-streaming responses
- **Space efficient** - collapsed by default to avoid overwhelming the main response
For more details, see the [Azure AI Citations Documentation](azure-ai-citations.md).

> [!TIP]
> To use **Azure OpenAI** and other **Azure AI** models **simultaneously**, you can use the following URL: `https://<your project>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview`