
Conversation


@Anxhela21 Anxhela21 commented Dec 3, 2025

Description

This PR introduces several changes to the Lightspeed Stack query pipeline, focusing on Solr vector I/O support, streaming query support, and Solr-specific filtering within query requests.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • RAG integration: responses now include top contextual chunks and referenced documents from the vector store (streaming and non-streaming flows).
    • Pre-query vector retrieval injects relevant context into user queries for more informed answers.
  • Chores

    • Switched vector backend to Solr-based remote provider with local persistence option.
    • Toolgroups updated to include RAG runtime.
  • Bug Fixes

    • Broadened referenced-document URL handling to accept plain strings as well as validated URLs.


Anxhela21 and others added 15 commits November 19, 2025 15:32

openshift-ci bot commented Dec 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


coderabbitai bot commented Dec 3, 2025

Walkthrough

Adds Solr-based RAG vector-store integration and persistence, injects top RAG chunks into query/streaming flows before LLM calls, relaxes ReferencedDocument.doc_url typing to accept plain strings, and exposes RAGChunk/ReferencedDocument through streaming payloads and public APIs. run.yaml now references external providers via env and configures solr-vector and sqlite persistence.

Changes

Cohort / File(s) — Summary

  • Configuration — run.yaml
    external_providers_dir changed to ${env.EXTERNAL_PROVIDERS_DIR}; replaced the inline FAISS vector_io provider with the solr-vector remote provider (solr_url, collection_name, vector_field, content_field, embedding_dimension, inference_provider_id) and added sqlite persistence (db_path, namespace); added a tool_groups entry for builtin::rag; added empty shields; vector_dbs stays empty.
  • Query endpoint — src/app/endpoints/query.py
    Introduces a module-level OFFLINE flag; pre-fetches RAG chunks from the vector DB (Solr) and converts them to RAGChunk/ReferencedDocument (source uses parent_id when OFFLINE, else reference_url); injects the top 5 chunks into user content before the LLM call (the general pattern is sketched after this list); attaches rag_chunks to the response/TurnSummary; adds expanded logging and error handling around vector DB queries; minor public surface/type adjustments (Document import, doc_id optional).
  • Streaming endpoint — src/app/endpoints/streaming_query.py
    Adds query_vector_io_for_chunks to fetch RAG chunks and referenced docs; adds OFFLINE handling; updates the stream_end_event signature to accept vector_io_referenced_docs; injects rag_context into streaming LLM requests; propagates rag_chunks and referenced documents into end-of-stream payloads and caching; exports RAGChunk and ReferencedDocument.
  • Request models — src/models/requests.py
    Adds an optional solr: Optional[dict[str, Any]] field to QueryRequest (with description and example).
  • Response models — src/models/responses.py
    Broadens the ReferencedDocument.doc_url type from AnyUrl to also accept plain strings (AnyUrl | str | None).
  • Tests — tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_conversations_v2.py
    Updated expectations to account for doc_url now being a string or stringified URL (malformed URLs preserved as raw strings; removed a trailing-slash expectation in one test).
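
The query and streaming rows above describe the same pre-query retrieval pattern: fetch the top chunks from the vector store, format them into a context block, and prepend that block to the user message before calling the LLM. The self-contained sketch below illustrates the pattern only; the RAGChunk stand-in and the build_rag_context/inject_context helpers are hypothetical names, not the PR's actual implementation.

from dataclasses import dataclass


@dataclass
class RAGChunk:
    """Illustrative stand-in for the RAGChunk response model, not the real class."""

    content: str
    source: str | None = None
    score: float | None = None


def build_rag_context(chunks: list[RAGChunk], top_n: int = 5) -> str:
    """Format the top-N retrieved chunks into a context block for the prompt."""
    lines = [
        f"[{i + 1}] ({chunk.source or 'unknown'}) {chunk.content}"
        for i, chunk in enumerate(chunks[:top_n])
    ]
    return "Relevant context:\n" + "\n".join(lines)


def inject_context(user_query: str, chunks: list[RAGChunk]) -> str:
    """Prepend retrieved context to the user query; pass through unchanged if nothing was retrieved."""
    if not chunks:
        return user_query
    return f"{build_rag_context(chunks)}\n\nQuestion: {user_query}"


if __name__ == "__main__":
    retrieved = [
        RAGChunk(content="Chunk text returned by the vector store", source="doc-123", score=0.91)
    ]
    print(inject_context("What changed in 4.16?", retrieved))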

Sequence Diagram

sequenceDiagram
    autonumber
    actor Client
    participant QueryHandler as Query Endpoint
    participant VectorDB as Vector DB (Solr)
    participant LLM as LLM
    participant Transcript as Transcript/Storage

    Client->>QueryHandler: POST query request
    QueryHandler->>VectorDB: query top-N vectors (filters via solr)
    VectorDB-->>QueryHandler: rag_chunks + chunk metadata
    QueryHandler->>QueryHandler: build rag_context (top 5)
    QueryHandler->>LLM: send user message + rag_context
    LLM-->>QueryHandler: response + token usage
    QueryHandler->>Transcript: store conversation + rag_chunks/referenced_documents
    QueryHandler-->>Client: response payload (includes rag_chunks & referenced_documents)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
  • are-ces
  • asamal4

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title comprehensively describes the main changes (Solr Vector I/O Provider support, streaming query support, and Solr keyword filtering), which align with the core modifications across run.yaml, the query endpoints, and the request models.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 96.41%, which meets the required threshold of 80.00%.


Signed-off-by: Anxhela Coba <acoba@redhat.com>
@Anxhela21 Anxhela21 marked this pull request as ready for review December 8, 2025 16:59

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/utils/checks.py (1)

60-78: Update docstring to reflect new parent directory validation.

The docstring doesn't describe the new behavior introduced in lines 82-91. When must_exists=False and must_be_writable=True, the function now validates parent directory existence and writability.

Apply this diff to update the docstring:

 def directory_check(
     path: FilePath, must_exists: bool, must_be_writable: bool, desc: str
 ) -> None:
     """
-    Ensure the given path is an existing directory.
+    Ensure the given path is a valid directory.
 
-    If the path is not a directory, raises InvalidConfigurationError.
+    If must_exists is True, verifies the directory exists. If must_be_writable
+    is True and the directory doesn't exist, validates that its parent directory
+    exists and is writable.
 
     Parameters:
         path (FilePath): Filesystem path to validate.
-        must_exists (bool): Should the directory exists?
-        must_be_writable (bool): Should the check test if directory is writable?
+        must_exists (bool): If True, the directory must exist.
+        must_be_writable (bool): If True, the directory (or its parent if the
+            directory doesn't exist) must be writable.
         desc (str): Short description of the value being checked; used in error
         messages.
 
     Raises:
         InvalidConfigurationError: If `path` does not point to a directory or
-        is not writable when required.
+        is not writable when required, or if parent directory validation fails.
     """
src/app/endpoints/query.py (1)

746-769: Avoid relying on vector_db_ids being undefined when no_tools=True

In retrieve_response, vector_db_ids is only populated in the else branch (when no_tools is False), but it’s later read inside the RAG block:

try:
    if vector_db_ids:
        ...

When no_tools is True, this condition currently triggers an UnboundLocalError that is caught by the blanket except, producing misleading “Failed to query vector database for chunks” warnings even though RAG is intentionally disabled.

Define vector_db_ids up front and use it consistently so the RAG path is skipped cleanly when no_tools is set:

@@
-    agent, conversation_id, session_id = await get_agent(
+    agent, conversation_id, session_id = await get_agent(
         client,
         model_id,
         system_prompt,
         available_input_shields,
         available_output_shields,
         query_request.conversation_id,
         query_request.no_tools or False,
     )
@@
-    logger.debug("Conversation ID: %s, session ID: %s", conversation_id, session_id)
-    # bypass tools and MCP servers if no_tools is True
-    if query_request.no_tools:
+    logger.debug("Conversation ID: %s, session ID: %s", conversation_id, session_id)
+
+    # Track available vector DBs (used for toolgroups and RAG); default to empty
+    vector_db_ids: list[str] = []
+
+    # bypass tools and MCP servers if no_tools is True
+    if query_request.no_tools:
         mcp_headers = {}
         agent.extra_headers = {}
         toolgroups = None
     else:
@@
-        # Include RAG toolgroups when vector DBs are available
-        vector_dbs = await client.vector_dbs.list()
-        vector_db_ids = [vdb.identifier for vdb in vector_dbs]
+        # Include RAG toolgroups when vector DBs are available
+        vector_dbs = await client.vector_dbs.list()
+        vector_db_ids = [vdb.identifier for vdb in vector_dbs]

With this change, the later if vector_db_ids: guard in the RAG block works as intended without spurious exceptions/logs when no_tools=True.

🧹 Nitpick comments (11)
tests/unit/cache/test_sqlite_cache.py (1)

394-415: Stronger field-level checks look good; final assertions are slightly redundant

The new field-by-field assertions and the str(...) comparison for doc_url correctly exercise the serialization/deserialization path and the widened doc_url type. Functionally this test now does a much better job of pinning down the expected shape of CacheEntry.

You could trim a bit of redundancy by either:

  • Dropping the later assert retrieved_entries[0].referenced_documents is not None / title check, since they’re implied by the new assertions, or
  • Reusing retrieved_entry everywhere for consistency.

This is purely a readability/maintenance tweak; behavior is fine as-is.

src/models/requests.py (1)

3-168: Clarify solr field shape and avoid set literal in examples

The new solr: Optional[dict[str, Any]] field is a reasonable flexible hook for Solr filters and matches how it’s consumed in vector_io.query. Two small improvements:

  • The example currently uses a Python set:
examples=[
    {"fq": {"product:*openshift*", "product_version:*4.16*"}},
]

which is unusual in a JSON-ish example. A list is clearer and serializes more predictably:

-        examples=[
-            {"fq": {"product:*openshift*", "product_version:*4.16*"}},
-        ],
+        examples=[
+            {"fq": ["product:*openshift*", "product_version:*4.16*"]},
+        ],
  • Consider briefly documenting in the docstring or description that solr is an opaque bag of Solr parameters (e.g., fq, q.op, etc.) so clients know they’re passing them through verbatim.
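
For illustration, a minimal sketch of such a pass-through field on a Pydantic model, using a list-valued fq in the example as suggested above; QueryRequestSketch and the description text are hypothetical, not the PR's actual model.

from typing import Any, Optional

from pydantic import BaseModel, Field


class QueryRequestSketch(BaseModel):
    """Illustrative subset of a query request carrying opaque Solr parameters."""

    query: str
    solr: Optional[dict[str, Any]] = Field(
        default=None,
        description=(
            "Opaque bag of Solr query parameters (e.g. fq, q.op) passed through "
            "verbatim to the Solr vector_io provider."
        ),
        examples=[{"fq": ["product:*openshift*", "product_version:*4.16*"]}],
    )
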
src/models/context.py (1)

8-51: New RAG fields on ResponseGeneratorContext look good; update docstring

vector_io_rag_chunks: list[RAGChunk] | None and vector_io_referenced_docs: list[ReferencedDocument] | None are well-typed and align with the response models.

To keep documentation in sync, consider extending the class docstring’s Attributes section to include brief descriptions of these two new fields.

src/app/endpoints/query.py (1)

771-897: Refine RAG integration: config for OFFLINE source URL and reduce verbose payload logging

The new vector-DB/RAG flow is a good step, but two aspects are worth tightening up:

  1. Hard‑coded internal base URL and OFFLINE flag

    • OFFLINE = True and the inline "https://mimir.corp.redhat.com" base URL (both for doc_url and source) effectively bake an environment-specific detail into the endpoint.
    • This makes it hard to run in other environments and leaks internal domain knowledge in code.

    Consider:

-# TODO: move this setting to a higher level configuration
-OFFLINE = True
+OFFLINE = True # TODO: drive from configuration instead of a module constant


    and pushing the base URL into configuration (or deriving it from chunk metadata) so you don’t need "https://mimir.corp.redhat.com" literals here.

  2. Logging the entire vector_io query_response at info level

    • logger.info(f"The query response total payload: {query_response}") will dump all retrieved chunks and metadata on every query.
    • This can be very noisy and may expose large or sensitive content in logs.

    Recommend downgrading and summarizing:

-            logger.info(f"The query response total payload: {query_response}")
+            logger.debug("vector_io.query response: %s", query_response)
+            logger.info(
+                "vector_io.query returned %d chunks for vector_db_id=%s",
+                len(getattr(query_response, "chunks", []) or []),
+                vector_db_id,
+            )

This keeps observability while avoiding heavy, content-rich logs at info level.

run.yaml (1)

117-130: Consider parameterizing the Solr URL for production deployments.

The hardcoded localhost:8983 URL will not work in containerized or distributed environments. Consider using an environment variable reference similar to line 19.

   vector_io:
     - provider_id: solr-vector
       provider_type: remote::solr_vector_io
       config:
-        solr_url: "http://localhost:8983/solr"
+        solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
         collection_name: "portal-rag"
src/app/endpoints/streaming_query.py (6)

81-84: Hardcoded OFFLINE flag should be moved to configuration.

The TODO acknowledges this needs to be addressed. This module-level constant affects source URL resolution behavior and should be configurable, especially for toggling between environments.

Would you like me to help implement this as part of the configuration system, or open an issue to track this task?


165-171: Update docstring to document the new vector_io_referenced_docs parameter.

The function signature now includes vector_io_referenced_docs, but the docstring's Parameters section doesn't document it.

     Parameters:
         metadata_map (dict): A mapping containing metadata about
         referenced documents.
         summary (TurnSummary): Summary of the conversation turn.
         token_usage (TokenCounter): Token usage information.
         media_type (str): The media type for the response format.
+        vector_io_referenced_docs (list[ReferencedDocument] | None): Referenced
+        documents from vector_io query.
 
     Returns:

487-515: Reconsider emitting "No Violation" token for successful shield validations.

Yielding "No Violation" as a LLM_TOKEN_EVENT for every successful shield validation may create noise in the client stream. Clients typically only need to know about violations, not successful passes. Consider yielding nothing or using a different event type (e.g., debug/info level).

         else:
-            # Yield "No Violation" message for successful shield validations
-            yield stream_event(
-                data={
-                    "id": chunk_id,
-                    "token": "No Violation",
-                },
-                event_type=LLM_TOKEN_EVENT,
-                media_type=media_type,
-            )
+            # Successful shield validations are silently passed
+            pass

1230-1231: Use more Pythonic empty list check.

Comparing with == [] is less idiomatic than using truthiness.

         # Convert empty list to None for consistency with existing behavior
-        if toolgroups == []:
+        if not toolgroups:
             toolgroups = None

1296-1298: Missing pylint: disable=broad-except comment for consistency.

Line 1086 has the pylint disable comment for the broad except, but line 1296 does not, which may trigger linting warnings.

-    except Exception as e:
+    except Exception as e:  # pylint: disable=broad-except
         logger.warning("Failed to query vector database for chunks: %s", e)
         logger.debug("Vector DB query error details: %s", traceback.format_exc())

1304-1307: RAG context formatting could benefit from a helper function.

The RAG context formatting logic could be extracted for reusability and testability.
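
A rough sketch of what that extraction might look like (the name format_rag_context and the attribute-access pattern are assumptions, not the PR's code), together with the kind of pytest test it enables:

def format_rag_context(chunks: list, top_n: int = 5) -> str:
    """Render retrieved chunks as a newline-separated context block (hypothetical helper)."""
    parts = []
    for chunk in chunks[:top_n]:
        source = getattr(chunk, "source", None) or "unknown"
        parts.append(f"- ({source}) {getattr(chunk, 'content', '')}")
    return "\n".join(parts)


def test_format_rag_context_limits_to_top_n():
    """The helper should only include the first top_n chunks."""

    class FakeChunk:  # minimal stand-in for a vector_io chunk
        def __init__(self, content: str, source: str | None = None):
            self.content = content
            self.source = source

    chunks = [FakeChunk(f"chunk {i}", source=f"doc-{i}") for i in range(10)]
    rendered = format_rag_context(chunks, top_n=3)
    assert rendered.splitlines() == [
        "- (doc-0) chunk 0",
        "- (doc-1) chunk 1",
        "- (doc-2) chunk 2",
    ]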

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0550f02 and eaf66dd.

📒 Files selected for processing (11)
  • run.yaml (3 hunks)
  • src/app/endpoints/query.py (6 hunks)
  • src/app/endpoints/streaming_query.py (14 hunks)
  • src/models/context.py (2 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (1 hunks)
  • src/utils/checks.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (4 hunks)
  • tests/unit/cache/test_postgres_cache.py (1 hunks)
  • tests/unit/cache/test_sqlite_cache.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/cache/test_postgres_cache.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/cache/test_sqlite_cache.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/cache/test_postgres_cache.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/cache/test_sqlite_cache.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/models/context.py
  • src/utils/checks.py
  • src/models/responses.py
  • src/models/requests.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Pydantic configuration classes must extend ConfigurationBase; data models must extend BaseModel

Files:

  • src/models/context.py
  • src/models/responses.py
  • src/models/requests.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
🧠 Learnings (3)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models

Applied to files:

  • src/models/requests.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (3)
src/models/context.py (1)
src/models/responses.py (2)
  • RAGChunk (25-30)
  • ReferencedDocument (326-336)
src/app/endpoints/query.py (1)
src/models/responses.py (2)
  • RAGChunk (25-30)
  • ReferencedDocument (326-336)
src/app/endpoints/streaming_query.py (3)
src/models/responses.py (2)
  • RAGChunk (25-30)
  • ReferencedDocument (326-336)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (950-977)
🪛 GitHub Actions: E2E Tests
src/utils/checks.py

[error] 87-87: InvalidConfigurationError: Check directory to store feedback '/tmp/data/feedback' cannot be created - parent directory does not exist

🔇 Additional comments (10)
src/utils/checks.py (1)

82-91: Unable to verify review comment claims due to repository access constraints.

The review comment makes specific claims about a pipeline failure and broken E2E tests that require access to the full codebase, test files, and pipeline logs to verify. Without repository access, I cannot:

  • Confirm the reported pipeline failure with /tmp/data/feedback
  • Verify all usages of directory_check and assess impact
  • Review test code to understand breaking changes
  • Determine if the parent directory validation is the root cause

Manual verification is required by a developer with repository access.

src/models/responses.py (1)

326-336: doc_url union type change looks correct

Allowing doc_url: AnyUrl | str | None aligns with the new tests and preserves backwards compatibility for existing AnyUrl usage; no issues seen here.

tests/unit/app/endpoints/test_conversations_v2.py (1)

93-116: Updated expectation for doc_url string is appropriate

Asserting str(ref_docs[0]["doc_url"]) == "http://example.com" makes the test compatible with doc_url being either an AnyUrl or a plain string, matching the updated model typing.

tests/unit/cache/test_postgres_cache.py (1)

516-537: More granular CacheEntry assertions improve robustness

The revised test that checks each CacheEntry field and stringifies doc_url before comparison is clearer and more resilient to representation differences than relying on whole-object equality.

tests/unit/app/endpoints/test_query.py (1)

1007-1102: Tests correctly capture new doc_url semantics (including malformed URLs)

The updated expectations:

  • Using str(doc.doc_url) / str(docs[i].doc_url) for valid URLs, and
  • Asserting that malformed URLs yield a non-None ReferencedDocument with doc.doc_url == "not a valid url"

are consistent with ReferencedDocument.doc_url now allowing raw strings as well as AnyUrl. This gives better resilience to imperfect metadata without dropping referenced documents.

src/app/endpoints/query.py (1)

289-423: RAG chunk propagation into transcripts and QueryResponse is wired correctly

Unpacking summary.rag_chunks, converting them once to rag_chunks_dict for store_transcript, and exposing rag_chunks=summary.rag_chunks or [] via QueryResponse keeps the same source of truth for RAG data across storage and API response. The additional referenced_documents handling remains consistent with that contract.

run.yaml (2)

19-19: LGTM!

Using environment variable reference for external_providers_dir improves configurability across different deployment environments.


144-160: LGTM!

The vector_dbs and embedding model configurations are consistent with the new Solr provider setup. The 384 embedding dimension correctly matches the granite-embedding-30m-english model specification.

src/app/endpoints/streaming_query.py (2)

78-78: LGTM!

The logger correctly uses the shared logger name "app.endpoints.handlers" as per the project convention for unified logging across endpoint handlers. Based on learnings.


1218-1227: Potential TypeError when concatenating toolgroups.

If get_rag_toolgroups returns None (when vector_db_ids is empty) and mcp_toolgroups is a list, line 1225 could attempt None + list and raise a TypeError. The condition on line 1224 guards this case, so the logic is correct, but it could be clearer: the flow works only because line 1224 checks if vector_db_ids before concatenation.
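
A small self-contained sketch of how the intent could be made explicit; the helper name, return types, and the "mcp::docs" toolgroup ID are assumptions drawn from the review context, not the actual code.

def build_toolgroups(rag_toolgroups: list | None, mcp_toolgroups: list) -> list | None:
    """Combine RAG and MCP toolgroups, returning None when nothing is configured."""
    combined = (rag_toolgroups or []) + mcp_toolgroups
    return combined or None


# Edge cases discussed above:
assert build_toolgroups(None, []) is None
assert build_toolgroups(None, ["mcp::docs"]) == ["mcp::docs"]
assert build_toolgroups(["builtin::rag"], ["mcp::docs"]) == ["builtin::rag", "mcp::docs"]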

vector_db_id=vector_db_id, query=query_request.query, params=params
)

logger.info("The query response total payload: %s", query_response)

⚠️ Potential issue | 🟡 Minor

Avoid logging entire query response payloads.

Logging the full query_response payload at INFO level may expose sensitive document content in logs. Consider logging only metadata (chunk count, scores) or moving to DEBUG level.

-            logger.info("The query response total payload: %s", query_response)
+            logger.debug("Query response chunk count: %d", len(query_response.chunks) if query_response.chunks else 0)

Also applies to: 1264-1264

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1022 and 1264, the code
currently logs the entire query_response payload which can expose sensitive
document content; instead, do not log full payloads at INFO level—either change
the log calls to logger.debug or replace the message with metadata-only
information (e.g., number of chunks, top scores, document IDs, and any
non-sensitive timing/size metrics). Extract and log only those safe fields
(chunk count, aggregated scores, document ids/labels) and remove or redact the
full query_response from INFO-level logs; ensure any debug-level log that
contains more detail is gated behind DEBUG so sensitive data is not emitted in
production.

Comment on lines +1046 to +1048
ReferencedDocument(
doc_title=metadata.get("title", None),
doc_url="https://mimir.corp.redhat.com"
+ reference_doc,
)
)

🛠️ Refactor suggestion | 🟠 Major

Hardcoded URL should be extracted to configuration.

The URL https://mimir.corp.redhat.com is hardcoded in multiple places. This should be moved to configuration for flexibility across environments.

Consider adding a configuration value like mimir_base_url and referencing it here:

# In configuration
mimir_base_url = configuration.mimir_base_url or "https://mimir.corp.redhat.com"

# Then use (with urljoin imported from urllib.parse):
source = urljoin(mimir_base_url, parent_id)

Also applies to: 1067-1069, 1275-1277

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1046-1051 (also apply same
change at 1067-1069 and 1275-1277): the base URL "https://mimir.corp.redhat.com"
is hardcoded; extract it into configuration (e.g., add mimir_base_url in your
settings with a sensible default), read that config value where this module
runs, and replace the string concatenation with a safe join (use urljoin or
equivalent) to build the full URL from mimir_base_url and the relative path
(reference_doc / parent_id). Ensure the code falls back to the default when
config is missing and update all other occurrences at the noted lines to use the
same config value.

Comment on lines 90 to 91
if not os.access(parent_path, os.W_OK):
raise InvalidConfigurationError(f"{desc} '{path}' is not writable")

⚠️ Potential issue | 🔴 Critical

Fix misleading error message.

The error message references path but the check is for parent_path writability. This makes debugging more difficult.

Apply this diff to fix the error message:

             if not os.access(parent_path, os.W_OK):
-                raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+                raise InvalidConfigurationError(
+                    f"{desc} '{path}' cannot be created - parent directory is not writable"
+                )
🤖 Prompt for AI Agents
In src/utils/checks.py around lines 90 to 91, the raised
InvalidConfigurationError currently mentions the child `path` while the code is
checking writability of `parent_path`; update the error message to reference the
parent directory (parent_path) — ideally include both parent_path and the
original path in the message (e.g. "parent directory '{parent_path}' for
'{path}' is not writable") so the error accurately reflects what failed.

@tisnik tisnik requested review from are-ces and asimurka December 8, 2025 17:17

@asimurka asimurka left a comment


After the rebase, you will need to change vector_db to vector_stores, convert messages from the UserMessage format to plain strings, and copy everything to the query v2 implementations that use the Responses API. Overall LGTM, great work!

@Anxhela21 Anxhela21 force-pushed the anx/streaming_query branch from eaf66dd to 60c3c4f Compare December 9, 2025 21:43

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/database.py (1)

116-136: Add a default case to handle unsupported database types.

The match statement lacks a default case. If db_config.db_type is neither "sqlite" nor "postgres", engine remains None and Line 136 creates a sessionmaker with bind=None. This could lead to confusing runtime errors later.

         case "postgres":
             logger.info("Initialize PostgreSQL database")
             postgres_config = db_config.config
             logger.debug("Configuration: %s", postgres_config)
             if not isinstance(postgres_config, PostgreSQLDatabaseConfiguration):
                 raise TypeError(
                     f"Expected PostgreSQLDatabaseConfiguration, got {type(postgres_config)}"
                 )
             engine = _create_postgres_engine(postgres_config, **create_engine_kwargs)
+        case _:
+            raise ValueError(f"Unsupported database type: {db_config.db_type}")

     session_local = sessionmaker(autocommit=False, autoflush=False, bind=engine)
♻️ Duplicate comments (4)
src/app/endpoints/streaming_query.py (4)

1046-1051: Hardcoded URL should be extracted to configuration - still present.

The URL https://mimir.corp.redhat.com is hardcoded in multiple places. This was flagged in a previous review.

Also applies to: 1067-1069, 1275-1277


905-908: RAG chunks queried twice per streaming request - still present.

The vector DB is queried at line 906-908 via query_vector_io_for_chunks, and again within retrieve_response at lines 1243-1298. This doubles latency and load. Consider passing the pre-fetched chunks to retrieve_response or consolidating the query logic.

Also applies to: 1243-1298
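
One way to consolidate, sketched with hypothetical names: fetch_rag stands in for query_vector_io_for_chunks, and retrieve_response_sketch stands in for the real retrieve_response signature, which is not reproduced here.

import asyncio
from dataclasses import dataclass, field


@dataclass
class RetrievedRag:
    """Hypothetical container for one vector-store query result."""

    chunks: list[str] = field(default_factory=list)
    referenced_docs: list[str] = field(default_factory=list)


async def fetch_rag(query: str) -> RetrievedRag:
    """Stand-in for query_vector_io_for_chunks; the real function queries Solr."""
    return RetrievedRag(chunks=[f"chunk for {query!r}"])


async def retrieve_response_sketch(query: str, prefetched: RetrievedRag | None = None) -> str:
    """Only hit the vector store when the caller has not already done so."""
    rag = prefetched if prefetched is not None else await fetch_rag(query)
    context = "\n".join(rag.chunks[:5])
    return f"{context}\n\nQuestion: {query}"


async def handle_streaming_request(query: str) -> str:
    # Query once, reuse the result for both the end-of-stream payload and the LLM call.
    rag = await fetch_rag(query)
    return await retrieve_response_sketch(query, prefetched=rag)


if __name__ == "__main__":
    print(asyncio.run(handle_streaming_request("What changed in 4.16?")))
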


1022-1022: Avoid logging entire query response payloads - still present.

This was flagged in a previous review. Logging the full query_response payload at INFO level may expose sensitive document content.

-            logger.info("The query response total payload: %s", query_response)
+            logger.debug("Query response chunk count: %d", len(query_response.chunks) if query_response.chunks else 0)

1025-1033: Redundant rag_chunks creation - still present.

The rag_chunks list created on lines 1025-1033 is never used; it's overwritten by final_rag_chunks on lines 1059-1082. Remove the unused code.

             if query_response.chunks:
-                rag_chunks = [
-                    RAGChunk(
-                        content=str(chunk.content),  # Convert to string if needed
-                        source=getattr(chunk, "doc_id", None)
-                        or getattr(chunk, "source", None),
-                        score=getattr(chunk, "score", None),
-                    )
-                    for chunk in query_response.chunks[:5]  # Limit to top 5 chunks
-                ]
-                logger.info("Retrieved %d chunks from vector DB", len(rag_chunks))
-
                 # Extract doc_ids from chunks for referenced_documents

Also applies to: 1059-1084

🧹 Nitpick comments (41)
tests/e2e/features/steps/common_http.py (1)

123-132: Richer status-code assertion is good; consider a small style tweak

Including the parsed JSON (or text fallback) in the failure message is very helpful for debugging and keeps behavior correct. The if ...: assert False, ... pattern is slightly unconventional, though; consider raising AssertionError directly for clarity and to avoid the dummy False:

-    if context.response.status_code != status:
-        # Include response body in error message for debugging
-        try:
-            error_body = context.response.json()
-        except Exception:
-            error_body = context.response.text
-        assert False, (
-            f"Status code is {context.response.status_code}, expected {status}. "
-            f"Response: {error_body}"
-        )
+    if context.response.status_code != status:
+        # Include response body in error message for debugging
+        try:
+            error_body = context.response.json()
+        except Exception:
+            error_body = context.response.text
+        raise AssertionError(
+            f"Status code is {context.response.status_code}, expected {status}. "
+            f"Response: {error_body}"
+        )

Optionally, you could also narrow the exception to ValueError/json.JSONDecodeError instead of catching Exception, but that’s not critical here.

tests/e2e/features/conversation_cache_v2.feature (1)

13-29: TODO: Track and address the skipped test for empty vector DB bug.

This scenario is currently skipped due to a known bug where an empty vector database causes a 500 error. The test should be re-enabled once the underlying issue is resolved.

Do you want me to create an issue to track this bug and the test re-enablement, or is this already tracked under one of the referenced stories (LCORE-1036, LCORE-935, LCORE-882)?

scripts/gen_doc.py (2)

10-10: Adding tests directories to DIRECTORIES may overwrite existing READMEs

Including "tests/unit", "tests/integration", and "tests/e2e" means this script will now (re)generate README.md in every one of those directories and their subdirectories, overwriting any manually maintained READMEs there. If that’s intentional, this is fine; if some test folders have curated documentation, you may want to either narrow the list or skip directories that already contain a non‑generated README.


53-63: Harden main loop against missing directories and reduce brittleness of path handling

As written, generate_documentation_on_path(f"{directory}/") will raise if any directory in DIRECTORIES doesn’t exist (e.g., tests/e2e in an older or partial checkout). You can make this more robust by checking for existence once and reusing a Path object:

 def main():
     """Entry point to this script, regenerates documentation in all directories."""
-    for directory in DIRECTORIES:
-        generate_documentation_on_path(f"{directory}/")
-        for path in Path(directory).rglob("*"):
-            if path.is_dir():
-                if (
-                    path.name == "lightspeed_stack.egg-info"
-                    or path.name == "__pycache__"
-                    or ".ruff_cache" in str(path)
-                ):
-                    continue
-                generate_documentation_on_path(path)
+    for directory in DIRECTORIES:
+        base = Path(directory)
+        if not base.exists():
+            print(f"[gendoc] Skipping missing directory: {base}")
+            continue
+
+        generate_documentation_on_path(base)
+        for path in base.rglob("*"):
+            if not path.is_dir():
+                continue
+            if (
+                path.name == "lightspeed_stack.egg-info"
+                or path.name == "__pycache__"
+                or ".ruff_cache" in str(path)
+            ):
+                continue
+            generate_documentation_on_path(path)

This keeps behavior the same when directories are present, but avoids failures (and makes the assumption about directory existence explicit) when they’re not.

docs/demos/lcore/lcore.md (3)

34-37: Improve language clarity.

The phrase "It's a real framework independent on programming language" is awkward. Consider revising to: "It's a language-independent framework with providers for RAG, quota control, guardrails, metrics, etc."


23-104: Consider the scope and audience for the extensive Llama Stack content.

The PR introduces significant Llama Stack framework documentation (lines 23-104) comprising roughly 40% of the file's content. While this provides valuable context, verify that this level of detail is appropriate for a Lightspeed Core demo presentation. The new Solr Vector I/O Provider, streaming query support, and keyword filtering features are indirectly covered under RAG and LLM Inference sections, but not explicitly highlighted.

If this is intended as an architectural overview that positions Lightspeed Core within the broader Llama Stack ecosystem, the current approach works well. However, if the primary audience expects practical Solr-specific details or streaming implementation guidance, consider adding a dedicated technical section.


75-104: Consider emoji accessibility for documentation portability.

The section headers use emoji (🤖, 🛡️, 🔧, 📚, 🎯) for visual appeal. While this works well for browser/HTML presentation, verify that emoji render correctly across all planned distribution formats (PDF, markdown viewers, screen readers, etc.). If the documentation will be widely shared or translated, consider adding text fallback descriptions or using a more neutral header style.

src/app/endpoints/tools.py (4)

40-59: Docstring now slightly under-describes builtin toolgroups

The implementation clearly returns tools from both builtin toolgroups and MCP server toolgroups (distinguished via server_source), but the docstring still frames this primarily as “from all configured MCP servers.” Consider updating the description (and possibly the Raises section) to reflect that builtin toolgroups are also included and that only upstream connectivity issues currently surface as HTTP errors while missing toolgroups are skipped.


68-76: Good upstream connection handling; consider redacting low-level error details

Mapping connection failures to a 503 using ServiceUnavailableResponse and raising HTTPException(**response.model_dump()) gives a clean, centralized error path and aligns with the “handle Llama Stack connection errors” guidance. The only concern is that cause=str(e) may expose low‑level connection details to clients (hosts, ports, internal messages). If you want a stricter separation between internal diagnostics and public API, consider logging the full exception but returning a more generic cause (or omitting it) in the response body.

If you want to double‑check, please confirm in the Llama Stack client docs that the connection error you’re catching here is the one you intend to surface and that its message doesn’t contain sensitive data.


85-97: BadRequestError is silently converted into partial success

For each toolgroup, a BadRequestError is logged and then skipped, resulting in a 200 response with that toolgroup’s tools omitted. This may be exactly what you want (best‑effort aggregation), but it means misconfigured/removed toolgroups never surface to clients as errors, only as missing tools and log noise.

If misconfiguration is something operators or clients should notice, consider one of:

  • Downgrading log level to warning if this is expected and transient, or keeping error but adding metrics; and/or
  • Surfacing partial‑failure information in ToolsResponse (e.g., a “skipped_toolgroups” field) instead of only logs; a sketch of this option follows this comment.

This keeps the endpoint resilient while making failures more visible and debuggable.

You may want to confirm in the Llama Stack client docs whether BadRequestError here strictly means “toolgroup not found” vs broader 4xx classes, to decide how user‑visible this should be.
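
A hedged sketch of the second option above; ToolsResponseSketch, its skipped_toolgroups field, and the collect_tools helper are hypothetical, not the existing ToolsResponse model or endpoint code.

from pydantic import BaseModel


class ToolsResponseSketch(BaseModel):
    """Illustrative response shape that surfaces partial failures."""

    tools: list[dict] = []
    skipped_toolgroups: list[str] = []  # hypothetical field proposed above


def collect_tools(results_by_toolgroup: dict[str, list[dict] | Exception]) -> ToolsResponseSketch:
    """Aggregate per-toolgroup results, recording the toolgroups that failed."""
    response = ToolsResponseSketch()
    for toolgroup_id, result in results_by_toolgroup.items():
        if isinstance(result, Exception):
            response.skipped_toolgroups.append(toolgroup_id)
            continue
        response.tools.extend(result)
    return response
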


78-83: Aggregation and server_source derivation are correct; possible micro-refactors

The aggregation over toolgroups_response, derivation of server_source (builtin vs MCP URL), and the final summary logging all look logically sound. A couple of optional refinements if this endpoint becomes hot or mcp_servers grows:

  • Instead of recomputing next(s for s in configuration.mcp_servers ...) for each tool, prebuild a {name: server} dict once so lookup is O(1) (see the sketch after this comment).
  • The builtin vs MCP counts in the info log require two full passes over consolidated_tools; if needed, you could maintain these counters while appending tools.

These are minor and only worth doing if this endpoint turns into a high‑traffic or large‑dataset path.

Also applies to: 99-127, 135-140
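
For the first bullet, a minimal sketch of the prebuilt lookup; MCPServer and the example URL are hypothetical stand-ins for the configured server objects, not the project's actual types.

from dataclasses import dataclass


@dataclass
class MCPServer:
    """Hypothetical stand-in for a configured MCP server entry."""

    name: str
    url: str


def build_server_index(mcp_servers: list[MCPServer]) -> dict[str, MCPServer]:
    """Build the name -> server mapping once, so per-tool lookup is O(1)."""
    return {server.name: server for server in mcp_servers}


servers = [MCPServer(name="docs", url="http://mcp-docs:8080")]
index = build_server_index(servers)
assert index["docs"].url == "http://mcp-docs:8080"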

scripts/llama_stack_tutorial.sh (1)

1-24: Consider adding error handling.

The script lacks error handling directives like set -e which would cause it to exit on errors. While this might be intentional for a tutorial script (to continue even if some commands fail), consider adding set -e or at least documenting this behavior.

Add error handling at the top of the script:

 #!/bin/bash
+set -e  # Exit on error
 
 # Llama Stack Tutorial - Interactive Guide 

Or if you want the tutorial to continue even on errors, add a comment explaining this:

 #!/bin/bash
+# Note: Error handling is intentionally omitted to allow the tutorial to continue
+# even if some endpoints are unavailable
 
 # Llama Stack Tutorial - Interactive Guide 
src/authentication/k8s.py (1)

261-284: Variable shadowing: response is reused with different types.

The variable response is first assigned the SAR result (line 272), then conditionally reassigned to a ForbiddenResponse (line 283). This shadowing makes the code harder to follow and could lead to bugs if the logic changes.

         try:
             authorization_api = K8sClientSingleton.get_authz_api()
             sar = kubernetes.client.V1SubjectAccessReview(
                 spec=kubernetes.client.V1SubjectAccessReviewSpec(
                     user=user_info.user.username,
                     groups=user_info.user.groups,
                     non_resource_attributes=kubernetes.client.V1NonResourceAttributes(
                         path=self.virtual_path, verb="get"
                     ),
                 )
             )
-            response = authorization_api.create_subject_access_review(sar)
+            sar_response = authorization_api.create_subject_access_review(sar)
 
         except Exception as e:
             logger.error("API exception during SubjectAccessReview: %s", e)
             response = ServiceUnavailableResponse(
                 backend_name="Kubernetes API",
                 cause="Unable to perform authorization check",
             )
             raise HTTPException(**response.model_dump()) from e
 
-        if not response.status.allowed:
+        if not sar_response.status.allowed:
             response = ForbiddenResponse.endpoint(user_id=user_info.user.uid)
             raise HTTPException(**response.model_dump())
src/authentication/jwk_token.py (1)

37-46: TODO: Missing error handling for connection issues.

The function has a TODO comment (line 42) about handling connection errors and timeouts. aiohttp.ClientError is caught in the caller (lines 134-139), and resp.raise_for_status() on line 44 raises aiohttp.ClientResponseError, which is a subclass of ClientError, so that case is already handled.

However, consider adding an explicit timeout to prevent hanging on slow/unresponsive servers:

 async def get_jwk_set(url: str) -> KeySet:
     """Fetch the JWK set from the cache, or fetch it from the URL if not cached."""
     async with _jwk_cache_lock:
         if url not in _jwk_cache:
-            async with aiohttp.ClientSession() as session:
-                # TODO(omertuc): handle connection errors, timeouts, etc.
-                async with session.get(url) as resp:
+            timeout = aiohttp.ClientTimeout(total=10)
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.get(url) as resp:
                     resp.raise_for_status()
                     _jwk_cache[url] = JsonWebKey.import_key_set(await resp.json())
         return _jwk_cache[url]
src/configuration.py (1)

22-22: Type rename from ConversationCacheConfiguration to ConversationHistoryConfiguration.

The import and return type annotation have been updated to use ConversationHistoryConfiguration, aligning with the broader rename across the codebase.

Consider renaming the method to match the new type.

The property method name conversation_cache_configuration still contains "cache" while the return type is now ConversationHistoryConfiguration. For consistency, consider renaming to conversation_history_configuration in a future refactor.

Also applies to: 141-141

docs/streaming_query_endpoint.puml (2)

27-28: Consider clarifying shield event behavior.

Based on the AI summary, shield events are only emitted on violations (silent on success). The diagram shows unconditional "emit shield validation event" which may be misleading. Consider adding a note or updating the alt text to clarify this conditional behavior.

     else Chunk Type: shield
-        EventHandler->>SSE: emit shield validation event
+        EventHandler->>SSE: emit shield violation event (if violation detected)

13-14: Consider documenting RAG retrieval in the flow.

The AI summary indicates the streaming flow incorporates RAG data via query_vector_io_for_chunks, producing rag_chunks and vector_io_referenced_docs that are included in end-stream payloads. Consider adding this to the diagram for completeness.

src/constants.py (1)

102-104: New authentication module constants are properly defined.

The constants follow the existing naming pattern and use descriptive string values. Consider adding brief comments describing each module's purpose, consistent with other constants in this file (e.g., lines 93-98 have comments).

As per coding guidelines, constants should have descriptive comments:

 AUTH_MOD_NOOP_WITH_TOKEN = "noop-with-token"
+# API key based authentication module
 AUTH_MOD_APIKEY_TOKEN = "api-key-token"
 AUTH_MOD_JWK_TOKEN = "jwk-token"
+# Red Hat Identity header authentication module
 AUTH_MOD_RH_IDENTITY = "rh-identity"
src/cache/cache_factory.py (1)

4-4: Config type rename wiring is correct; optional docstring polish

Switching conversation_cache to accept ConversationHistoryConfiguration and extending the invalid type error message to include CACHE_TYPE_NOOP are consistent with the new configuration model; behavior of the factory remains the same. You may optionally update the function docstring to mention NoopCache as a possible return type for completeness, but it's not functionally required.

Also applies to: 20-20, 45-49

tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)

1-32: Verify auth_enabled for noop-with-token e2e config

This config is named for the noop-with-token auth flow and sets authentication.module: "noop-with-token", but service.auth_enabled is false. If the intent is to exercise token-based auth in the e2e suite (Authorized-tag scenarios), you may want auth_enabled: true; otherwise this config might not actually enforce the auth module. Please double-check against how other noop-with-token configs are used in the tests.
Based on learnings, noop-with-token tests typically rely on a dedicated config when the Authorized tag is present.

src/authorization/middleware.py (1)

35-45: Confirm APIKEY_TOKEN → Noop resolvers mapping and structured error usage

The switch to InternalServerErrorResponse/ForbiddenResponse for raising HTTP exceptions via response.model_dump() looks consistent with the new response model pattern and keeps error shapes centralized.

For AUTH_MOD_APIKEY_TOKEN, get_authorization_resolvers now returns NoopRolesResolver and NoopAccessResolver, which effectively disables role-based checks—any request that passes API key authentication will satisfy check_access(action, user_roles). If API keys are meant to grant blanket access, this is fine; if you intend to keep per-Action authorization semantics even for API keys, you may want a dedicated resolver instead of the noop pair. Please confirm the intended behavior.

Also applies to: 68-69, 85-86, 94-95

src/app/endpoints/rags.py (2)

53-99: Consider log level for configuration details.

Line 82 logs the entire llama_stack_configuration object at info level. This could potentially expose sensitive configuration details in production logs. Consider using debug level instead, or logging only non-sensitive attributes.

-    logger.info("Llama stack config: %s", llama_stack_configuration)
+    logger.debug("Llama stack config: %s", llama_stack_configuration)

101-154: Same log level concern applies here.

Line 130 has the same pattern—consider using debug level to avoid leaking configuration details in production logs.

-    logger.info("Llama stack config: %s", llama_stack_configuration)
+    logger.debug("Llama stack config: %s", llama_stack_configuration)
tests/e2e/features/environment.py (1)

84-102: Consider using elif for mutually exclusive config scenarios.

Lines 92-102 use separate if statements for InvalidFeedbackStorageConfig and NoCacheConfig. If a scenario were tagged with both, NoCacheConfig would overwrite the config and trigger an immediate restart. If these are mutually exclusive, using elif would make this explicit and prevent unexpected behavior.

     if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
         context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
-    if "NoCacheConfig" in scenario.effective_tags:
+    elif "NoCacheConfig" in scenario.effective_tags:
         context.scenario_config = (
             f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
         )
src/app/endpoints/root.py (2)

779-784: Consider adding 200 response for OpenAPI consistency.

Unlike other endpoints in this PR (e.g., shields_responses, models_responses), root_responses doesn't include a 200 response entry. While the response_class=HTMLResponse handles the success case, adding an explicit 200 entry would maintain consistency across endpoints.


787-808: Parameter order differs from other endpoints.

The signature has auth before request, while other endpoints in this PR (e.g., rags.py, shields.py, models.py) use request before auth. Consider reordering for consistency.

 async def root_endpoint_handler(
-    auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
     request: Request,
+    auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
 ) -> HTMLResponse:
src/app/endpoints/metrics.py (1)

64-64: Unnecessary str() cast on CONTENT_TYPE_LATEST.

CONTENT_TYPE_LATEST from prometheus_client is already a string constant. The explicit str() conversion is redundant.

-    return PlainTextResponse(generate_latest(), media_type=str(CONTENT_TYPE_LATEST))
+    return PlainTextResponse(generate_latest(), media_type=CONTENT_TYPE_LATEST)
.github/workflows/e2e_tests.yaml (1)

188-191: Consider using health check polling instead of fixed sleep.

A fixed 20-second sleep may be insufficient under load or excessive when services start quickly. Consider polling the health endpoint with a timeout.

      - name: Wait for services
        run: |
          echo "Waiting for services to be healthy..."
-          sleep 20
+          for i in {1..30}; do
+            if curl -sf http://localhost:8080/health > /dev/null 2>&1; then
+              echo "Services healthy after $i seconds"
+              break
+            fi
+            sleep 1
+          done
docs/config.json (1)

156-162: Minor: Consider minimum value of 1 for embedding_dimension.

An embedding dimension of 0 is technically allowed by the schema but would be semantically invalid.

           "embedding_dimension": {
             "default": 768,
             "description": "Dimensionality of embedding vectors.",
-            "minimum": 0,
+            "minimum": 1,
             "title": "Embedding dimension",
             "type": "integer"
           },
src/utils/endpoints.py (2)

637-659: Consider early return for disabled transcripts.

The nested if-else structure can be simplified with an early return pattern for better readability.

     # Store transcript if enabled
-    if not is_transcripts_enabled_func():
-        logger.debug("Transcript collection is disabled in the configuration")
-    else:
-        # Prepare attachments
-        attachments = query_request.attachments or []
-
-        # Determine rag_chunks: use provided value or empty list
-        transcript_rag_chunks = rag_chunks if rag_chunks is not None else []
-
-        store_transcript_func(
-            user_id=user_id,
-            conversation_id=conversation_id,
-            model_id=model_id,
-            provider_id=provider_id,
-            query_is_valid=True,
-            query=query_request.query,
-            query_request=query_request,
-            summary=summary,
-            rag_chunks=transcript_rag_chunks,
-            truncated=False,
-            attachments=attachments,
-        )
+    if is_transcripts_enabled_func():
+        attachments = query_request.attachments or []
+        transcript_rag_chunks = rag_chunks if rag_chunks is not None else []
+        store_transcript_func(
+            user_id=user_id,
+            conversation_id=conversation_id,
+            model_id=model_id,
+            provider_id=provider_id,
+            query_is_valid=True,
+            query=query_request.query,
+            query_request=query_request,
+            summary=summary,
+            rag_chunks=transcript_rag_chunks,
+            truncated=False,
+            attachments=attachments,
+        )
+    else:
+        logger.debug("Transcript collection is disabled in the configuration")

591-610: Consider refactoring cleanup_after_streaming to reduce parameter count.

This function has 19 parameters (triggering pylint R0913/R0917). Consider grouping related parameters into a dataclass or TypedDict to improve maintainability and reduce the chance of argument ordering errors.

Example approach:

from dataclasses import dataclass

@dataclass
class StreamingCleanupContext:
    user_id: str
    conversation_id: str
    model_id: str
    provider_id: str
    llama_stack_model_id: str
    query_request: QueryRequest
    summary: TurnSummary
    metadata_map: dict[str, Any]
    started_at: str
    skip_userid_check: bool
    rag_chunks: list[dict[str, Any]] | None = None

async def cleanup_after_streaming(
    context: StreamingCleanupContext,
    client: AsyncLlamaStackClient,
    config: AppConfig,
    callbacks: StreamingCleanupCallbacks,  # another dataclass for the 4 funcs
) -> None:
src/app/endpoints/conversations_v2.py (1)

229-241: Redundant cache type check at line 232.

The check_conversation_existence function checks if configuration.conversation_cache_configuration.type is None but this is already validated in the calling handlers before this function is invoked. Consider removing the redundant check or documenting why it's needed for safety.

 def check_conversation_existence(user_id: str, conversation_id: str) -> None:
     """Check if conversation exists."""
-    # checked already, but we need to make pyright happy
-    if configuration.conversation_cache_configuration.type is None:
-        return
+    # Callers must validate cache availability before calling this function
+    assert configuration.conversation_cache_configuration.type is not None
     conversations = configuration.conversation_cache.list(user_id, False)

Alternatively, if keeping for type narrowing, the comment already explains the intent which is acceptable.

src/app/endpoints/query.py (4)

79-82: Configuration TODO noted - consider addressing before merge.

The OFFLINE flag controls chunk source resolution but is currently hardcoded. The TODO comment acknowledges this should be moved to configuration. Consider prioritizing this to avoid environment-specific issues.
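
A minimal sketch of one way to move the flag into configuration; the class and field names below are hypothetical, not the project's actual schema (in the repo the class would extend ConfigurationBase per the project guidelines):

from pydantic import BaseModel


class QueryChunksConfiguration(BaseModel):
    """Hypothetical config section controlling where RAG chunks are resolved from."""

    offline_chunks: bool = False  # replaces the hardcoded OFFLINE flag


# In query.py, read the flag from configuration instead of the module-level constant:
# offline = configuration.query_chunks_configuration.offline_chunks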


783-790: Use lazy % formatting for logger calls instead of f-strings.

f-strings are evaluated even when the log level is disabled, causing unnecessary overhead. Use %s placeholders with logger for deferred formatting.

-            logger.info(f"Initial params: {params}")
-            logger.info(f"query_request.solr: {query_request.solr}")
+            logger.info("Initial params: %s", params)
+            logger.info("query_request.solr: %s", query_request.solr)
             if query_request.solr:
                 # Pass the entire solr dict under the 'solr' key
                 params["solr"] = query_request.solr
-                logger.info(f"Final params with solr filters: {params}")
+                logger.info("Final params with solr filters: %s", params)
             else:
                 logger.info("No solr filters provided")
-            logger.info(f"Final params being sent to vector_io.query: {params}")
+            logger.info("Final params being sent to vector_io.query: %s", params)

816-821: Hardcoded URL should be extracted to configuration.

The URL https://mimir.corp.redhat.com is hardcoded. This should be configurable for different environments.

Consider adding a configuration value like mimir_base_url:

# In configuration
mimir_base_url = configuration.mimir_base_url or "https://mimir.corp.redhat.com"

# Then use:
doc_url=urljoin(mimir_base_url, reference_doc),

827-830: Consider narrowing the exception type or logging the exception class.

Catching bare Exception can mask unexpected errors. Consider catching more specific exceptions or at minimum logging the exception type for debugging.

-    except Exception as e:
-        logger.warning(f"Failed to query vector database for chunks: {e}")
-        logger.debug(f"Vector DB query error details: {traceback.format_exc()}")
+    except Exception as e:
+        logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__)
+        logger.debug("Vector DB query error details: %s", traceback.format_exc())
src/app/endpoints/streaming_query.py (2)

505-515: Inconsistent behavior: "No Violation" token emitted for successful shield validations.

For successful shield validations (no violations), the code now emits a "No Violation" token to the stream. This may clutter the response with unnecessary tokens. Consider whether this should be silently skipped instead.

         else:
-            # Yield "No Violation" message for successful shield validations
-            yield stream_event(
-                data={
-                    "id": chunk_id,
-                    "token": "No Violation",
-                },
-                event_type=LLM_TOKEN_EVENT,
-                media_type=media_type,
-            )
+            # Successful shield validations are silently ignored
+            pass

1296-1298: Bare Exception catch should log exception type.

Consider logging the exception type for better debugging.

     except Exception as e:
-        logger.warning("Failed to query vector database for chunks: %s", e)
+        logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__)
         logger.debug("Vector DB query error details: %s", traceback.format_exc())
src/app/endpoints/streaming_query_v2.py (2)

56-56: Use __name__ for module logger pattern.

Per coding guidelines, use logger = logging.getLogger(__name__) instead of a hardcoded string. This ensures consistent logger naming across the codebase.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

101-103: Consider specifying a more precise return type.

The return type Any is vague. Consider using a Callable type hint for better documentation and IDE support.

-def create_responses_response_generator(  # pylint: disable=too-many-locals,too-many-statements
-    context: ResponseGeneratorContext,
-) -> Any:
+from collections.abc import Callable
+
+def create_responses_response_generator(  # pylint: disable=too-many-locals,too-many-statements
+    context: ResponseGeneratorContext,
+) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:
src/models/config.py (1)

841-850: Redundant validation check.

The check if self.api_key_config.api_key.get_secret_value() is None on line 847 is redundant. The APIKeyTokenConfiguration class already has min_length=1 on the api_key field, which would fail validation before this check is reached. Additionally, get_secret_value() returns a string, not None, for a valid SecretStr.

Consider simplifying or removing this check:

         if self.module == constants.AUTH_MOD_APIKEY_TOKEN:
             if self.api_key_config is None:
                 raise ValueError(
                     "API Key configuration section must be specified "
                     "when using API Key token authentication"
                 )
-            if self.api_key_config.api_key.get_secret_value() is None:
-                raise ValueError(
-                    "api_key parameter must be specified when using API_KEY token authentication"
-                )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eaf66dd and 60c3c4f.

⛔ Files ignored due to path filters (12)
  • docs/architecture.png is excluded by !**/*.png
  • docs/architecture.svg is excluded by !**/*.svg
  • docs/config.png is excluded by !**/*.png
  • docs/config.svg is excluded by !**/*.svg
  • docs/convesation_history.svg is excluded by !**/*.svg
  • docs/demos/lcore/images/journey.png is excluded by !**/*.png
  • docs/demos/lcore/images/llama_stack_as_library.svg is excluded by !**/*.svg
  • docs/demos/lcore/images/llama_stack_as_service.svg is excluded by !**/*.svg
  • docs/query_endpoint.svg is excluded by !**/*.svg
  • docs/rest_api.svg is excluded by !**/*.svg
  • docs/streaming_query_endpoint.svg is excluded by !**/*.svg
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (107)
  • .coderabbit.yaml (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • .github/workflows/e2e_tests.yaml (7 hunks)
  • .tekton/lightspeed-stack-pull-request.yaml (3 hunks)
  • .tekton/lightspeed-stack-push.yaml (3 hunks)
  • CONTRIBUTING.md (6 hunks)
  • Containerfile (2 hunks)
  • Makefile (3 hunks)
  • README.md (12 hunks)
  • docker-compose-library.yaml (1 hunks)
  • docker-compose.yaml (1 hunks)
  • docs/auth.md (1 hunks)
  • docs/config.html (1 hunks)
  • docs/config.json (1 hunks)
  • docs/config.md (1 hunks)
  • docs/config.puml (9 hunks)
  • docs/demos/lcore/lcore.md (5 hunks)
  • docs/query_endpoint.puml (1 hunks)
  • docs/rag_guide.md (2 hunks)
  • docs/streaming_query_endpoint.puml (1 hunks)
  • examples/lightspeed-stack-api-key-auth.yaml (1 hunks)
  • examples/lightspeed-stack-rh-identity.yaml (1 hunks)
  • pyproject.toml (2 hunks)
  • requirements.hermetic.txt (1 hunks)
  • requirements.torch.txt (1 hunks)
  • rpms.in.yaml (1 hunks)
  • rpms.lock.yaml (1 hunks)
  • scripts/gen_doc.py (2 hunks)
  • scripts/llama_stack_tutorial.sh (1 hunks)
  • scripts/remove_torch_deps.sh (1 hunks)
  • src/app/database.py (2 hunks)
  • src/app/endpoints/README.md (2 hunks)
  • src/app/endpoints/authorized.py (1 hunks)
  • src/app/endpoints/config.py (3 hunks)
  • src/app/endpoints/conversations.py (11 hunks)
  • src/app/endpoints/conversations_v2.py (8 hunks)
  • src/app/endpoints/feedback.py (6 hunks)
  • src/app/endpoints/health.py (2 hunks)
  • src/app/endpoints/info.py (4 hunks)
  • src/app/endpoints/metrics.py (3 hunks)
  • src/app/endpoints/models.py (3 hunks)
  • src/app/endpoints/providers.py (3 hunks)
  • src/app/endpoints/query.py (12 hunks)
  • src/app/endpoints/query_v2.py (9 hunks)
  • src/app/endpoints/rags.py (1 hunks)
  • src/app/endpoints/root.py (3 hunks)
  • src/app/endpoints/shields.py (3 hunks)
  • src/app/endpoints/streaming_query.py (13 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/endpoints/tools.py (3 hunks)
  • src/app/main.py (3 hunks)
  • src/app/routers.py (3 hunks)
  • src/authentication/README.md (2 hunks)
  • src/authentication/__init__.py (2 hunks)
  • src/authentication/api_key_token.py (1 hunks)
  • src/authentication/jwk_token.py (2 hunks)
  • src/authentication/k8s.py (4 hunks)
  • src/authentication/rh_identity.py (1 hunks)
  • src/authentication/utils.py (1 hunks)
  • src/authorization/middleware.py (5 hunks)
  • src/cache/cache_factory.py (3 hunks)
  • src/configuration.py (2 hunks)
  • src/constants.py (2 hunks)
  • src/metrics/utils.py (2 hunks)
  • src/models/README.md (1 hunks)
  • src/models/config.py (21 hunks)
  • src/models/context.py (1 hunks)
  • src/models/requests.py (4 hunks)
  • src/models/responses.py (17 hunks)
  • src/quota/__init__.py (1 hunks)
  • src/quota/quota_limiter.py (2 hunks)
  • src/quota/user_quota_limiter.py (1 hunks)
  • src/utils/README.md (1 hunks)
  • src/utils/checks.py (1 hunks)
  • src/utils/common.py (3 hunks)
  • src/utils/endpoints.py (6 hunks)
  • src/utils/quota.py (2 hunks)
  • src/utils/responses.py (1 hunks)
  • src/utils/shields.py (1 hunks)
  • test.containerfile (1 hunks)
  • tests/configuration/rh-identity-config.yaml (1 hunks)
  • tests/e2e/README.md (1 hunks)
  • tests/e2e/__init__.py (1 hunks)
  • tests/e2e/configs/README.md (1 hunks)
  • tests/e2e/configs/run-azure.yaml (1 hunks)
  • tests/e2e/configs/run-ci.yaml (3 hunks)
  • tests/e2e/configs/run-rhelai.yaml (1 hunks)
  • tests/e2e/configuration/README.md (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-no-cache.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml (1 hunks)
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/features/README.md (1 hunks)
  • tests/e2e/features/authorized_noop_token.feature (2 hunks)
  • tests/e2e/features/conversation_cache_v2.feature (1 hunks)
  • tests/e2e/features/conversations.feature (5 hunks)
  • tests/e2e/features/environment.py (4 hunks)
  • tests/e2e/features/feedback.feature (4 hunks)
  • tests/e2e/features/health.feature (2 hunks)
  • tests/e2e/features/info.feature (4 hunks)
  • tests/e2e/features/query.feature (3 hunks)
  • tests/e2e/features/steps/README.md (1 hunks)
  • tests/e2e/features/steps/__init__.py (1 hunks)
  • tests/e2e/features/steps/common_http.py (2 hunks)
  • tests/e2e/features/steps/conversation.py (5 hunks)
⛔ Files not processed due to max files limit (25)
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/streaming_query.feature
  • tests/e2e/test_api.py
  • tests/e2e/utils/README.md
  • tests/integration/README.md
  • tests/integration/endpoints/README.md
  • tests/integration/endpoints/test_config_integration.py
  • tests/integration/endpoints/test_health_integration.py
  • tests/integration/endpoints/test_info_integration.py
  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/integration/test_middleware_integration.py
  • tests/integration/test_openapi_json.py
  • tests/integration/test_rh_identity_integration.py
  • tests/unit/README.md
  • tests/unit/app/README.md
  • tests/unit/app/endpoints/README.md
  • tests/unit/app/endpoints/test_authorized.py
  • tests/unit/app/endpoints/test_config.py
  • tests/unit/app/endpoints/test_conversations.py
  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_feedback.py
  • tests/unit/app/endpoints/test_health.py
  • tests/unit/app/endpoints/test_info.py
  • tests/unit/app/endpoints/test_metrics.py
  • tests/unit/app/endpoints/test_models.py
✅ Files skipped from review due to trivial changes (16)
  • src/models/README.md
  • rpms.in.yaml
  • tests/e2e/features/steps/README.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • src/quota/__init__.py
  • examples/lightspeed-stack-api-key-auth.yaml
  • tests/e2e/README.md
  • tests/e2e/configuration/README.md
  • docs/config.md
  • tests/e2e/configs/README.md
  • tests/e2e/__init__.py
  • Containerfile
  • src/quota/user_quota_limiter.py
  • tests/e2e/features/README.md
  • tests/configuration/rh-identity-config.yaml
  • requirements.hermetic.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/checks.py
🧰 Additional context used
📓 Path-based instructions (10)
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/info.feature
  • tests/e2e/features/feedback.feature
  • tests/e2e/features/health.feature
  • tests/e2e/features/conversation_cache_v2.feature
  • tests/e2e/features/query.feature
  • tests/e2e/features/authorized_noop_token.feature
  • tests/e2e/features/conversations.feature
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/e2e/features/steps/__init__.py
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/conversation.py
  • tests/e2e/features/environment.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/main.py
  • src/metrics/utils.py
  • src/authentication/utils.py
  • src/authorization/middleware.py
  • src/app/endpoints/models.py
  • src/utils/endpoints.py
  • src/cache/cache_factory.py
  • src/models/context.py
  • src/app/endpoints/tools.py
  • src/authentication/jwk_token.py
  • src/configuration.py
  • src/app/routers.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/rags.py
  • src/authentication/rh_identity.py
  • src/utils/shields.py
  • src/app/endpoints/health.py
  • src/models/requests.py
  • src/app/endpoints/query.py
  • src/utils/common.py
  • src/authentication/k8s.py
  • src/app/endpoints/providers.py
  • src/utils/responses.py
  • src/utils/quota.py
  • src/app/endpoints/info.py
  • src/app/database.py
  • src/constants.py
  • src/authentication/api_key_token.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/config.py
  • src/authentication/__init__.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/authorized.py
  • src/quota/quota_limiter.py
  • src/app/endpoints/root.py
  • src/models/responses.py
  • src/app/endpoints/query_v2.py
  • src/models/config.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/health.py
  • src/app/endpoints/query.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/info.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/config.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/root.py
  • src/app/endpoints/query_v2.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/health.py
  • src/app/endpoints/query.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/info.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/config.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/root.py
  • src/app/endpoints/query_v2.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Configure pylint with source-roots = "src"
Exclude src/auth/k8s.py from pyright type checking

Files:

  • pyproject.toml
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Pydantic configuration classes must extend ConfigurationBase; data models must extend BaseModel

Files:

  • src/models/context.py
  • src/models/requests.py
  • src/models/responses.py
  • src/models/config.py
src/**/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define shared constants in central constants.py file with descriptive comments

Files:

  • src/constants.py
src/**/__init__.py

📄 CodeRabbit inference engine (CLAUDE.md)

Package __init__.py files must contain brief package descriptions

Files:

  • src/authentication/__init__.py
src/models/config.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/config.py: All configuration must use Pydantic models extending ConfigurationBase with extra="forbid" to reject unknown fields
Use type hints Optional[FilePath], PositiveInt, SecretStr for Pydantic configuration models

Files:

  • src/models/config.py
🧠 Learnings (19)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • tests/e2e/features/info.feature
  • src/app/main.py
  • src/metrics/utils.py
  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/health.py
  • src/app/endpoints/query.py
  • src/app/endpoints/info.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/shields.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • examples/lightspeed-stack-rh-identity.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • src/authentication/README.md
  • tests/e2e/features/authorized_noop_token.feature
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml
  • docs/auth.md
  • tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/features/authorized_noop_token.feature
  • tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/app/main.py
  • src/authentication/utils.py
  • src/authorization/middleware.py
  • src/app/endpoints/models.py
  • src/utils/endpoints.py
  • src/app/endpoints/feedback.py
  • src/utils/quota.py
  • src/app/endpoints/info.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Run `uv run make verify` to run all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion

Applied to files:

  • Makefile
  • CONTRIBUTING.md
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt` for end-to-end tests

Applied to files:

  • Makefile
  • .github/workflows/e2e_tests.yaml
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • tests/e2e/features/health.feature
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Applied to files:

  • Makefile
  • .github/workflows/e2e_tests.yaml
  • tests/e2e/features/environment.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • docker-compose-library.yaml
  • README.md
  • docs/demos/lcore/lcore.md
📚 Learning: 2025-07-23T14:26:40.340Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 278
File: src/app/endpoints/feedback.py:113-118
Timestamp: 2025-07-23T14:26:40.340Z
Learning: In the lightspeed-stack project after PR #278, the `UserDataCollection.feedback_storage` property dynamically constructs paths from `user_data_dir` and will always return a valid string, making fallback logic like `or Path("")` redundant.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Always check `pyproject.toml` for existing dependencies and versions before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use `uv sync --group dev --group llslibdev` to install development dependencies

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Applied to files:

  • CONTRIBUTING.md
📚 Learning: 2025-08-18T11:45:59.961Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:77-77
Timestamp: 2025-08-18T11:45:59.961Z
Learning: torch==2.7.1 is available on PyPI and is a valid version that can be used in dependency specifications.

Applied to files:

  • requirements.torch.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models

Applied to files:

  • src/models/requests.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.

Applied to files:

  • src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : All configuration must use Pydantic models extending `ConfigurationBase` with `extra="forbid"` to reject unknown fields

Applied to files:

  • docs/config.puml
  • src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : Use type hints `Optional[FilePath]`, `PositiveInt`, `SecretStr` for Pydantic configuration models

Applied to files:

  • docs/config.puml
  • src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`

Applied to files:

  • docs/config.puml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

Comment on lines 145 to 160
### API Key Token (`api-key-token`)

Authentication that checks a given API Key token is present as a Bearer token

**Configuration:**
```yaml
module: "api-key-token"
api_key_config:
api_key: "some-api-key"
```

**Behavior:**
- Extracts bearer token from the `Authorization` header
- Same user ID and username handling as `noop`
- Token is passed through and validated against the API Key given from configuration, for downstream use


⚠️ Potential issue | 🟠 Major

Fix API key token YAML example to include the authentication root

The new API Key Token section’s config example omits the authentication: root key, unlike the other module examples. As written, the snippet isn’t a complete, copy-pasteable config fragment.

Consider updating it to:

authentication:
  module: "api-key-token"
  api_key_config:
    api_key: "some-api-key"

This keeps it consistent with the other documented modules and with the actual configuration shape.

🤖 Prompt for AI Agents
In docs/auth.md around lines 145 to 160, the YAML example for the API Key Token
module is missing the top-level authentication: key so it isn’t a complete,
copy-pasteable config fragment; update the snippet to include the
authentication: root and nest module and api_key_config under it so the example
matches the configuration shape used elsewhere in the docs.

Comment on lines 333 to 334
* It's own REST API (stable, standard)
* Llama Stack as backed

⚠️ Potential issue | 🟡 Minor

Fix grammatical error.

Line 334 uses "It's own" which should be "Its own" (possessive form).

-* It's own REST API (stable, standard)
+* Its own REST API (stable, standard)
🤖 Prompt for AI Agents
In docs/demos/lcore/lcore.md around lines 333 to 334, fix the grammatical error
by changing "It's own REST API (stable, standard)" to the correct possessive
form "Its own REST API (stable, standard)"; update that line only so it uses
"Its" (no apostrophe).

---
### Lightspeed Evaluation Framework

<font size="10">[https://github.com/lightspeed-core/lightspeed-evaluation/](https://github.com/lightspeed-core/lightspeed-evaluation)</font>

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the GitHub repository exists and is accessible
url="https://github.com/lightspeed-core/lightspeed-evaluation/"

echo "=== GitHub Repository Verification ==="
echo "URL: $url"
echo

# Try to access the repository
echo "Attempting to access repository..."
status=$(curl -s -o /dev/null -w "%{http_code}" --head "$url" 2>&1)
echo "HTTP Status: $status"

# Also try with -L to follow redirects
status_redirect=$(curl -s -o /dev/null -w "%{http_code}" -L --head "$url" 2>&1)
echo "HTTP Status (with redirects): $status_redirect"

# Validate URL format
if [[ "$url" =~ ^https://github\.com/[^/]+/[^/]+/?$ ]]; then
  echo "URL Format: ✓ Valid GitHub URL format"
else
  echo "URL Format: ✗ Invalid GitHub URL format"
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 311


🌐 Web query:

GitHub repository lightspeed-core/lightspeed-evaluation

💡 Result:

I couldn't find a GitHub repository named lightspeed-core/lightspeed-evaluation. Closely related projects I found:

  • lightspeed-core/lightspeed-stack (LLM tooling stack). [1]
  • openshift/lightspeed-service (OpenShift Lightspeed assistant). [2]
  • An OpenHub entry for a lightspeed-evaluation project (summary/extraction, not the GitHub repo). [3]

If you want, tell me whether you mean a different org/name or I can search for a fork/alternate name and look inside those repos.

Sources:
[1] https://github.com/lightspeed-core/lightspeed-stack
[2] https://github.com/openshift/lightspeed-service
[3] https://openhub.net/p/lightspeed-evaluation


The GitHub link references a non-existent repository.

The repository at https://github.com/lightspeed-core/lightspeed-evaluation/ does not exist. Did you mean https://github.com/lightspeed-core/lightspeed-stack? Update the link to point to the correct, existing repository or remove it if it's no longer relevant.

🤖 Prompt for AI Agents
In docs/demos/lcore/lcore.md around line 401, the GitHub URL points to a
non-existent repo (https://github.com/lightspeed-core/lightspeed-evaluation/);
update the link to the correct repository (e.g.,
https://github.com/lightspeed-core/lightspeed-stack) or remove the link if it’s
no longer relevant — replace the href and visible text accordingly and verify
the new URL resolves.

Comment on lines 88 to 91
* [Sequence diagrams](#sequence-diagrams)
* [Query endpoint REST API handler](#query-endpoint-rest-api-handler)
* [Streaming query endpoint REST API handler](#streaming-query-endpoint-rest-api-handler)
* [Versioning](#versioning)

⚠️ Potential issue | 🟡 Minor

Table of contents heading level inconsistency.

"Streaming query endpoint REST API handler" (line 90) appears at the same level as "Query endpoint REST API handler" (line 89), but in the actual content (lines 967-969), it's rendered as a level-2 heading (##) rather than level-3 (###), breaking the hierarchy.

In the document body, change the streaming query section heading to be consistent:

-## Streaming query endpoint REST API handler
+### Streaming query endpoint REST API handler
📝 Committable suggestion


Suggested change
* [Sequence diagrams](#sequence-diagrams)
* [Query endpoint REST API handler](#query-endpoint-rest-api-handler)
* [Streaming query endpoint REST API handler](#streaming-query-endpoint-rest-api-handler)
* [Versioning](#versioning)
### Streaming query endpoint REST API handler
🤖 Prompt for AI Agents
In README.md around lines 88 to 91, the table of contents lists "Streaming query
endpoint REST API handler" at the same toc level as the nested "Query endpoint
REST API handler", but the actual section (lines ~967-969) is written as a
level-2 heading (##) instead of a level-3 (###), breaking hierarchy; update the
streaming query section heading in the document body from a level-2 heading to a
level-3 heading so it matches the TOC nesting and restores consistent heading
levels.

Comment on lines +162 to +168
solr: Optional[dict[str, Any]] = Field(
None,
description="Solr-specific query parameters including filter queries",
examples=[
{"fq": {"product:*openshift*", "product_version:*4.16*"}},
],
)

⚠️ Potential issue | 🟡 Minor

Fix the example: JSON does not support Python sets.

The example uses a Python set literal {"product:*openshift*", "product_version:*4.16*"} which is not valid JSON. JSON only supports arrays for collections. This will cause confusion in the OpenAPI documentation.

     solr: Optional[dict[str, Any]] = Field(
         None,
         description="Solr-specific query parameters including filter queries",
         examples=[
-            {"fq": {"product:*openshift*", "product_version:*4.16*"}},
+            {"fq": ["product:*openshift*", "product_version:*4.16*"]},
         ],
     )
🤖 Prompt for AI Agents
In src/models/requests.py around lines 162-168, the solr example uses a Python
set literal ({"product:*openshift*", "product_version:*4.16*"}) which is invalid
JSON; replace the set with a JSON array or other valid JSON structure (for
example use "fq": ["product:*openshift*", "product_version:*4.16*"]) so the
OpenAPI examples are valid JSON.

Comment on lines 764 to 785
model_config = {
"json_schema_extra": {
"examples": [
{
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
"success": True,
"response": "Conversation deleted successfully",
}
"label": "deleted",
"value": {
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
"success": True,
"response": "Conversation deleted successfully",
},
},
{
"label": "not found",
"value": {
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
"success": True,
"response": "Conversation can not be deleted",
},
},
]
}
}

⚠️ Potential issue | 🟡 Minor

Example inconsistency and typo in "not found" case.

The "not found" example (lines 776-782) shows success: True which would be incorrect after fixing the success field. Also, the message uses "can not be deleted" (line 780) while the code uses "cannot be deleted" (line 756).

After fixing the success field, update the example:

                 {
                     "label": "not found",
                     "value": {
                         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-                        "success": True,
-                        "response": "Conversation can not be deleted",
+                        "success": False,
+                        "response": "Conversation cannot be deleted",
                     },
                 },
🤖 Prompt for AI Agents
In src/models/responses.py around lines 764-785, the "not found" example is
inconsistent: it sets "success": True and uses "can not be deleted" while the
code expects failure and the phrasing "cannot be deleted". Update that example
to set "success": False and change the response string to "Conversation cannot
be deleted" (matching the code), keeping the rest of the example structure
intact.

Comment on lines 6 to 11
from log import get_logger
from models.responses import InternalServerErrorResponse, QuotaExceededResponse
from quota.quota_exceed_error import QuotaExceedError
from quota.quota_limiter import QuotaLimiter

logger = get_logger(__name__)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for get_logger definition
echo "=== Definition of get_logger ==="
rg -n "def get_logger" --type py -A 5

echo ""
echo "=== Import location of get_logger ==="
rg -n "^from log import get_logger" --type py

echo ""
echo "=== Location of log module ==="
fd -t f "log\.py$" | head -20

echo ""
echo "=== get_logger implementation in log module ==="
if [ -f "src/log.py" ]; then
  cat -n "src/log.py"
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 311


🌐 Web query:

get_logger python logging pattern

💡 Result:

Recommended pattern — concise:

  • In application modules: create a module-level logger with logging.getLogger(__name__) and configure handlers/formatters in the application entry point only.
  • In libraries: create module-level loggers the same way but do NOT configure handlers; install a NullHandler so library logging won’t emit by default.

Minimal examples:

  • Application (configure once, e.g. in main):

    import logging

    def main():
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s %(levelname)s:%(name)s: %(message)s",
        )
        logger = logging.getLogger(__name__)
        logger.info("started")

    if __name__ == "__main__":
        main()

  • Library module (no configuration; add NullHandler):

    import logging

    logger = logging.getLogger(__name__)
    logger.addHandler(logging.NullHandler())

    def do_work():
        logger.debug("doing work")

Why:

  • logging.getLogger(__name__) gives hierarchical, module-level loggers that inherit configuration from the application/root logger. Multiple calls with the same name return the same Logger. Configure logging centrally (root logger or specific named loggers). [1]
  • The Logging Cookbook documents patterns for multi-module apps and shows using NullHandler in libraries to avoid forcing handlers on application users. [2]

Sources:
[1] Python logging module docs — usage and getLogger recommendations.
[2] Python Logging Cookbook — using logging across modules and NullHandler.


Use standard logging pattern for consistency with coding guidelines.

Line 11 uses logger = get_logger(__name__) instead of the required logger = logging.getLogger(__name__) pattern. Replace the import with import logging and use logging.getLogger(__name__) to align with coding guidelines and standard Python logging practices.
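
A minimal sketch of the resulting module header, assuming nothing else in src/utils/quota.py depends on the custom get_logger helper:

import logging

from models.responses import InternalServerErrorResponse, QuotaExceededResponse
from quota.quota_exceed_error import QuotaExceedError
from quota.quota_limiter import QuotaLimiter

logger = logging.getLogger(__name__)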

🤖 Prompt for AI Agents
In src/utils/quota.py around lines 6 to 11, the module uses a custom get_logger
import and assigns logger = get_logger(__name__); replace the custom import with
a standard import logging and change the logger assignment to use
logging.getLogger(__name__) so the file follows the project’s standard logging
pattern and Python logging best practices.

Then The status code of the response is 200
When I use REST API conversation endpoint with conversation_id from above using HTTP GET method
Then The status code of the response is 200
And The conversation history contains 1 messages

⚠️ Potential issue | 🟡 Minor

Fix grammatical error in assertion message.

The assertion uses "contains 1 messages" which is grammatically incorrect. It should be "contains 1 message" (singular).

Apply this diff to correct the grammar:

-     And The conversation history contains 1 messages
+     And The conversation history contains 1 message
📝 Committable suggestion


Suggested change
And The conversation history contains 1 messages
And The conversation history contains 1 message
🤖 Prompt for AI Agents
In tests/e2e/features/conversation_cache_v2.feature around line 207, the
assertion text reads "And The conversation history contains 1 messages" which is
grammatically incorrect; update the step text to "And The conversation history
contains 1 message" (singular) so the assertion reads correctly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (16)
tests/integration/endpoints/test_query_v2_integration.py (2)

357-375: Resolve merge conflict in mock_tool_output.results structure.

Unresolved merge conflict markers prevent the file from being parsed. The conflict shows divergent mock structures for tool search results:

  • HEAD: Inline list of mocks with file_id, filename, and score attributes only
  • upstream/main: Separate mock_result object with additional attributes dict containing doc_url and link fields

Choose the structure that matches the actual Llama Stack API response format. The upstream/main version appears more complete with URL metadata.

-<<<<<<< HEAD
-    mock_tool_output.results = [
-        mocker.MagicMock(
-            file_id="doc-1",
-            filename="ansible-docs.txt",
-            score=0.95,
-        )
-    ]
-=======
-    mock_result = mocker.MagicMock()
-    mock_result.file_id = "doc-1"
-    mock_result.filename = "ansible-docs.txt"
-    mock_result.score = 0.95
-    mock_result.attributes = {
-        "doc_url": "https://example.com/ansible-docs.txt",
-        "link": "https://example.com/ansible-docs.txt",
-    }
-    mock_tool_output.results = [mock_result]
->>>>>>> upstream/main
+    mock_result = mocker.MagicMock()
+    mock_result.file_id = "doc-1"
+    mock_result.filename = "ansible-docs.txt"
+    mock_result.score = 0.95
+    mock_result.attributes = {
+        "doc_url": "https://example.com/ansible-docs.txt",
+        "link": "https://example.com/ansible-docs.txt",
+    }
+    mock_tool_output.results = [mock_result]

1244-1275: Resolve merge conflict in transcript behavior test.

Unresolved merge conflict markers prevent the file from being parsed. The conflict affects the function signature and transcript mocking:

  • HEAD: No mocker parameter, no mocking of store_transcript (may create actual files during tests)
  • upstream/main: Includes mocker parameter and mocks store_transcript to prevent file creation

The upstream/main approach is preferred for integration tests, as it prevents unwanted side effects (actual file I/O) while still verifying the integration logic.

 async def test_query_v2_endpoint_transcript_behavior(
     test_config: AppConfig,
     mock_llama_stack_client: AsyncMockType,
     test_request: Request,
     test_auth: AuthTuple,
     patch_db_session: Session,
-<<<<<<< HEAD
-=======
+    mocker: MockerFixture,
->>>>>>> upstream/main
 ) -> None:
     """Test transcript storage behavior based on configuration.
 
     This integration test verifies:
     - Endpoint succeeds with transcripts enabled
     - Endpoint succeeds with transcripts disabled
     - Conversation is persisted regardless of transcript setting
     - Integration between query handler and transcript configuration
 
     Args:
         test_config: Test configuration
         mock_llama_stack_client: Mocked Llama Stack client
         test_request: FastAPI request
         test_auth: noop authentication tuple
         patch_db_session: Test database session
-<<<<<<< HEAD
-    """
-    _ = mock_llama_stack_client
-
-=======
+        mocker: pytest-mock fixture
+    """
+    _ = mock_llama_stack_client
+
+    # Mock store_transcript to prevent file creation
+    mocker.patch("app.endpoints.query.store_transcript")
+
->>>>>>> upstream/main
     test_config.user_data_collection_configuration.transcripts_enabled = True
tests/e2e/features/environment.py (1)

92-112: Harden NoCacheConfig config swapping and restore logic

The new immediate switch/restart for NoCacheConfig in before_scenario is fine, but the restore path in after_scenario has a couple of pitfalls:

  • after_scenario always restores using context.feature_config for NoCacheConfig, but context.feature_config is only set in before_feature for Authorized features. If a NoCacheConfig scenario ever appears in a non‑Authorized feature, this will raise an AttributeError and leave lightspeed-stack.yaml in the no‑cache state.
  • Unlike the feature‑level Authorized path, there is no dedicated backup of lightspeed-stack.yaml when you overwrite it for NoCacheConfig, so the restore logic depends on knowing the original config path instead of having a real snapshot of the active config.

You can make this path robust and self‑contained by introducing a per‑scenario backup for NoCacheConfig and cleaning it up afterwards, e.g.:

@@ def before_scenario(context: Context, scenario: Scenario) -> None:
-    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
-        context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
-    if "NoCacheConfig" in scenario.effective_tags:
-        context.scenario_config = (
-            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
-        )
-        # Switch config and restart immediately
-        switch_config(
-            context.scenario_config
-        )  # Copies to default lightspeed-stack.yaml
-        restart_container("lightspeed-stack")
+    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
+        context.scenario_config = (
+            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
+        )
+    if "NoCacheConfig" in scenario.effective_tags:
+        context.scenario_config = (
+            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
+        )
+        # Backup current config, switch to no-cache, and restart immediately
+        context.scenario_config_backup = create_config_backup("lightspeed-stack.yaml")
+        switch_config(context.scenario_config)  # Copies to default lightspeed-stack.yaml
+        restart_container("lightspeed-stack")
@@ def after_scenario(context: Context, scenario: Scenario) -> None:
-    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
-        switch_config(context.feature_config)
-        restart_container("lightspeed-stack")
-    if "NoCacheConfig" in scenario.effective_tags:
-        switch_config(context.feature_config)
-        restart_container("lightspeed-stack")
+    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
+        switch_config(context.feature_config)
+        restart_container("lightspeed-stack")
+    if "NoCacheConfig" in scenario.effective_tags:
+        # Restore the backed-up config for this scenario if available
+        if hasattr(context, "scenario_config_backup"):
+            switch_config(context.scenario_config_backup)
+            restart_container("lightspeed-stack")
+            remove_config_backup(context.scenario_config_backup)
+            delattr(context, "scenario_config_backup")

This removes the dependency on context.feature_config for NoCacheConfig restores, avoids AttributeError when the tag is used outside Authorized features, and ensures per‑scenario state is properly cleaned up so behavior doesn’t leak across scenarios. Based on learnings, Behave’s Context persists across scenarios, so this cleanup is important for test isolation.

tests/unit/app/endpoints/test_rags.py (1)

192-206: Resolve merge conflict in RagInfo mock class.

The RagInfo.__init__ method contains unresolved merge conflict markers, causing multiple pipeline failures:

  • IndentationError (syntax error)
  • Ruff, Black, and Pylint parsing failures

Remove the conflict markers and decide whether to include the verbose docstring from upstream/main or keep the minimal version from HEAD. Ensure the resulting code is syntactically valid.

requirements.aarch64.txt (4)

1-4197: Unresolved merge conflicts in requirements.aarch64.txt—regenerate via uv from pyproject.toml.

This file contains unresolved conflict markers (<<<<<<< HEAD, =======, >>>>>>>). Any marker will break pip/uv and CI. Do not hand-edit. Fix all conflicts in pyproject.toml, then regenerate all requirements-*.txt files using uv to ensure consistency across architectures.

The file includes conflicting pins for faiss-cpu, fastapi, greenlet, kubernetes, litellm, llama-stack, llama-stack-client, mcp, openai, opentelemetry packages (api, sdk, instrumentation, exporter variants, proto), protobuf, and urllib3. Additionally, python-jose (and ecdsa) appear only in the HEAD section and are absent from upstream/main—since authlib is present in both branches, python-jose is likely redundant and should be removed from pyproject.toml before regeneration.


2115-2213: Resolve the OpenTelemetry version conflict by selecting one coherent version set.

The requirements file contains an unresolved merge conflict with split OpenTelemetry versions: HEAD uses 1.38.0 family + instrumentation 0.59b0, while upstream/main uses 1.39.0 family + instrumentation 0.60b0. Select one coherent set (e.g., 1.39.0 family + 0.60b0 instrumentation) and regenerate the requirements file to ensure all opentelemetry* packages maintain version coherence throughout requirements.aarch64.txt and related requirement files.


654-696: Resolve the merge conflict in requirements.aarch64.txt and regenerate with uv.

The file contains unresolved conflict markers for faiss-cpu and fastapi. Accept the upstream/main versions (faiss-cpu==1.13.1 and fastapi==0.124.0), which match pyproject.toml. Then regenerate:

uv pip compile pyproject.toml -o requirements.aarch64.txt --generate-hashes --group llslibdev --python-platform aarch64-unknown-linux-gnu --torch-backend cpu --python-version 3.12

faiss-cpu is pinned at top-level to maintain consistent control across dependency graphs; keep it uniform across architecture files.


1000-1118: Resolve merge conflict: greenlet requires a single pinned version.

Remove the unresolved merge markers and choose one greenlet version (3.2.4 or 3.3.0). Both are compatible with SQLAlchemy on Python 3.12; 3.3.0 is newer with improved wheel packaging. After resolving the conflict, regenerate the requirements.aarch64.txt with uv to ensure consistency across all requirements files.

requirements.x86_64.txt (4)

654-696: Pick one: faiss-cpu 1.13.0 vs 1.13.1; keep policy pin.

Choose a single version (likely 1.13.1) and ensure it’s pinned in pyproject before compiling, then propagate into all platform requirement files. Based on learnings, keep faiss-cpu pinned at the top-level.



2115-2212: Resolve the OpenTelemetry package version conflict in requirements.x86_64.txt.

Git merge conflict markers show HEAD pinned to 1.38.x/0.59b0 while upstream/main uses 1.39.x/0.60b0. All opentelemetry packages (api, exporter-otlp, exporter-otlp-proto-*, instrumentation, proto, sdk, semantic-conventions) must be aligned to the same minor version. Select one version set, remove the conflict markers, update all affected packages consistently, and recompile to avoid compatibility issues.


1508-1531: litellm / llama-stack[‑client] bumps: verify API compatibility.

Upgrading to llama-stack 0.3.0 introduces breaking changes: tools parameter schema changed from "parameters" to { "input_schema", "output_schema" }, API routes reorganized (v1/, v1alpha/, v1beta/), and some endpoints removed/unpublished. If these versions are merged, code must be updated for tool definitions and API endpoint usage; otherwise, keep existing pins (1.80.7, 0.2.22) and regenerate.
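For reference, a minimal sketch of the shape change described above, based only on the note quoted here; the exact key names and any additional required fields should be confirmed against the llama-stack 0.3.0 release notes:

# Hypothetical tool definition under the currently pinned 0.2.x line (single "parameters" schema).
legacy_tool = {
    "name": "knowledge_search",
    "description": "Search the RAG vector store",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# Hypothetical equivalent under 0.3.0, where the schema is split into input/output halves.
new_tool = {
    "name": "knowledge_search",
    "description": "Search the RAG vector store",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
    "output_schema": {"type": "object"},
}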


673-696: FastAPI and Starlette versions are incompatible.

FastAPI 0.124.0 requires Starlette >=0.40.0 and <0.49.0. Starlette 0.50.0 exceeds the upper bound and will not work with FastAPI 0.124.0. Either downgrade Starlette to <0.49.0 or select a different FastAPI version. Resolve the merge conflict and pin compatible versions in pyproject.toml.

Also applies to: 3607-3613

tests/unit/models/responses/test_error_responses.py (1)

9-36: Resolve merge conflicts and adopt PromptTooLongResponse tests

This test module currently contains unresolved merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> upstream/main), which breaks parsing and unit tests.

You should:

  • Keep the upstream/main side that introduces PROMPT_TOO_LONG_DESCRIPTION and PromptTooLongResponse in the imports.
  • Drop the conflict markers and ensure the new TestPromptTooLongResponse class is present once.

For example:

-from models.responses import (
-    BAD_REQUEST_DESCRIPTION,
-    FORBIDDEN_DESCRIPTION,
-    INTERNAL_SERVER_ERROR_DESCRIPTION,
-    NOT_FOUND_DESCRIPTION,
-<<<<<<< HEAD
-=======
-    PROMPT_TOO_LONG_DESCRIPTION,
->>>>>>> upstream/main
-    QUOTA_EXCEEDED_DESCRIPTION,
+from models.responses import (
+    BAD_REQUEST_DESCRIPTION,
+    FORBIDDEN_DESCRIPTION,
+    INTERNAL_SERVER_ERROR_DESCRIPTION,
+    NOT_FOUND_DESCRIPTION,
+    PROMPT_TOO_LONG_DESCRIPTION,
+    QUOTA_EXCEEDED_DESCRIPTION,
@@
-    NotFoundResponse,
-<<<<<<< HEAD
-=======
-    PromptTooLongResponse,
->>>>>>> upstream/main
+    NotFoundResponse,
+    PromptTooLongResponse,
@@
-<<<<<<< HEAD
-=======
-class TestPromptTooLongResponse:
-    """Test cases for PromptTooLongResponse."""
-
-    def test_constructor_with_default_response(self) -> None:
-        """Test PromptTooLongResponse with default response."""
-        response = PromptTooLongResponse(
-            cause="The prompt exceeds the maximum allowed length."
-        )
-        assert isinstance(response, AbstractErrorResponse)
-        assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE
-        assert isinstance(response.detail, DetailModel)
-        assert response.detail.response == "Prompt is too long"
-        assert response.detail.cause == "The prompt exceeds the maximum allowed length."
-
-    def test_openapi_response(self) -> None:
-        """Test PromptTooLongResponse.openapi_response() method."""
-        schema = PromptTooLongResponse.model_json_schema()
-        model_examples = schema.get("examples", [])
-        expected_count = len(model_examples)
-
-        result = PromptTooLongResponse.openapi_response()
-        assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION
-        assert result["model"] == PromptTooLongResponse
-        assert "examples" in result["content"]["application/json"]
-        examples = result["content"]["application/json"]["examples"]
-
-        # Verify example count matches schema examples count
-        assert len(examples) == expected_count
-        assert expected_count == 1
-
-        # Verify example structure
-        assert "prompt too long" in examples
-        prompt_example = examples["prompt too long"]
-        assert "value" in prompt_example
-        assert "detail" in prompt_example["value"]
-        assert prompt_example["value"]["detail"]["response"] == "Prompt is too long"
-        assert (
-            prompt_example["value"]["detail"]["cause"]
-            == "The prompt exceeds the maximum allowed length."
-        )
-
-    def test_openapi_response_with_explicit_examples(self) -> None:
-        """Test PromptTooLongResponse.openapi_response() with explicit examples."""
-        result = PromptTooLongResponse.openapi_response(examples=["prompt too long"])
-        examples = result["content"]["application/json"]["examples"]
-
-        # Verify only 1 example is returned when explicitly specified
-        assert len(examples) == 1
-        assert "prompt too long" in examples
-
-
->>>>>>> upstream/main
+class TestPromptTooLongResponse:
+    """Test cases for PromptTooLongResponse."""
+
+    def test_constructor_with_default_response(self) -> None:
+        """Test PromptTooLongResponse with default response."""
+        response = PromptTooLongResponse(
+            cause="The prompt exceeds the maximum allowed length."
+        )
+        assert isinstance(response, AbstractErrorResponse)
+        assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE
+        assert isinstance(response.detail, DetailModel)
+        assert response.detail.response == "Prompt is too long"
+        assert response.detail.cause == "The prompt exceeds the maximum allowed length."
+
+    def test_openapi_response(self) -> None:
+        """Test PromptTooLongResponse.openapi_response() method."""
+        schema = PromptTooLongResponse.model_json_schema()
+        model_examples = schema.get("examples", [])
+        expected_count = len(model_examples)
+
+        result = PromptTooLongResponse.openapi_response()
+        assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION
+        assert result["model"] == PromptTooLongResponse
+        assert "examples" in result["content"]["application/json"]
+        examples = result["content"]["application/json"]["examples"]
+
+        # Verify example count matches schema examples count
+        assert len(examples) == expected_count
+        assert expected_count == 1
+
+        # Verify example structure
+        assert "prompt too long" in examples
+        prompt_example = examples["prompt too long"]
+        assert "value" in prompt_example
+        assert "detail" in prompt_example["value"]
+        assert prompt_example["value"]["detail"]["response"] == "Prompt is too long"
+        assert (
+            prompt_example["value"]["detail"]["cause"]
+            == "The prompt exceeds the maximum allowed length."
+        )
+
+    def test_openapi_response_with_explicit_examples(self) -> None:
+        """Test PromptTooLongResponse.openapi_response() with explicit examples."""
+        result = PromptTooLongResponse.openapi_response(examples=["prompt too long"])
+        examples = result["content"]["application/json"]["examples"]
+
+        # Verify only 1 example is returned when explicitly specified
+        assert len(examples) == 1
+        assert "prompt too long" in examples

After this, the SyntaxError and Ruff “merge conflict markers detected” should go away.

Also applies to: 666-719

docs/config.html (1)

1-21: Resolve merge conflicts in generated config HTML (and regenerate from Markdown)

docs/config.html still contains merge conflict markers across multiple sections (<<<<<<< HEAD, =======, >>>>>>> upstream/main). This makes the file invalid HTML and breaks tooling (Ruff/Black are already complaining).

Given this file is clearly generated from docs/config.md, the safest approach is:

  • First, resolve the conflicts in docs/config.md (see separate comment) so that it contains a single, correct version of the configuration schema, including the new APIKeyTokenConfiguration section and api_key_config field in AuthenticationConfiguration.
  • Then, regenerate docs/config.html from the Markdown (e.g., via your existing pandoc or docs build pipeline), rather than hand‑editing the HTML.
  • Ensure the regenerated HTML no longer includes any conflict markers and that the tables (e.g., AuthenticationConfiguration) mention api_key_config alongside jwk_config and rh_identity_config.

Until these conflicts are resolved and the HTML is regenerated, this doc will remain broken.

Also applies to: 191-268, 443-721, 1009-1266, 1570-1815, 2275-2452

docs/config.md (1)

10-23: Resolve Markdown merge conflicts and keep the API key schema updates

docs/config.md also has unresolved merge conflict markers, which breaks the Markdown and the HTML generation:

  • At the top around APIKeyTokenConfiguration (Lines ~10‑23).
  • In AuthenticationConfiguration where api_key_config is conditionally present (Lines ~59‑63).
  • In narrative text for ModelContextProtocolServer, PostgreSQLDatabaseConfiguration, and ServiceConfiguration.

To fix:

  • Remove all <<<<<<< HEAD / ======= / >>>>>>> upstream/main markers.
  • Keep the upstream/main additions that introduce:
    • The ## APIKeyTokenConfiguration section with its api_key field table.
    • The api_key_config row in the AuthenticationConfiguration table.
    • The reflowed one‑line paragraphs (they are equivalent content, just wrapped differently).

Conceptually, the cleaned sections should look like:

## APIKeyTokenConfiguration

API Key Token configuration.

| Field   | Type   | Description |
|--------|--------|-------------|
| api_key | string |             |

...

| Field          | Type   | Description |
|----------------|--------|-------------|
| module         | string |             |
| skip_tls_verification | boolean |    |
| k8s_cluster_api       | string  |    |
| k8s_ca_cert_path      | string  |    |
| jwk_config            |        |     |
| api_key_config        |        |     |
| rh_identity_config    |        |     |

Once these conflicts are resolved, re‑run your docs build to regenerate docs/config.html and clear the markdownlint errors.

Also applies to: 59-63, 318-331, 351-359, 473-482

tests/unit/app/endpoints/test_streaming_query_v2.py (1)

1-637: Critical: Resolve all merge conflicts before proceeding.

This file contains unresolved merge conflicts throughout (lines 1-5, 12-19, 50-56, 89-95, 148-152, 184-198, 220-224, 240-244, 263-289, 309-315, 348-354, 405-409, 501-505, 557-637), causing all pipeline checks to fail:

  • Unit tests: SyntaxError: invalid syntax
  • Ruff/Black/Pylint: Cannot parse due to conflict markers

Key decisions to make when resolving:

  1. Imports (lines 12-19): Choose between HTTPException (HEAD) or RateLimitError (upstream) — likely need both if keeping the quota test.

  2. stream_end_event signature (lines 148-152, 405-409, 501-505): Upstream uses lambda _m, _t, _aq, _rd, _media which aligns with new RAG chunk fields (rag_chunks, referenced_documents). HEAD uses lambda _m, _s, _t, _media. Choose based on the production signature in streaming_query_v2.py.

  3. Conversation mocking (lines 50-56, 89-95, 309-315, 348-354): Upstream adds mock_client.conversations.create mocks — include these if retrieve_response now creates conversations.

  4. Error handling pattern (lines 263-289): HEAD raises HTTPException; upstream returns StreamingResponse with error status code. Align with the production implementation.

  5. Conversation ID expectations (lines 220-224, 240-244): HEAD expects conv-xyz; upstream expects abc123def456. Align with how retrieve_response returns the conversation ID.

  6. Quota test (lines 557-637): Upstream adds a new test for RateLimitError — keep if this error handling exists in production.

♻️ Duplicate comments (5)
src/utils/checks.py (1)

90-91: Fix misleading error message.

The error message references path but the actual check is for parent_path writability, making debugging more difficult.

Apply this diff to clarify the error message:

             if not os.access(parent_path, os.W_OK):
-                raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+                raise InvalidConfigurationError(
+                    f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable"
+                )

Based on past review comments, this issue was previously flagged but remains unresolved.

src/models/responses.py (1)

749-788: ConversationDeleteResponse.success should reflect deleted and examples must be updated

__init__ still hardcodes success=True (Line 763), so the response claims success even when deleted=False. The “not found” example also shows success: True with a failure message and uses the inconsistent phrase “can not be deleted”.

This is the same issue previously flagged; please wire success to deleted and fix the example payload.

Apply something like:

     def __init__(self, *, deleted: bool, conversation_id: str) -> None:
@@
         response_msg = (
             "Conversation deleted successfully"
             if deleted
             else "Conversation cannot be deleted"
         )
         super().__init__(
             conversation_id=conversation_id,  # type: ignore[call-arg]
-            success=True,  # type: ignore[call-arg]
+            success=deleted,  # type: ignore[call-arg]
             response=response_msg,  # type: ignore[call-arg]
         )
@@
                 {
                     "label": "not found",
                     "value": {
                         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-                        "success": True,
-                        "response": "Conversation can not be deleted",
+                        "success": False,
+                        "response": "Conversation cannot be deleted",
                     },
                 },
src/app/endpoints/query.py (2)

978-1005: Fix “attatchment” typos in validation errors

Both error messages use “attatchment” instead of “attachment”, and a previous review called this out. These strings surface in HTTP 422 responses and should be correct and consistent.

-            message = (
-                f"Invalid attatchment type {attachment.attachment_type}: "
+            message = (
+                f"Invalid attachment type {attachment.attachment_type}: "
@@
-            message = (
-                f"Invalid attatchment content type {attachment.content_type}: "
+            message = (
+                f"Invalid attachment content type {attachment.content_type}: "

You may also want to align these messages with the example texts in UnprocessableEntityResponse (“Invalid attribute value”) if you care about strict consistency.


792-797: Avoid logging full vector DB payloads at INFO

Both vector DB queries log the entire query_response at INFO, which can contain sensitive content and is also noisy:

logger.info(f"The query response total payload: {query_response}")

Prefer logging lightweight metadata (e.g., chunk count) at DEBUG instead, as in the diff above. This aligns with the earlier review feedback on the same issue.

Also applies to: 890-897
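A minimal drop-in sketch for the logger.info call quoted above, assuming query_response exposes a chunks list and vector_db_ids is in scope at both call sites:

# Log lightweight metadata at DEBUG; the full payload can contain user and document content.
logger.debug(
    "Vector DB query returned %d chunks (%d vector DBs queried)",
    len(query_response.chunks),
    len(vector_db_ids),
)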

tests/unit/models/responses/test_successful_responses.py (1)

673-737: Align ConversationDeleteResponse tests with corrected success semantics

Once ConversationDeleteResponse is fixed to set success=deleted, the tests here must be updated accordingly:

  • test_constructor_not_deleted should expect success is False.
  • The “not found” OpenAPI example should assert "success": False and the corrected message "Conversation cannot be deleted".

Proposed changes:

     def test_constructor_not_deleted(self) -> None:
         """Test ConversationDeleteResponse when conversation cannot be deleted."""
         response = ConversationDeleteResponse(deleted=False, conversation_id="conv-123")
-        assert response.success is True
-        assert response.response == "Conversation cannot be deleted"
+        assert response.success is False
+        assert response.response == "Conversation cannot be deleted"
@@
         not_found_example = examples["not found"]
@@
-        assert not_found_example["value"]["success"] is True
-        assert (
-            not_found_example["value"]["response"] == "Conversation can not be deleted"
-        )
+        assert not_found_example["value"]["success"] is False
+        assert (
+            not_found_example["value"]["response"] == "Conversation cannot be deleted"
+        )

This keeps the tests consistent with the corrected model and with the updated examples in ConversationDeleteResponse.model_config.

🧹 Nitpick comments (2)
src/models/responses.py (1)

25-30: Avoid duplicating RAGChunk model across modules

RAGChunk is now defined here and also in src/utils/types.py with the same fields. Maintaining two nearly identical models will drift over time and complicates typing (e.g., TurnSummary.rag_chunks vs QueryResponse.rag_chunks). Consider making a single canonical definition (e.g., in utils.types) and importing/re-exporting it here, or at least aliasing one to the other to keep them in sync.
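A minimal way to keep a single canonical definition, assuming utils.types stays the home of the model (a sketch, not the final layout):

# src/models/responses.py -- re-export the canonical model instead of redefining it.
from utils.types import RAGChunk as RAGChunk  # noqa: F401  (re-exported for API consumers)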

src/app/endpoints/query.py (1)

879-936: Second vector DB query duplicates work and repeats issues

The post‑agent block re‑lists vector DBs and re‑queries vector_io.query with almost identical parameters, again logging the full payload and re‑importing RAGChunk/ReferencedDocument locally. It also builds RAGChunk instances with content=str(chunk.content) and source from loosely typed attributes.

Even if you decide to keep the second query (e.g., for different params vs. Solr filtering), it should:

  • Reuse the module‑level RAGChunk / ReferencedDocument imports.
  • Avoid logging the full payload at INFO.
  • Normalize content and source types the same way as in the pre‑agent block (e.g., interleaved_content_as_str + str | None guarded from metadata).
  • Consider reusing the initial vector_db_ids instead of calling client.vector_dbs.list() again.

You may also want to reconsider whether two separate vector_io.query calls per request are necessary; a single call whose results are reused for both RAG context injection and final rag_chunks / referenced_documents would simplify the flow and reduce load on the vector backend.
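If the flow is consolidated to a single retrieval pass, the call site could look roughly like this; retrieve_rag_context is a hypothetical helper wrapping one vector_io.query pass per vector DB (the per-chunk handling is sketched under the 746-769 comment below):

rag_chunks, referenced_documents = await retrieve_rag_context(
    client, query_request.query, vector_db_ids
)
rag_context = "\n\n".join(chunk.content for chunk in rag_chunks)  # injected into the prompt before the LLM call
# After the turn completes, reuse the same objects:
# TurnSummary(..., rag_chunks=rag_chunks) and referenced_documents in the final response.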

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60c3c4f and 02f8f19.

📒 Files selected for processing (18)
  • docs/config.html (12 hunks)
  • docs/config.md (6 hunks)
  • requirements.aarch64.txt (21 hunks)
  • requirements.hermetic.txt (1 hunks)
  • requirements.x86_64.txt (21 hunks)
  • src/app/endpoints/query.py (10 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (19 hunks)
  • src/utils/checks.py (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/features/conversation_cache_v2.feature (2 hunks)
  • tests/e2e/features/environment.py (1 hunks)
  • tests/integration/endpoints/test_query_v2_integration.py (10 hunks)
  • tests/unit/app/endpoints/test_rags.py (2 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (15 hunks)
  • tests/unit/authentication/test_api_key_token.py (3 hunks)
  • tests/unit/models/responses/test_error_responses.py (4 hunks)
  • tests/unit/models/responses/test_successful_responses.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/models/requests.py
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/e2e/features/environment.py
  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/unit/models/responses/test_successful_responses.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/models/responses/test_error_responses.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/endpoints/test_rags.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/unit/models/responses/test_successful_responses.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/models/responses/test_error_responses.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/endpoints/test_rags.py
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/conversation_cache_v2.feature
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/utils/checks.py
  • src/app/endpoints/query.py
  • src/models/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Pydantic configuration classes must extend ConfigurationBase; data models must extend BaseModel

Files:

  • src/models/responses.py
🧠 Learnings (9)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/features/environment.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/unit/authentication/test_api_key_token.py
📚 Learning: 2025-08-25T09:11:38.701Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 445
File: tests/e2e/features/environment.py:0-0
Timestamp: 2025-08-25T09:11:38.701Z
Learning: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.

Applied to files:

  • tests/e2e/features/environment.py
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-08-18T10:57:39.266Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:59-59
Timestamp: 2025-08-18T10:57:39.266Z
Learning: In the lightspeed-stack project, transitive dependencies like faiss-cpu are intentionally pinned as top-level dependencies to maintain better control over the dependency graph and avoid version conflicts when bundling ML/LLM tooling packages.

Applied to files:

  • requirements.aarch64.txt
  • requirements.x86_64.txt
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.

Applied to files:

  • requirements.aarch64.txt
  • requirements.x86_64.txt
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • requirements.aarch64.txt
  • requirements.x86_64.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use Python package manager `uv` with `uv run` prefix for all development commands

Applied to files:

  • requirements.hermetic.txt
🧬 Code graph analysis (5)
tests/e2e/features/environment.py (1)
tests/e2e/utils/utils.py (2)
  • switch_config (98-106)
  • restart_container (130-144)
tests/unit/models/responses/test_successful_responses.py (1)
src/models/responses.py (3)
  • RAGChunk (25-30)
  • ToolCall (33-38)
  • ToolsResponse (90-125)
tests/unit/app/endpoints/test_streaming_query_v2.py (2)
src/app/endpoints/streaming_query_v2.py (1)
  • streaming_query_endpoint_handler_v2 (318-345)
src/models/requests.py (1)
  • QueryRequest (73-275)
src/app/endpoints/query.py (2)
src/models/responses.py (4)
  • ToolCall (33-38)
  • model (1525-1529)
  • ReferencedDocument (326-339)
  • RAGChunk (25-30)
src/utils/types.py (2)
  • TurnSummary (135-234)
  • RAGChunk (127-132)
src/models/responses.py (2)
src/utils/types.py (1)
  • RAGChunk (127-132)
src/quota/quota_exceed_error.py (1)
  • QuotaExceedError (6-38)
🪛 GitHub Actions: Black
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: Cannot parse for target version Python 3.13: 84:0: <<<<<<< HEAD

tests/unit/models/responses/test_successful_responses.py

[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD


[warning] Black would reformat this file. Run 'black --write' to fix code style issues.

tests/unit/authentication/test_api_key_token.py

[error] 5-5: Cannot parse for target version Python 3.13: 5:0: <<<<<<< HEAD

tests/unit/models/responses/test_error_responses.py

[error] 14-14: Cannot parse for target version Python 3.13: 14:0: <<<<<<< HEAD


[warning] Black would reformat this file. Run 'black --write' to fix code style issues.

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD

tests/unit/app/endpoints/test_rags.py

[error] 193-193: Cannot parse for target version Python 3.13: 193:0: <<<<<<< HEAD

src/app/endpoints/query.py

[error] Black would reformat this file. Run 'black --write' to fix code style issues.

src/models/responses.py

[error] 1-1: Black would reformat this file. Run 'black --write' to fix code style issues.

🪛 GitHub Actions: Integration tests
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: SyntaxError: invalid syntax due to merge conflict markers (<<<<<<< HEAD)

🪛 GitHub Actions: Pyright
src/app/endpoints/query.py

[error] 753-753: pyright: Operator '+' not supported for 'None' (reportOptionalOperand).


[error] 778-778: pyright: 'vector_db_ids' is possibly unbound (reportPossiblyUnboundVariable).


[error] 816-816: pyright: Argument missing for parameter 'doc_id' (reportCallIssue).


[error] 840-840: pyright: Argument of type 'float | str | List[object] | object | Literal[True]' cannot be assigned to parameter 'url' of type 'AnyStr@urljoin | None' in function 'urljoin'.


[error] 848-848: pyright: 'RAGChunk' is possibly unbound (reportPossiblyUnboundVariable).


[error] 849-849: pyright: Argument of type 'InterleavedContent' cannot be assigned to parameter 'content' of type 'str' in function 'init'.


[error] 850-850: pyright: Argument of type 'str | bool | float | List[object] | object | None' cannot be assigned to parameter 'source' of type 'str | None' in function 'init'.


[error] 875-875: pyright: Argument of type 'list[str | ToolgroupAgentToolGroupWithArgs] | Unknown | list[str] | None' cannot be assigned to parameter 'toolgroups' of type 'List[Toolgroup] | None'.


[error] 937-937: pyright: Argument missing for parameter 'tool_results' (reportCallIssue).

🪛 GitHub Actions: Python linter
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: Pylint parsing failed: 'invalid syntax (tests.integration.endpoints.test_query_v2_integration, line 84)'

tests/unit/models/responses/test_successful_responses.py

[error] 56-56: Pylint parsing failed: 'unmatched ')' (tests.unit.models.responses.test_successful_responses, line 56)'

tests/unit/authentication/test_api_key_token.py

[error] 5-5: Pylint parsing failed: 'invalid syntax (tests.unit.authentication.test_api_key_token, line 5)'

tests/unit/models/responses/test_error_responses.py

[error] 14-14: Pylint parsing failed: 'invalid syntax (tests.unit.models.responses.test_error_responses, line 14)'

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-1: Pylint parsing failed: 'invalid syntax (tests.unit.app.endpoints.test_streaming_query_v2, line 1)'

tests/unit/app/endpoints/test_rags.py

[error] 193-193: Pylint parsing failed: 'expected an indented block after function definition on line 192 (tests.unit.app.endpoints.test_rags, line 193)'

🪛 GitHub Actions: Ruff
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

tests/unit/models/responses/test_successful_responses.py

[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.

tests/unit/authentication/test_api_key_token.py

[error] 5-5: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

tests/unit/models/responses/test_error_responses.py

[error] 14-14: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.

tests/unit/app/endpoints/test_rags.py

[error] 192-192: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

src/app/endpoints/query.py

[error] 14-14: F401: 'pydantic.ValidationError' imported but unused. Remove unused import.


[error] 1-1: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

🪛 GitHub Actions: Type checks
src/app/endpoints/query.py

[error] 753-753: mypy: Unsupported left operand type for + ("None").


[error] 793-793: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | NotGiven'.


[error] 840-840: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'.

🪛 GitHub Actions: Unit tests
tests/unit/models/responses/test_error_responses.py

[error] 14-14: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-1: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file

tests/unit/app/endpoints/test_rags.py

[error] 193-193: IndentationError / SyntaxError: invalid syntax due to merge conflict markers (e.g., '<<<<<<< HEAD') in test_rags.py

🪛 LanguageTool
docs/config.md

[style] ~323-~323: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. ======= MCP ...

(NOT_ABLE_PREMIUM)


[style] ~328-~328: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. >>>>>>> upst...

(NOT_ABLE_PREMIUM)

🪛 markdownlint-cli2 (0.18.1)
docs/config.md

10-10: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)


59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


59-59: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


60-60: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


61-61: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


318-318: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)


351-351: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)


472-472: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)

🪛 YAMLlint (1.37.1)
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml

[error] 19-19: syntax error: could not find expected ':'

(syntax)

🔇 Additional comments (20)
tests/unit/authentication/test_api_key_token.py (1)

134-139: Verify inconsistent detail format across error cases.

This test expects detail to be a plain string ("Invalid API Key"), while other error tests expect detail to be a dict with a "cause" key. Confirm this asymmetry is intentional in the implementation, or align the assertion style for consistency.
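If the dict-shaped detail is the style to standardize on, the aligned assertions would mirror the other error tests (this assumes the handler is changed to return a structured detail; the key name is an assumption):

detail = exc_info.value.detail
assert isinstance(detail, dict)
assert detail["response"] == "Invalid API Key"  # or detail["cause"], depending on how the handler structures it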

tests/integration/endpoints/test_query_v2_integration.py (1)

398-405: Resolve merge conflicts for tool call field name assertions.

Unresolved merge conflict markers at three locations prevent the file from being parsed:

  • Line 398-405: test_query_v2_endpoint_with_tool_calls
  • Line 469-476: test_query_v2_endpoint_with_mcp_list_tools
  • Line 543-547: test_query_v2_endpoint_with_multiple_tool_types

HEAD uses tool_calls[0].tool_name, while upstream/main uses tool_calls[0].name. Determine the correct field name by checking the ToolCallSummary model definition, then resolve all three conflicts consistently. Also remove the conditional if response.rag_chunks: guard introduced in HEAD (lines 401-402) if it was not in upstream/main.

requirements.aarch64.txt (7)

1234-1236: Clean up conflict markers in requirements.aarch64.txt.

Lines 1234-1236 contain stray conflict markers (<<<<<<, =======, >>>>>>>) around a comment. Regenerate the file with uv after resolving the conflict between upstream and HEAD.


2587-2610: Resolve protobuf version conflict in requirements.aarch64.txt (6.33.1 vs 6.33.2). Pick one version and ensure it's consistent with googleapis-common-protos and opentelemetry-proto dependencies. Update pyproject.toml if needed.


1508-1531: Versions shown are already in lockstep, not diverged.

The snippet shows versions moving together: litellm 1.80.7→1.80.8, llama-stack 0.2.22→0.3.0, and llama-stack-client 0.2.22→0.3.0. All three are synchronized. Verify the actual merged state of requirements.aarch64.txt to confirm this was resolved correctly.

Likely an incorrect or invalid review comment.


3001-3007: Verify python-jose necessity before merging; confirm it's not redundant with existing authlib/pyjwt stack.

The review raises a valid concern about duplicate JWT/cryptography dependencies—repo already has authlib and pyjwt. However, direct verification of python-jose usage in the codebase could not be completed due to technical constraints. Before merging:

  • Search the codebase for from jose import or import jose patterns to confirm if python-jose is actually used
  • If unused, remove python-jose and regenerate lock files
  • If required, document the rationale in code comments or dependency documentation
  • Ensure cryptography dependency pins remain consistent across the stack

Also check lines 639-644, 3525-3527, 3374-3379 for consistency.


1494-1501: Resolve kubernetes version conflict between 33.1.0 and 34.1.0.

This merge conflict requires a decision on which version to use. Consult the kubernetes-client/python CHANGELOG between v33.1.0 and v34.1.0 to identify any breaking changes, then verify compatibility with your kubernetes client usage patterns and regenerate with uv.


2098-2110: Resolve openai version drift (2.8.1 vs 2.9.0) by pinning a single version.

Version 2.9.0 includes streaming fixes (stream closure, response config handling) but maintains API compatibility with 2.8.1; no breaking changes to streaming behavior. However, verify jiter (Rust-based JSON parser dependency) compatibility in your target environments, as it can be problematic in some runtimes. Align the pin in pyproject.toml and recompile with uv.


1830-1837: Resolve the mcp version conflict (1.22.0 vs 1.23.1) in requirements.aarch64.txt.

This file contains a merge conflict. Determine which version is specified in pyproject.toml and update the requirements file accordingly, or manually choose the intended version and regenerate the locked requirements if this file is auto-generated.

requirements.x86_64.txt (5)

2639-2706: Retain psycopg2-binary pin (policy).

psycopg2-binary is correctly pinned; ensure it remains in pyproject.toml and survives re-compile, since some llama-stack providers require it.


3001-3007: Verify python-jose dependency requirement before resolving merge conflict.

This file shows a merge conflict where HEAD adds python-jose==3.5.0 while upstream removes it. Before resolving, confirm whether python-jose is actually used in the codebase. Search for jose imports and usage (including jose.jwt and related auth functions). If the new auth implementation doesn't depend on it, remove the dependency from both the requirements file and pyproject.toml. If it is required, ensure the pinned version and hashes are intentional.

Note: This concern also applies to the related dependency occurrences at lines 639-644, 2771-2773, 3525-3527, and 3374-3379.


605-607: The review comment addresses a critical issue (unresolved merge conflicts in lockfiles making them unusable), but verification of the specific technical claims requires direct access to the repository code, which is currently unavailable.


1000-1117: Resolve the greenlet merge conflict by choosing version 3.3.0.

Remove the conflicting versions (3.2.4 vs 3.3.0) and standardize on greenlet 3.3.0, which is compatible with SQLAlchemy 2.0.44 and includes wheels and improved support for CPython 3.14. Also applies to lines 3607-3613.


3808-3818: Prefer urllib3==2.6.1; both versions compatible with requests 2.32.5 and kubernetes client.

urllib3 2.6.1 and 2.3.0 are both compatible with requests 2.32.5 (which supports urllib3 >=1.21.1 and <3). The kubernetes Python client imposes no specific urllib3 version constraint and uses the environment's installed version. Accept the upstream version (2.6.1) as it is newer and includes recent fixes. Before merging, verify no other transitive dependencies in the environment pin urllib3<2.6 (e.g., older botocore versions can conflict), though this is unlikely given the upstream choice.

src/models/responses.py (3)

332-339: ReferencedDocument extension looks consistent

The addition of doc_id and widening doc_url to AnyUrl | str | None match how it’s used in query.py (vector DB metadata and Solr docs). No issues from a modeling perspective.
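For orientation, a trimmed sketch showing only the two fields discussed here; the real model in src/models/responses.py carries additional fields and docstrings:

from pydantic import AnyUrl, BaseModel


class ReferencedDocument(BaseModel):
    """Document referenced by a RAG answer (trimmed sketch)."""

    doc_id: str
    doc_url: AnyUrl | str | None = None  # widened so Solr/vector DB metadata URLs can be plain strings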


370-452: QueryResponse RAG/tool fields wiring is sound

Adding rag_chunks: list[RAGChunk] and tool_calls: list[ToolCall] | None plus updating the examples keeps the model aligned with the new query pipeline (TurnSummary + tool execution metadata). Defaults ([] / None) also match the usage in query.py.
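And a matching sketch of the added fields with the defaults noted above; all other QueryResponse fields and the ToolCall model are omitted, and the score type is an assumption:

from pydantic import BaseModel, Field


class RAGChunk(BaseModel):
    """Chunk returned from the vector store (trimmed sketch)."""

    content: str
    source: str | None = None
    score: float | None = None


class QueryResponse(BaseModel):
    """Trimmed sketch showing only the fields added by this PR."""

    response: str
    rag_chunks: list[RAGChunk] = Field(default_factory=list)
    # tool_calls: list[ToolCall] | None = None  (ToolCall is defined alongside in responses.py)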


1102-1143: Abstract error response OpenAPI helpers look correct

The new AbstractErrorResponse.__init__, get_description(), and openapi_response() implementations are consistent with the subclasses and the tests (labelled examples, description-from-class-variable, JSON examples under detail). No functional issues here.

src/app/endpoints/query.py (3)

427-445: Improved database and rate‑limit error handling looks good

Catching SQLAlchemyError with logger.exception and mapping it to InternalServerErrorResponse.database_error(), plus translating RateLimitError into QuotaExceededResponse.model(), is consistent with the error model and the FastAPI HTTPException pattern you’re using elsewhere.


1008-1035: get_rag_toolgroups helper matches Toolgroup typing, but consider simplifying call sites

The helper correctly returns list[Toolgroup] | None, using ToolgroupAgentToolGroupWithArgs for the RAG knowledge search tool. This is compatible with the llama‑stack types.

Given the call sites now need to deal with None and concatenate with MCP toolgroup names, using this helper together with the “extend and or None” pattern (shown in the earlier diff) will avoid None + list issues and make the intent clearer.

No functional issues in the helper itself.
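A compact sketch of that call-site pattern, assuming mcp_toolgroups holds the MCP toolgroup names collected earlier and that plain name strings are valid Toolgroup entries, as the helper's typing suggests:

toolgroups: list[Toolgroup] = list(get_rag_toolgroups(vector_db_ids) or [])
toolgroups.extend(mcp_toolgroups)
agent_toolgroups = toolgroups or None  # pass None when neither RAG nor MCP toolgroups were collected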


746-769: Fix vector DB / RAG setup: unbound vector_db_ids, type issues, and RAGChunk construction

This block has several correctness and typing issues that CI is flagging:

  • vector_db_ids is only assigned inside the else branch of the no_tools check, but it's used here unconditionally (if vector_db_ids:). When query_request.no_tools is True, the variable is never assigned, which triggers pyright's "possibly unbound" warning and would raise a NameError at runtime.
  • toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups is problematic because get_rag_toolgroups can return None, and you're adding a list of str to a list of Toolgroup (type mismatch and None + list risk). This is what mypy/pyright are complaining about.
  • params = {"k": 5, "score_threshold": 0.0} does not match the union type expected by AsyncVectorIoResource.query; CI reports an incompatible dict type.
  • urljoin("https://mimir.corp.redhat.com", parent_id) is called with parent_id from chunk metadata, which is typed as a very broad object union; pyright rightfully complains about AnyStr incompatibility.
  • RAGChunk(content=chunk.content, source=source, score=score) passes an InterleavedContent and an untyped metadata value into fields typed as str / str | None, which triggers the content/source type errors.

Consider applying the suggested fixes to resolve the unbound variable, type mismatches for params, urljoin, and RAGChunk fields, and remove the INFO‑level logging of the full vector DB payload. Re‑run mypy/pyright after applying.
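Pulling those points together, one possible shape for the pre-agent block; this is a sketch, not a drop-in: the metadata keys, the Mimir base URL from the snippet under review, and the score handling are assumptions, and RAGChunk, ReferencedDocument, urljoin, and interleaved_content_as_str are assumed to be module-level imports per the import cleanup noted in the next comment:

vector_db_ids: list[str] = []
if not query_request.no_tools:
    vector_db_ids = [db.identifier for db in await client.vector_dbs.list()]

rag_chunks: list[RAGChunk] = []
referenced_documents: list[ReferencedDocument] = []
if vector_db_ids:
    params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
    for vector_db_id in vector_db_ids:
        query_response = await client.vector_io.query(
            vector_db_id=vector_db_id,
            query=query_request.query,
            params=params,
        )
        logger.debug(
            "Vector DB %s returned %d chunks", vector_db_id, len(query_response.chunks)
        )
        scores = getattr(query_response, "scores", None) or [None] * len(query_response.chunks)
        for chunk, score in zip(query_response.chunks, scores):
            metadata = chunk.metadata or {}
            source = metadata.get("source")
            rag_chunks.append(
                RAGChunk(
                    content=interleaved_content_as_str(chunk.content),
                    source=str(source) if source is not None else None,
                    score=score,
                )
            )
            parent_id = metadata.get("parent_id")
            if isinstance(parent_id, str):  # narrow the broad metadata type before urljoin
                referenced_documents.append(
                    ReferencedDocument(
                        doc_id=parent_id,
                        doc_url=urljoin("https://mimir.corp.redhat.com", parent_id),
                    )
                )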

Comment on lines 7 to 13
import traceback
from datetime import UTC, datetime
from typing import Annotated, Any, Optional, cast
from urllib.parse import urljoin

from fastapi import APIRouter, Depends, HTTPException, Request
from litellm.exceptions import RateLimitError
from pydantic import ValidationError
from llama_stack_client import (

⚠️ Potential issue | 🟡 Minor

Clean up imports and centralize RAGChunk import

  • ValidationError (Line 14) is unused; Ruff flags this, so it should be removed.
  • RAGChunk is used in retrieve_response but only via function‑local imports; import it once at module level alongside ToolCall and ReferencedDocument and drop the in‑function imports to avoid duplication and potential “possibly unbound” issues from static analyzers.
  • The OFFLINE = True flag (Lines 79‑82) is currently a hard‑coded module global with a TODO. For production behavior, it should eventually come from configuration (e.g., configuration flag/env) rather than a constant; OK as a temporary stopgap, but don’t forget to wire it up.

Also applies to: 32-55, 79-82
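For the OFFLINE flag in the last bullet, a minimal sketch of sourcing it from the environment; the variable name LCS_OFFLINE is a placeholder, not an existing setting:

import os

# TODO: replace the hard-coded module constant once the configuration story is settled.
OFFLINE: bool = os.environ.get("LCS_OFFLINE", "false").strip().lower() in {"1", "true", "yes"}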

🧰 Tools
🪛 GitHub Actions: Ruff

[error] 14-14: F401: 'pydantic.ValidationError' imported but unused. Remove unused import.

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 7 to 15 (and apply similar cleanup at
32-55 and 79-82), remove the unused pydantic.ValidationError import, add a
module‑level import of RAGChunk alongside the existing ToolCall and
ReferencedDocument imports so retrieve_response no longer does local imports,
and delete the in‑function imports to avoid duplication and possible “possibly
unbound” analyzer warnings; leave the OFFLINE = True temporary flag but add a
TODO comment (or wire it to config/env) so production behavior is sourced from
configuration instead of a hardcoded constant.

Comment on lines 937 to 889
     summary = TurnSummary(
         llm_response=(
-            content_to_str(response.output_message.content)
+            interleaved_content_as_str(response.output_message.content)
             if (
                 getattr(response, "output_message", None) is not None
                 and getattr(response.output_message, "content", None) is not None
             )
             else ""
         ),
         tool_calls=[],
-        tool_results=[],
-        rag_chunks=[],
+        rag_chunks=rag_chunks,
     )

⚠️ Potential issue | 🔴 Critical

Include tool_results when constructing TurnSummary

TurnSummary (from utils.types) expects both tool_calls and tool_results, but you only pass tool_calls and rag_chunks here. Pyright reports “argument missing for parameter 'tool_results'” and at runtime you’d get a TypeError.

Initialize it explicitly:

-    summary = TurnSummary(
+    summary = TurnSummary(
         llm_response=(
             interleaved_content_as_str(response.output_message.content)
@@
-        ),
-        tool_calls=[],
-        rag_chunks=rag_chunks,
-    )
+        ),
+        tool_calls=[],
+        tool_results=[],
+        rag_chunks=rag_chunks,
+    )
🧰 Tools
🪛 GitHub Actions: Pyright

[error] 937-937: pyright: Argument missing for parameter 'tool_results' (reportCallIssue).

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 937 to 948, the TurnSummary
construction is missing the required tool_results argument; update the
constructor call to include a tool_results keyword argument (pass the collected
tool results if available, otherwise an empty list) so TurnSummary receives both
tool_calls and tool_results explicitly and the call matches the type signature.

Comment on lines 84 to 92
<<<<<<< HEAD
=======
    # Mock conversations.create for new conversation creation
    # Returns ID in llama-stack format (conv_ prefix + 48 hex chars)
    mock_conversation = mocker.MagicMock()
    mock_conversation.id = "conv_" + "a" * 48  # conv_aaa...aaa (proper format)
    mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)

>>>>>>> upstream/main

⚠️ Potential issue | 🔴 Critical

Resolve merge conflict in conversations.create mock setup.

Unresolved merge conflict markers prevent the file from being parsed or executed. The conflict shows divergent mock setups:

  • HEAD: Missing or empty mock for conversations.create
  • upstream/main: Explicitly mocks conversations.create to return a conversation ID in llama-stack format (conv_ prefix + 48 hex characters)

Choose the appropriate mock setup based on the expected behavior of the Llama Stack client. The upstream/main version appears more complete.

-<<<<<<< HEAD
-=======
-    # Mock conversations.create for new conversation creation
-    # Returns ID in llama-stack format (conv_ prefix + 48 hex chars)
-    mock_conversation = mocker.MagicMock()
-    mock_conversation.id = "conv_" + "a" * 48  # conv_aaa...aaa (proper format)
-    mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)
-
->>>>>>> upstream/main
+    # Mock conversations.create for new conversation creation
+    # Returns ID in llama-stack format (conv_ prefix + 48 hex chars)
+    mock_conversation = mocker.MagicMock()
+    mock_conversation.id = "conv_" + "a" * 48  # conv_aaa...aaa (proper format)
+    mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)
+
🧰 Tools
🪛 GitHub Actions: Black

[error] 84-84: Cannot parse for target version Python 3.13: 84:0: <<<<<<< HEAD

🪛 GitHub Actions: Integration tests

[error] 84-84: SyntaxError: invalid syntax due to merge conflict markers (<<<<<<< HEAD)

🪛 GitHub Actions: Python linter

[error] 84-84: Pylint parsing failed: 'invalid syntax (tests.integration.endpoints.test_query_v2_integration, line 84)'

🪛 GitHub Actions: Ruff

[error] 84-84: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

🤖 Prompt for AI Agents
In tests/integration/endpoints/test_query_v2_integration.py around lines 84 to
92, there are unresolved Git conflict markers and a missing mock for
conversations.create; remove the conflict markers and adopt the upstream/main
mock: create a MagicMock conversation with id set to "conv_" + 48 hex chars
(e.g., "a"*48) and set mock_client.conversations.create to an AsyncMock
returning that mock conversation so the test uses a valid llama-stack
conversation id format.

Comment on lines 171 to 176
<<<<<<< HEAD
    assert response.conversation_id == "response-123"
=======
    # Conversation ID is normalized (without conv_ prefix) from conversations.create()
    assert response.conversation_id == "a" * 48
>>>>>>> upstream/main

⚠️ Potential issue | 🔴 Critical

Resolve merge conflict in conversation_id assertion.

Unresolved merge conflict markers prevent the file from being parsed. The conflict shows divergent expectations for the conversation_id format:

  • HEAD: Expects conversation_id == "response-123" (from mock_response.id)
  • upstream/main: Expects conversation_id == "a" * 48 (normalized format without conv_ prefix)

Resolve this conflict consistently with the mock setup at lines 84-92. If upstream/main's conversations.create mock is adopted, the normalized assertion should be used.

-<<<<<<< HEAD
-    assert response.conversation_id == "response-123"
-=======
-    # Conversation ID is normalized (without conv_ prefix) from conversations.create()
-    assert response.conversation_id == "a" * 48
->>>>>>> upstream/main
+    # Conversation ID is normalized (without conv_ prefix) from conversations.create()
+    assert response.conversation_id == "a" * 48
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-<<<<<<< HEAD
-    assert response.conversation_id == "response-123"
-=======
-    # Conversation ID is normalized (without conv_ prefix) from conversations.create()
-    assert response.conversation_id == "a" * 48
->>>>>>> upstream/main
+    # Conversation ID is normalized (without conv_ prefix) from conversations.create()
+    assert response.conversation_id == "a" * 48
🤖 Prompt for AI Agents
In tests/integration/endpoints/test_query_v2_integration.py around lines 171 to
176, remove the unresolved git conflict markers and replace the two conflicting
assertion variants with the single correct assertion that matches the
conversations.create mock declared at lines 84-92; specifically, use the
normalized conversation id expected from the mock (the 48-character "a" string)
— i.e., delete the HEAD/UPSTREAM markers and the wrong assertion for
"response-123" and leave only assert response.conversation_id == "a" * 48, then
run the test suite to confirm the expectation aligns with the mock.

Comment on lines 5 to 11
<<<<<<< HEAD
from fastapi import Request, HTTPException
import pytest
=======
import pytest
from fastapi import HTTPException, Request
>>>>>>> upstream/main

⚠️ Potential issue | 🔴 Critical

Resolve unresolved merge conflicts blocking CI.

The file contains three unresolved merge conflict markers causing Ruff, Black, and Pylint to fail. These must be resolved before the PR can proceed.

Apply this diff to resolve all conflicts (preferring the more defensive upstream/main assertion style with explicit type checking):

-<<<<<<< HEAD
-from fastapi import Request, HTTPException
-import pytest
-=======
 import pytest
 from fastapi import HTTPException, Request
->>>>>>> upstream/main
     assert exc_info.value.status_code == 401
-<<<<<<< HEAD
-    assert exc_info.value.detail["cause"] == "No Authorization header found"
-=======
     detail = exc_info.value.detail
     assert isinstance(detail, dict)
     assert detail["cause"] == "No Authorization header found"
->>>>>>> upstream/main
     assert exc_info.value.status_code == 401
-<<<<<<< HEAD
-    assert exc_info.value.detail["cause"] == "No token found in Authorization header"
-=======
     detail = exc_info.value.detail
     assert isinstance(detail, dict)
     assert detail["cause"] == "No token found in Authorization header"
->>>>>>> upstream/main

The upstream/main style is preferred because it:

  1. Groups stdlib imports before third-party (PEP 8)
  2. Includes explicit isinstance(detail, dict) assertions for better test diagnostics if the detail structure changes unexpectedly

Also applies to: 79-85, 108-114

🧰 Tools
🪛 GitHub Actions: Black

[error] 5-5: Cannot parse for target version Python 3.13: 5:0: <<<<<<< HEAD

🪛 GitHub Actions: Python linter

[error] 5-5: Pylint parsing failed: 'invalid syntax (tests.unit.authentication.test_api_key_token, line 5)'

🪛 GitHub Actions: Ruff

[error] 5-5: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

@Anxhela21 Anxhela21 force-pushed the anx/streaming_query branch 2 times, most recently from 60c3c4f to c971439 on December 10, 2025 14:46
@Anxhela21 Anxhela21 changed the base branch from main to TamiTakamiya/async-agent December 10, 2025 14:54
@Anxhela21 Anxhela21 changed the base branch from TamiTakamiya/async-agent to main December 10, 2025 14:54
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (11)
src/utils/types.py (1)

36-50: Align get_tool_calls signature, docstring, and guard condition.

The method is typed as get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall], but the docstring still documents CompletionMessage | None and you guard with if output_message and output_message.tool_calls:. Given the non‑optional type, that if output_message is redundant and the docstring is misleading.

Consider tightening this to keep everything in sync:

-    def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]:
+    def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]:
@@
-        Parameters:
-            output_message (CompletionMessage | None): Completion
-            message potentially containing `tool_calls`.
+        Parameters:
+            output_message (CompletionMessage): Completion message potentially
+            containing `tool_calls`.
@@
-        if output_message and output_message.tool_calls:
-            return output_message.tool_calls
-        return []
+        return output_message.tool_calls or []

(or, if None is actually possible at call sites, make the parameter CompletionMessage | None and keep the guard.)

src/utils/endpoints.py (1)

345-375: Inconsistent return order between get_agent and get_temp_agent.

get_agent returns (agent, conversation_id, session_id) (line 342), but get_temp_agent returns (agent, session_id, conversation_id) (line 375). This inconsistency is error-prone for callers.

Additionally, the docstring (lines 359-360) says it returns tuple[AsyncAgent, str] with "(agent, session_id)" but the actual return type annotation shows tuple[AsyncAgent, str, str].

-    Returns:
-        tuple[AsyncAgent, str]: A tuple containing the agent and session_id.
+    Returns:
+        tuple[AsyncAgent, str, str]: A tuple containing (agent, conversation_id, session_id).
     """
     ...
-    return agent, session_id, conversation_id
+    return agent, conversation_id, session_id
tests/unit/app/endpoints/test_streaming_query.py (2)

147-172: Assertions inside pytest.raises block are never executed

In test_streaming_query_endpoint_handler_configuration_not_loaded, the status/detail assertions are placed inside the with pytest.raises(HTTPException) as e: block, after the awaited call. Once the exception is raised, code after the call inside the context is not executed, so those assertions never run.

Move the assertions outside the with block, e.g.:

-    with pytest.raises(HTTPException) as e:
-        await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+    with pytest.raises(HTTPException) as e:
+        await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH)
+
+    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore

175-205: Connection‑error test likely not exercising the real failure path

test_streaming_query_endpoint_on_connection_error currently:

  • Patches mock_client.models.side_effect = APIConnectionError(...), and
  • Patches client.AsyncLlamaStackClientHolder.get_client twice on the same target, and
  • Only asserts that a StreamingResponse is returned (no check that an error event is actually streamed).

If the production code calls await client.models.list(...) (as other tests suggest by setting mock_client.models.list.return_value), the side effect on models will never be triggered, so this test will not verify the connection‑error behavior.

Consider:

-    mock_client = mocker.AsyncMock()
-    mock_client.models.side_effect = APIConnectionError(request=query_request)  # type: ignore
-    mock_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client")
-    mock_lsc.return_value = mock_client
-    mock_async_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client")
-    mock_async_lsc.return_value = mock_client
+    mock_client = mocker.AsyncMock()
+    mock_client.models.list = mocker.AsyncMock(
+        side_effect=APIConnectionError(request=query_request)  # type: ignore
+    )
+    mocker.patch(
+        "client.AsyncLlamaStackClientHolder.get_client",
+        return_value=mock_client,
+    )

and then asserting that the streamed content actually contains an error event (status 503, appropriate message) rather than only checking the response type.

docs/openapi.json (5)

1366-1373: 422 example contradicts schema: QueryRequest only requires 'query'.

Examples say missing ['query','model','provider'], but model/provider are optional in the schema. Update examples to only require 'query' (or change the schema if intentional).

- "cause": "Missing required attributes: ['query', 'model', 'provider']",
+ "cause": "Missing required attributes: ['query']",

Applies to v1 query 422, v2 query 422, and v2 streaming_query 422.

Also applies to: 3586-3593, 3868-3875


2608-2613: 'not found' delete example sets success=true; fix success=false and grammar.

A failed delete should not return success=true; also prefer “cannot”.

- "response": "Conversation can not be deleted",
- "success": true
+ "response": "Conversation cannot be deleted",
+ "success": false

Reflect the same in components/ConversationDeleteResponse examples.

Also applies to: 5005-5011


1374-1380: Typo: 'attatchment' → 'attachment'.

Correct the spelling across all error examples.

- "cause": "Invalid attatchment type: must be one of [...]
+ "cause": "Invalid attachment type: must be one of [...]

Also applies to: 1654-1661, 3594-3601, 3877-3881, 7702-7707


4284-4290: metrics error responses: text/plain uses JSON object schemas.

For 401/403/500/503 you declare text/plain but reference JSON object schemas. Either:

  • Return JSON only (application/json), or
  • Keep text/plain with a simple string schema.

Recommend JSON-only for errors to match the rest of the API.

- "text/plain": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } }
+ // remove text/plain; keep application/json only

Repeat for 403/500/503 blocks.

Also applies to: 4306-4311, 4328-4333, 4351-4355


67-71: Root 401 text/html schema references a JSON object; use string.

Change text/html schema to a simple string for the error page (or remove if you always emit JSON for errors).

- "text/html": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } }
+ "text/html": { "schema": { "type": "string" } }
src/app/endpoints/query_v2.py (1)

131-136: Potential AttributeError if id attribute is missing.

On line 132, getattr(output_item, "id") is called without a default value, which will raise AttributeError if the attribute doesn't exist. Other similar calls in this function use fallback patterns (e.g., line 90-92 for function_call).

Apply this diff to add a fallback:

         return ToolCallSummary(
-            id=str(getattr(output_item, "id")),
+            id=str(getattr(output_item, "id", "") or ""),
             name=DEFAULT_RAG_TOOL,
             args=args,
             response=response_payload,
         )
src/models/responses.py (1)

1-3: Black formatting check failed.

The pipeline indicates Black formatting issues in this file. Run black src/models/responses.py to fix formatting.

#!/bin/bash
# Check Black formatting issues
uv tool run black --check --diff src/models/responses.py 2>&1 | head -50
♻️ Duplicate comments (18)
tests/unit/authentication/test_api_key_token.py (1)

5-5: Tests read cleanly after conflict resolution; direct detail["cause"] checks are fine here.

The FastAPI imports are now normalized, and the updated assertions on exc_info.value.detail["cause"] keep the tests concise while still validating the exact error causes. Given that you control the exception construction in the auth dependency, this level of defensiveness is sufficient for unit tests.

Also applies to: 74-74, 97-97

requirements.hermetic.txt (1)

1-2: Clarify the rationale for pinning older versions of build tools, particularly pip 24.2 which contains known security vulnerabilities.

The versions pinned in requirements.hermetic.txt are indeed downgrades: uv 0.8.15 (September 3, 2025) is approximately 3 months older than uv 0.9.16 (December 6, 2025), and pip 24.2 (July 28, 2024) is over a year older than pip 25.3 (October 24-25, 2025).

The pip 24.2 downgrade is particularly concerning. This version contains multiple unpatched security vulnerabilities:

  • CVE-2025-8869: Symlink/tar extraction vulnerability allowing file extraction outside target directory (unfixed in pip <25.0)
  • Arbitrary code execution via self_outdated_check: pip loads untrusted code after installation (unfixed in pip <25.0)

If these downgrades are necessary for compatibility with specific components, that rationale should be documented in a comment or README. If not, consider upgrading to the latest stable versions.

src/utils/checks.py (1)

79-91: New parent-directory check is breaking tests; also keep error message aligned with what is checked

The added parent-directory checks change behavior in a way that’s currently failing integration tests:

  • When must_exists=False and must_be_writable=True, a missing parent now raises InvalidConfigurationError ("... cannot be created - parent directory does not exist"). The pipeline failure for '/tmp/data/feedback' shows this is hit in normal test configuration, so either:
    • tests/infra need to ensure the parent directory (e.g. /tmp/data) is created up front, or
    • this check should be relaxed or adjusted (e.g. test writability of the nearest existing ancestor, or let later code os.makedirs() the full path and surface that error instead).

Additionally, the branch that checks os.access(parent_path, os.W_OK) still reports "{desc} '{path}' is not writable", even though the actual check is on the parent directory. To keep the error self-explanatory, consider something like:

-            if not os.access(parent_path, os.W_OK):
-                raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+            if not os.access(parent_path, os.W_OK):
+                raise InvalidConfigurationError(
+                    f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable"
+                )

This both fixes the message and makes it clear why creation of path would fail.
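
If the relaxed behavior is preferred, a minimal sketch of the "nearest existing ancestor" idea could look like the following (helper names are illustrative, not the actual checks.py API):

import os


def nearest_existing_ancestor(path: str) -> str:
    """Walk up from `path` until an existing directory is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return current
        current = parent
    return current


def can_be_created(path: str) -> bool:
    """True when `path` exists or os.makedirs(path) would plausibly succeed."""
    anchor = nearest_existing_ancestor(path)
    return os.path.isdir(anchor) and os.access(anchor, os.W_OK)

With a check like this, a missing /tmp/data parent no longer fails outright as long as /tmp itself is writable.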

requirements.aarch64.txt (1)

1409-1411: urllib3 downgrade reintroduces known security issues; resolve via authoritative constraints, not this lockfile.

You’ve moved to kubernetes==34.1.0 while downgrading urllib3 to 2.3.0. Earlier analysis on this PR already noted that 2.3.x carries known redirect/info‑disclosure and DoS‑style vulnerabilities that were fixed in later 2.x releases, and recommended landing on a secure urllib3 version that also satisfies the kubernetes client’s constraint.

Right now this looks like the compatibility problem was “fixed” by picking an insecure urllib3. Instead:

  • Decide the authoritative constraint (in pyproject.toml or your constraints file) for both kubernetes and urllib3 so you can use a version of urllib3 that includes the security fixes (per the earlier bot comment, a >=2.5.x/2.6.x line, or whatever the current secure guidance is) while remaining within the kubernetes client’s supported range.
  • Re‑run uv pip compile to regenerate this file instead of editing requirements.aarch64.txt by hand.

Until that’s done, this environment is exposed to the urllib3 issues that motivated the original upgrade.

To double‑check the current safe combo for this exact kubernetes client version, please run a web search like:

What urllib3 version range is officially supported by the Python kubernetes client version 34.1.0, and what is the earliest urllib3 2.x release that includes fixes for the 2.3.x redirect/info‑disclosure and decompression/chained-encoding DoS issues?

Also applies to: 3601-3603

src/app/endpoints/query.py (4)

796-796: Avoid logging entire query response payloads at INFO level.

This was flagged in a past review. Logging the full query_response payload may expose sensitive document content.


929-941: Fix typo: "attatchment" should be "attachment".

This was flagged in a past review. The typo appears in error messages at lines 930 and 940.


836-852: Fix type errors in RAGChunk construction.

Pipeline reports multiple type errors:

  • Line 840: urljoin receives object instead of str
  • Line 848: RAGChunk is possibly unbound (local import issue)
  • Line 849: chunk.content is InterleavedContent, not str
  • Line 850: source type mismatch

The RAGChunk import at lines 799-800 is inside a conditional block and may not execute. Also, chunk.metadata.get() returns object, not str.

-        if chunk.metadata:
-            if OFFLINE:
-                parent_id = chunk.metadata.get("parent_id")
-                if parent_id:
-                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
+        if chunk.metadata:
+            if OFFLINE:
+                parent_id = chunk.metadata.get("parent_id")
+                if parent_id and isinstance(parent_id, str):
+                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
             else:
-                source = chunk.metadata.get("reference_url")
+                ref_url = chunk.metadata.get("reference_url")
+                source = ref_url if isinstance(ref_url, str) else None
         # ...
         rag_chunks.append(
             RAGChunk(
-                content=chunk.content,
+                content=str(chunk.content) if chunk.content else "",
                 source=source,
                 score=score,
             )
         )

Also, move the RAGChunk import to module level (this was flagged in past review).


14-14: Remove unused import ValidationError.

Pipeline failure confirms this import is unused. Remove it to fix the Ruff F401 error.

-from pydantic import ValidationError
src/app/endpoints/streaming_query.py (5)

81-84: OFFLINE flag duplicated from query.py.

This was flagged in a past review. Extract to a shared constants module.
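
A minimal sketch of what the shared definition could look like, assuming an environment-variable toggle (the variable name and module location are illustrative):

# src/constants.py (illustrative location)
import os

# Single source of truth for offline mode; query.py and streaming_query.py
# would both import this instead of redefining their own copies.
OFFLINE: bool = os.getenv("LIGHTSPEED_OFFLINE", "false").lower() in ("1", "true", "yes")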


925-930: ResponseGeneratorContext does not have vector_io_rag_chunks or vector_io_referenced_docs attributes.

Pipeline reports Cannot assign to attribute "vector_io_rag_chunks" and "vector_io_referenced_docs". The hasattr check will always return False for a dataclass/Pydantic model without these fields defined.

This was flagged in a past review. Add these fields to ResponseGeneratorContext.
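
A hedged sketch of the missing fields, assuming the context object is a dataclass (the real ResponseGeneratorContext may differ):

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ResponseGeneratorContext:
    # ...existing fields elided; only the two missing attributes are shown...
    vector_io_rag_chunks: list[Any] = field(default_factory=list)
    vector_io_referenced_docs: list[Any] = field(default_factory=list)

Declaring the fields up front also makes the hasattr checks unnecessary.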


1022-1022: Avoid logging entire query response payloads.

This was flagged in a past review. Move to DEBUG level and log metadata only.

Also applies to: 1264-1264


1046-1051: Hardcoded URL should be extracted to configuration.

This was flagged in a past review. The URL https://mimir.corp.redhat.com appears multiple times.

Also applies to: 1067-1069


1243-1298: Significant code duplication with query_vector_io_for_chunks.

This was flagged in a past review. The vector DB querying logic is duplicated between query_vector_io_for_chunks (lines 982-1091) and retrieve_response (lines 1243-1298). Additionally, query_vector_io_for_chunks is called at line 906-908 AND retrieve_response queries vector DB again, resulting in duplicate queries per request.

src/app/endpoints/conversations_v2.py (1)

99-101: Typo "Converastion" still present.

The typo "Converastion" should be "Conversation" at lines 99, 126, and 161.

docs/config.html (1)

205-247: Clarify auth config references and RHIdentity semantics in schema doc

In the AuthenticationConfiguration table, the jwk_config and rh_identity_config rows have empty Type/Description cells, even though dedicated sections for JwkConfiguration and RHIdentityConfiguration exist below. Consider:

  • Setting the Type to the concrete schema name (JwkConfiguration, RHIdentityConfiguration), and
  • Adding a brief description (e.g., “JWT-based auth via JWK” / “Red Hat Identity-based auth”).

In the RHIdentityConfiguration section, it would also help to explicitly document what happens when required_entitlements is omitted or an empty list (e.g., whether entitlement checks are skipped vs. requests fail). This keeps runtime behavior unambiguous for operators.

Also applies to: 1086-1103

tests/e2e/configs/run-azure.yaml (1)

93-103: Parameterize Braintrust OpenAI API key via env var instead of literal

In the Azure E2E config, the braintrust scoring provider uses openai_api_key: '********'. Similar to the CI config, it’s preferable to reference an environment variable (or remove the provider if unused) so the key is not hard‑coded and can be provided securely at runtime, e.g.:

  scoring:
    - provider_id: braintrust
      provider_type: inline::braintrust
      config:
        openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}

This keeps the config consistent with how other credentials (e.g., AZURE_API_KEY) are handled.

src/models/responses.py (2)

746-762: success field still hardcoded to True.

This issue was flagged in a previous review but remains unfixed. When deleted=False, success should also be False.

         super().__init__(
             conversation_id=conversation_id,  # type: ignore[call-arg]
-            success=True,  # type: ignore[call-arg]
+            success=deleted,  # type: ignore[call-arg]
             response=response_msg,  # type: ignore[call-arg]
         )

775-782: Example inconsistency remains unfixed.

The "not found" example still shows success: True and uses "can not be deleted" (with space) while the code uses "cannot be deleted" (no space).

                 {
                     "label": "not found",
                     "value": {
                         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-                        "success": True,
-                        "response": "Conversation can not be deleted",
+                        "success": False,
+                        "response": "Conversation cannot be deleted",
                     },
                 },
🧹 Nitpick comments (21)
tests/integration/test_openapi_json.py (2)

87-99: Type annotation on spec could be more precise/consistent

Using bare dict works, but other helpers in this file use dict[str, Any]. For consistency and stricter typing, consider annotating spec as dict[str, Any] or Mapping[str, Any] instead of plain dict.


185-187: Fixture argument type hints now use plain dict

Switching spec_from_file/spec_from_url annotations from dict[str, Any] to dict is harmless at runtime but slightly reduces type precision and differs from other helpers. Consider standardizing on one style (e.g., dict[str, Any] everywhere or Mapping[str, Any]) for readability.

Also applies to: 255-257

docker-compose.yaml (1)

26-26: Debug logging flag is reasonable; consider making it easily tunable.

Enabling LLAMA_STACK_LOGGING=all=debug on the llama‑stack service is helpful while stabilizing the new flows. If this compose file is also used beyond local dev, you might optionally gate this via an env default (e.g., ${LLAMA_STACK_LOGGING:-all=debug}) so you can dial it back without editing the file, and confirm that all=debug matches the logging syntax for the llama‑stack version you now support.

scripts/query_llm.py (1)

16-16: Docstring is correct but could be slightly more descriptive.

The new main() docstring is accurate but very generic. Optional: expand it to mention that it sends a single /v1/query request to the local LLM endpoint and prints the response plus timing, which would make the script’s intent clearer when read in isolation.
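
One possible wording (a sketch, not the script's actual docstring):

def main() -> None:
    """Send a single /v1/query request to the local LLM endpoint.

    Prints the response body along with the elapsed request time.
    """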

scripts/generate_openapi_schema.py (1)

26-27: Consider retaining parameter and return value documentation in docstrings.

The docstrings for read_version_from_openapi and read_version_from_pyproject have been condensed to single-line descriptions. While concise, they now lack parameter and return type documentation that would help future maintainers understand the function contracts.

Per coding guidelines requiring Google Python docstring conventions, consider including Args and Returns sections:

 def read_version_from_openapi(filename: str) -> str:
-    """Read version from OpenAPI.json file."""
+    """Read version from OpenAPI.json file.
+    
+    Args:
+        filename: Path to the OpenAPI.json file.
+        
+    Returns:
+        Version string extracted from the 'info.version' field.
+    """

As per coding guidelines, all functions should have docstrings with brief descriptions following Google Python docstring conventions.

Also applies to: 38-39

docs/config.md (1)

300-304: Optional: Consider more concise wording.

The MCP server description has been updated for clarity. The static analysis tool suggests using "inaccessible" instead of "not accessible" on line 304 for more concise wording, though the current phrasing is perfectly clear and correct.

If you prefer more concise wording:

-Only MCP servers defined in the lightspeed-stack.yaml configuration are
-available to the agents. Tools configured in the llama-stack run.yaml
-are not accessible to lightspeed-core agents.
+Only MCP servers defined in the lightspeed-stack.yaml configuration are
+available to the agents. Tools configured in the llama-stack run.yaml
+are inaccessible to lightspeed-core agents.

However, the current wording is clear and acceptable as-is.

src/utils/types.py (2)

92-163: RAG chunk extraction silently drops some valid-but-unexpected JSON; consider a more forgiving fallback.

_extract_rag_chunks_from_response handles the expected shapes well (dict with "chunks", or list of dicts), and falls back to a single chunk when JSON parsing fails or when a structural exception is raised. But there are two notable edge cases where you currently drop the response entirely:

  • response_content parses as a dict without a "chunks" key (e.g. {"text": "foo"}) – no chunks are appended and no exception is raised, so the outer except doesn’t trigger.
  • It parses as a list of non‑dict items (e.g. ["foo", "bar"]) – they’re skipped by the isinstance(chunk, dict) guard without a fallback.

If you want to retain as much signal as possible, a small tweak like this would keep behavior robust while preserving the structured cases:

-                if isinstance(data, dict) and "chunks" in data:
-                    for chunk in data["chunks"]:
-                        self.rag_chunks.append(
-                            RAGChunk(
-                                content=chunk.get("content", ""),
-                                source=chunk.get("source"),
-                                score=chunk.get("score"),
-                            )
-                        )
-                elif isinstance(data, list):
-                    # Handle list of chunks
-                    for chunk in data:
-                        if isinstance(chunk, dict):
-                            self.rag_chunks.append(
-                                RAGChunk(
-                                    content=chunk.get("content", str(chunk)),
-                                    source=chunk.get("source"),
-                                    score=chunk.get("score"),
-                                )
-                            )
+                if isinstance(data, dict) and "chunks" in data:
+                    for chunk in data["chunks"]:
+                        self.rag_chunks.append(
+                            RAGChunk(
+                                content=chunk.get("content", ""),
+                                source=chunk.get("source"),
+                                score=chunk.get("score"),
+                            )
+                        )
+                elif isinstance(data, list):
+                    # Handle list of chunks (dicts or plain values)
+                    for chunk in data:
+                        if isinstance(chunk, dict):
+                            self.rag_chunks.append(
+                                RAGChunk(
+                                    content=chunk.get("content", str(chunk)),
+                                    source=chunk.get("source"),
+                                    score=chunk.get("score"),
+                                )
+                            )
+                        else:
+                            self.rag_chunks.append(
+                                RAGChunk(
+                                    content=str(chunk),
+                                    source=DEFAULT_RAG_TOOL,
+                                    score=None,
+                                )
+                            )
+                else:
+                    # Unknown JSON shape: keep the whole payload as a single chunk
+                    if response_content.strip():
+                        self.rag_chunks.append(
+                            RAGChunk(
+                                content=response_content,
+                                source=DEFAULT_RAG_TOOL,
+                                score=None,
+                            )
+                        )

That way, any non‑empty response still yields at least one RAGChunk, even when the tool payload format drifts slightly.


15-29: Minor guideline nits: logging and type hints on Singleton.__call__.

Two small style/guideline points:

  • Module has no logger = logging.getLogger(__name__) despite the project guideline to standardize logging per–module.
  • Singleton.__call__ is currently untyped (*args, **kwargs only, # type: ignore); guidelines call for complete type annotations on functions, even if the return type is just Singleton/object.

These don’t block functionality, but cleaning them up would keep this module aligned with the rest of the stack.

Also applies to: 89-96
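
A small sketch of the typed, logger-equipped version, assuming Singleton is a metaclass (names mirror the comment, not necessarily the file):

import logging
from typing import Any

logger = logging.getLogger(__name__)


class Singleton(type):
    """Metaclass that returns one shared instance per class."""

    _instances: dict[type, Any] = {}

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        if cls not in cls._instances:
            logger.debug("Creating singleton instance for %s", cls.__name__)
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]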

src/app/endpoints/query.py (1)

818-821: Hardcoded URL should be extracted to configuration.

The URL https://mimir.corp.redhat.com is hardcoded. This should be moved to configuration for flexibility across environments.

+# At module level or in configuration
+MIMIR_BASE_URL = configuration.mimir_base_url or "https://mimir.corp.redhat.com"
+
 # In the code:
-                            doc_url="https://mimir.corp.redhat.com"
-                            + reference_doc,
+                            doc_url=urljoin(MIMIR_BASE_URL, reference_doc),
src/app/endpoints/streaming_query.py (1)

1126-1126: Function retrieve_response has too many statements (75/50).

Pipeline reports this function exceeds the statement limit. Consider extracting the RAG context injection logic into a helper function.

async def _build_rag_context(
    client: AsyncLlamaStackClient,
    query_request: QueryRequest,
    vector_db_ids: list[str],
) -> tuple[list[RAGChunk], str]:
    """Build RAG context from vector DB query."""
    rag_chunks: list[RAGChunk] = []
    rag_context = ""
    # ... extracted logic from lines 1243-1313
    return rag_chunks, rag_context
run.yaml (1)

121-121: Hardcoded Solr URL may need environment configuration.

The solr_url: "http://localhost:8983/solr" is hardcoded. For production deployments across different environments, consider using an environment variable.

       config:
-        solr_url: "http://localhost:8983/solr"
+        solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
tests/integration/endpoints/test_query_v2_integration.py (1)

307-372: Re‑evaluate LCORE‑1025 skips for tool‑call integration tests

Both test_query_v2_endpoint_with_tool_calls and test_query_v2_endpoint_with_multiple_tool_types are permanently skipped via @pytest.mark.skip(reason="LCORE-1025: ToolCallSummary.response type mismatch"). These tests exercise the new RAG/tool-call surface and are valuable regression coverage.

If LCORE‑1025 is now resolved (the ToolCall / ToolCallSummary shapes look stabilized elsewhere in this PR), consider removing the skips so these paths are exercised again. If the issue is still open, keeping the skip is fine but it’s worth confirming the ticket status and planning to re‑enable once fixed.

Also applies to: 439-507

tests/unit/app/endpoints/test_streaming_query.py (1)

372-757: Inconsistent async test marking may cause tests to be skipped or mis‑run

Several async tests (test_retrieve_response_vector_db_available, test_retrieve_response_no_available_shields, test_retrieve_response_one_available_shields, test_retrieve_response_two_available_shields, test_retrieve_response_four_available_shields, and the attachment variants) are defined as async def but lack @pytest.mark.asyncio, while other async tests in this module are explicitly marked.

Depending on your pytest-asyncio configuration, these functions may not run as intended or may be skipped. For consistency and clarity, consider adding @pytest.mark.asyncio to these async tests as well.
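
For example (the body is a placeholder, not the real test):

import asyncio

import pytest


@pytest.mark.asyncio
async def test_retrieve_response_no_available_shields() -> None:
    # arrange mocks, call retrieve_response(...), assert on the result
    await asyncio.sleep(0)

Alternatively, setting asyncio_mode = "auto" in the pytest configuration removes the need for per-test markers.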

tests/e2e/configs/run-ci.yaml (1)

99-102: Avoid hard‑coding Braintrust OpenAI API key placeholder in CI config

The braintrust scoring provider is configured with a literal openai_api_key: '********'. For CI/e2e configs this is better modeled as an environment‑driven value (or omitted entirely if Braintrust isn’t used in tests), e.g.:

scoring:
  - provider_id: braintrust
    provider_type: inline::braintrust
    config:
      openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}

This avoids committing even placeholder secret‑looking values and makes it clear where the real key should come from at runtime.

src/app/endpoints/streaming_query_v2.py (1)

154-162: Consider catching specific exception types.

The broad except Exception clause catches all exceptions, which can mask unexpected errors. Since you're accessing attributes, AttributeError would be more appropriate.

             # Emit start on response.created
             if event_type == "response.created":
                 try:
                     conv_id = getattr(chunk, "response").id
-                except Exception:  # pylint: disable=broad-except
+                except AttributeError:
                     logger.warning("Missing response id!")
                     conv_id = ""
                 yield stream_start_event(conv_id)
                 continue
docs/openapi.json (1)

6631-6645: Constrain media_type to known values.

Provide an enum for media_type to guide clients and validation.

 "media_type": {
-  "anyOf": [ { "type": "string" }, { "type": "null" } ],
+  "anyOf": [
+    { "type": "string", "enum": ["application/json", "text/plain"] },
+    { "type": "null" }
+  ],
   "title": "Media Type",
   "description": "Media type for the response format",
   "examples": ["application/json", "text/plain"]
 }
src/app/endpoints/query_v2.py (3)

138-145: Missing fallback for id attribute in web_search_call.

Similar to the file_search_call case, getattr(output_item, "id") on line 141 lacks a default value. For consistency with the function_call pattern (lines 90-92), consider adding a fallback.

     if item_type == "web_search_call":
         args = {"status": getattr(output_item, "status", None)}
+        call_id = getattr(output_item, "id", None) or getattr(output_item, "call_id", None)
         return ToolCallSummary(
-            id=str(getattr(output_item, "id")),
+            id=str(call_id),
             name="web_search",
             args=args,
             response=None,
         )

409-415: TODO: Document parsing not yet implemented.

The function returns an empty list with a clear TODO explaining the needed work. This is acceptable as a placeholder, but tracking the implementation would be valuable.

Would you like me to open an issue to track the implementation of document parsing from the Responses API?


519-539: Consider making max_num_results configurable.

The max_num_results: 10 value is hardcoded. Consider making it configurable via the AppConfig if different use cases require different limits, as sketched below.
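
A hedged sketch of one way to expose the limit, assuming a Pydantic-based config model (field and class names are hypothetical, not the actual AppConfig):

from pydantic import BaseModel, Field


class RagToolConfig(BaseModel):
    """Tunable knobs for the file_search / RAG tool."""

    max_num_results: int = Field(default=10, ge=1, le=100)


# The call site that currently hardcodes 10 would then read the limit from
# configuration, e.g.:
# {"type": "file_search", "vector_store_ids": vector_store_ids,
#  "max_num_results": config.rag_tool.max_num_results}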

src/models/responses.py (1)

367-376: Inconsistent default patterns for rag_chunks vs tool_calls.

rag_chunks defaults to [] (empty list) while tool_calls defaults to None. Consider using consistent defaults for similar collection fields.

     rag_chunks: list[RAGChunk] = Field(
-        [],
+        default_factory=list,
         description="List of RAG chunks used to generate the response",
     )

     tool_calls: list[ToolCall] | None = Field(
-        None,
+        default_factory=list,
         description="List of tool calls made during response generation",
     )
src/models/config.py (1)

1281-1284: Consider adding error handling for file operations.

The dump method opens and writes to a file without exception handling. Consider wrapping in try-except for better error reporting.

     def dump(self, filename: str = "configuration.json") -> None:
         """Dump actual configuration into JSON file."""
-        with open(filename, "w", encoding="utf-8") as fout:
-            fout.write(self.model_dump_json(indent=4))
+        try:
+            with open(filename, "w", encoding="utf-8") as fout:
+                fout.write(self.model_dump_json(indent=4))
+        except OSError as e:
+            raise ValueError(f"Failed to dump configuration to {filename}: {e}") from e

Comment on lines 1519 to 1527
"application/json": {
"schema": {}
},
"text/event-stream": {
"schema": {
"type": "string",
"format": "text/event-stream"
"type": "string"
},
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19}, \"available_quotas\": {}}\n\n"
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
}

⚠️ Potential issue | 🔴 Critical

Fix SSE content types for /v1/streaming_query (remove stray application/json, keep text/event-stream).

200 → content lists application/json with empty schema plus text/event-stream for SSE. OpenAPI generators will model a JSON response; that’s incorrect for SSE. Remove application/json for 200 and keep only text/event-stream, and ensure examples match the declared schema.

Apply this diff minimally:

       "responses": {
         "200": {
-          "description": "Streaming response (Server-Sent Events)",
-          "content": {
-            "application/json": {
-              "schema": {}
-            },
-            "text/event-stream": {
+          "description": "Streaming response (Server-Sent Events)",
+          "content": {
+            "text/event-stream": {
               "schema": {
                 "type": "string"
               },
-              "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
+              "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 0, \"output_tokens\": 0, \"available_quotas\": {} }}\n\n"
             }
           }
         },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"application/json": {
"schema": {}
},
"text/event-stream": {
"schema": {
"type": "string",
"format": "text/event-stream"
"type": "string"
},
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19}, \"available_quotas\": {}}\n\n"
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
}
"text/event-stream": {
"schema": {
"type": "string"
},
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 0, \"output_tokens\": 0, \"available_quotas\": {} }}\n\n"
}
🤖 Prompt for AI Agents
In docs/openapi.json around lines 1519 to 1527, the 200 response for
/v1/streaming_query incorrectly lists "application/json" with an empty schema
alongside "text/event-stream", which causes OpenAPI generators to treat the
response as JSON; remove the "application/json" content entry so only
"text/event-stream" remains for the 200 response, and ensure the remaining
"text/event-stream" entry's schema/type and example accurately reflect SSE
(string) format.

Comment on lines 1526 to 1527
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
}

⚠️ Potential issue | 🟡 Minor

SSE example sets truncated=null but schema is boolean.

Align the SSE “end” example with QueryResponse which defines truncated as boolean.

- "truncated": null
+ "truncated": false

Also move available_quotas inside the same data object for consistency (the v1 example already does this after the previous comment's fix).

Also applies to: 3740-3742

🤖 Prompt for AI Agents
In docs/openapi.json around lines 1526-1527 (and also apply the same change at
lines 3740-3742), the SSE "end" example currently sets "truncated": null and
places "available_quotas" outside the "data" object which conflicts with the
QueryResponse schema; change "truncated" to a boolean (true or false as
appropriate for the example) and move "available_quotas" inside the same "data"
object so the example matches the schema structure and types.

Comment on lines 3735 to 3750
"200": {
"description": "Streaming response with Server-Sent Events",
"content": {
"application/json": {
"schema": {
"type": "string",
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
}
},
"text/plain": {
"schema": {
"type": "string",
"example": "Hello world!\n\n---\n\nReference: https://example.com/doc"
}
}
}

⚠️ Potential issue | 🔴 Critical

/v2/streaming_query returns SSE; content-types are wrong.

200 → content declares application/json (string) and text/plain. Use text/event-stream only, like v1, and align the example.

Suggested fix:

-        "content": {
-          "application/json": {
-            "schema": {
-              "type": "string",
-              "example": "data:{...SSE...}"
-            }
-          },
-          "text/plain": {
-            "schema": {
-              "type": "string",
-              "example": "Hello world!\n\n---\n\nReference: https://example.com/doc"
-            }
-          }
-        }
+        "content": {
+          "text/event-stream": {
+            "schema": { "type": "string" },
+            "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 0, \"output_tokens\": 0, \"available_quotas\": {} }}\n\n"
+          }
+        }
🤖 Prompt for AI Agents
In docs/openapi.json around lines 3735 to 3750, the 200 response for
/v2/streaming_query incorrectly lists application/json and text/plain; this
endpoint returns Server-Sent Events and should use only text/event-stream.
Remove the application/json and text/plain content entries, replace them with a
single "text/event-stream" content entry whose schema is a string and whose
example shows SSE frames (data: ...\n\n) matching the v1 format; keep the
descriptive "Streaming response with Server-Sent Events" description intact.

Comment on lines 4365 to 3929
"api_key": {
"type": "string",
"minLength": 1,
"format": "password",
"title": "API key",
"writeOnly": true,
"examples": [
"some-api-key"
]
"title": "Api Key",
"default": "some-api-key"
}

🛠️ Refactor suggestion | 🟠 Major

API keys in docs: remove default and mark writeOnly.

Avoid shipping a default-looking API key and mark it writeOnly for generators.

- "api_key": { "type": "string", "title": "Api Key", "default": "some-api-key" }
+ "api_key": { "type": "string", "title": "API Key", "minLength": 1, "writeOnly": true }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"api_key": {
"type": "string",
"minLength": 1,
"format": "password",
"title": "API key",
"writeOnly": true,
"examples": [
"some-api-key"
]
"title": "Api Key",
"default": "some-api-key"
}
"api_key": {
"type": "string",
"title": "API Key",
"minLength": 1,
"writeOnly": true
}
🤖 Prompt for AI Agents
In docs/openapi.json around lines 4365 to 4369, the schema for "api_key"
currently includes a default value and isn't marked writeOnly; remove the
"default": "some-api-key" entry and add "writeOnly": true to the property so
generators don't treat it as a readable default or expose it in responses.

docs/openapi.md Outdated
Comment on lines 3271 to 3275
| Status Code | Description | Component |
|-------------|-------------|-----------|
| 200 | Streaming response with Server-Sent Events | string
string |
| 401 | Unauthorized | [UnauthorizedResponse](#unauthorizedresponse)

⚠️ Potential issue | 🟡 Minor

Fix malformed Markdown table for streaming response.

The responses table has formatting issues: missing trailing pipes and incorrect column structure. This will cause rendering problems in documentation.

 ### ✅ Responses
 
 | Status Code | Description | Component |
 |-------------|-------------|-----------|
-| 200 | Streaming response with Server-Sent Events | string
-string |
+| 200 | Streaming response with Server-Sent Events | string |
 | 401 | Unauthorized | [UnauthorizedResponse](#unauthorizedresponse)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

3273-3273: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)


3274-3274: Table pipe style
Expected: leading_and_trailing; Actual: trailing_only; Missing leading pipe

(MD055, table-pipe-style)


3274-3274: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


3275-3275: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

🤖 Prompt for AI Agents
In docs/openapi.md around lines 3271 to 3275, the Markdown table rows are
malformed (missing trailing pipes and an extra line break inside a cell); fix by
ensuring each row has exactly three pipe-separated columns and a trailing pipe —
specifically replace the 200 row with a single line: Status Code column "200",
Description column "Streaming response with Server-Sent Events", Component
column "string" (all separated by pipes and ending with a trailing pipe), and
add the missing trailing pipe to the 401 row as well so the table renders
correctly.

Comment on lines +1067 to +1065
source = urljoin(
"https://mimir.corp.redhat.com", parent_id

⚠️ Potential issue | 🟠 Major

Fix urljoin type error with parent_id.

Pipeline reports urljoin receives object type instead of str. Add type guard.

                         if OFFLINE:
                             parent_id = chunk.metadata.get("parent_id")
-                            if parent_id:
+                            if parent_id and isinstance(parent_id, str):
                                 source = urljoin(
                                     "https://mimir.corp.redhat.com", parent_id
                                 )

Apply this fix at both occurrences (lines 1065-1069 and 1273-1277).

Also applies to: 1275-1276

🧰 Tools
🪛 GitHub Actions: Pyright

[error] 1068-1068: Pyright: Argument of type "float | str | List[object] | object | Literal[True]" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin" (reportArgumentType)

🪛 GitHub Actions: Type checks

[error] 1067-1067: mypy: Value of type variable "AnyStr" of "urljoin" cannot be "object"

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1065-1069 and again around
1273-1277, urljoin is being called with parent_id which the type checker reports
as object; add a type guard that ensures parent_id is a str before calling
urljoin (e.g., if isinstance(parent_id, str) use urljoin, otherwise convert to
str(parent_id) or skip the join if None/unexpected), and apply the same
guard/handling at both occurrences so urljoin always receives a string.

Comment on lines +1076 to +1079
final_rag_chunks.append(
RAGChunk(
content=chunk.content,
source=source,
score=score,
)
)

⚠️ Potential issue | 🟠 Major

Fix RAGChunk constructor type errors.

Pipeline reports:

  • chunk.content is InterleavedContent, not str
  • source type mismatch
                     final_rag_chunks.append(
                         RAGChunk(
-                            content=chunk.content,
-                            source=source,
+                            content=str(chunk.content) if chunk.content else "",
+                            source=source if isinstance(source, str) else None,
                             score=score,
                         )
                     )

Apply at both locations (lines 1076-1082 and 1284-1290).

Also applies to: 1284-1290

🧰 Tools
🪛 GitHub Actions: Pyright

[error] 1078-1078: Pyright: Argument of type "InterleavedContent" cannot be assigned to parameter "content" of type "str" in function "init" (reportArgumentType)


[error] 1079-1079: Pyright: Argument of type "str | bool | float | List[object] | object | None" cannot be assigned to parameter "source" of type "str | None" in function "init" (reportArgumentType)

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1076-1082 (and similarly at
1284-1290), the RAGChunk constructor is being passed chunk.content which is an
InterleavedContent object and a source whose type doesn't match RAGChunk's
expected type; convert the content to a plain string (e.g., use
chunk.content.text or str(chunk.content) / chunk.content.to_string()) and ensure
source is normalized to the expected primitive (e.g., source.url or str(source)
or source.get("source") depending on the source shape) before constructing
RAGChunk so the types match the RAGChunk signature.

Comment on lines 1318 to 1285
documents=documents,
stream=True,
# toolgroups=toolgroups,
toolgroups=toolgroups,

⚠️ Potential issue | 🟠 Major

Fix toolgroups type for create_turn call.

Pipeline reports type incompatibility. The toolgroups variable may be list[str] when only MCP toolgroups are present, but create_turn expects List[Toolgroup] | None.

Ensure proper typing throughout the toolgroups assignment chain, using cast if necessary:

from typing import cast
from llama_stack_client.types.agents.turn_create_params import Toolgroup

# When assigning mcp_toolgroups:
-mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]
+mcp_toolgroups: list[Toolgroup] = [
+    mcp_server.name for mcp_server in configuration.mcp_servers
+]

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Actions: Pyright

[error] 1320-1320: Pyright: Argument of type "list[str | ToolgroupAgentToolGroupWithArgs] | Unknown | list[str] | None" cannot be assigned to parameter "toolgroups" of type "List[Toolgroup] | None" in function "create_turn" (reportArgumentType)


[error] 1320-1320: Pyright: Type parameter invariance: list[str] is not assignable to List[Toolgroup] (reportArgumentType)

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1318-1320, the variable
`toolgroups` may be a list[str] but `create_turn` requires List[Toolgroup] |
None; ensure `toolgroups` is typed appropriately before the call by converting
any MCP string entries into Toolgroup instances (or wrapping/constructing the
correct Toolgroup objects) or, if you have validated they are already
Toolgroup-like, use typing.cast[List[Toolgroup]](toolgroups) to satisfy the
signature; update upstream assignments and annotate the variable so the type is
consistently List[Toolgroup] | None prior to passing it to create_turn.

Comment on lines 516 to 538
retrieved_entry = retrieved_entries[0]
# Check all fields match
assert retrieved_entry.query == entry_with_docs.query
assert retrieved_entry.response == entry_with_docs.response
assert retrieved_entry.provider == entry_with_docs.provider
assert retrieved_entry.model == entry_with_docs.model
assert retrieved_entry.started_at == entry_with_docs.started_at
assert retrieved_entry.completed_at == entry_with_docs.completed_at
# Check referenced documents - URLs may be strings after serialization
assert retrieved_entry.referenced_documents is not None
assert entry_with_docs.referenced_documents is not None
assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents)
# pylint: disable-next=unsubscriptable-object
assert (
str(retrieved_entry.referenced_documents[0].doc_url)
== str(entry_with_docs.referenced_documents[0].doc_url) # pylint: disable=unsubscriptable-object
)
# pylint: disable-next=unsubscriptable-object
assert (
retrieved_entry.referenced_documents[0].doc_title
== entry_with_docs.referenced_documents[0].doc_title # pylint: disable=unsubscriptable-object
)
assert retrieved_entries[0].referenced_documents[0].doc_title == "Test Doc"

⚠️ Potential issue | 🟡 Minor

Fix Black formatting issue and consider simplifying pylint workarounds.

The pipeline indicates Black formatting failed for this file. Please run uv tool run black . to fix formatting.

The field-by-field validation approach is appropriate for testing serialization round-trips where object equality may fail due to type coercion. However, the multiple # pylint: disable-next=unsubscriptable-object comments suggest type narrowing issues. Since you've already verified referenced_documents is not None on both objects, consider extracting the documents to local variables to help type inference:

     # Check referenced documents - URLs may be strings after serialization
     assert retrieved_entry.referenced_documents is not None
     assert entry_with_docs.referenced_documents is not None
     assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents)
-    # pylint: disable-next=unsubscriptable-object
-    assert (
-        str(retrieved_entry.referenced_documents[0].doc_url)
-        == str(entry_with_docs.referenced_documents[0].doc_url)  # pylint: disable=unsubscriptable-object
-    )
-    # pylint: disable-next=unsubscriptable-object
-    assert (
-        retrieved_entry.referenced_documents[0].doc_title
-        == entry_with_docs.referenced_documents[0].doc_title  # pylint: disable=unsubscriptable-object
-    )
+    retrieved_doc = retrieved_entry.referenced_documents[0]
+    expected_doc = entry_with_docs.referenced_documents[0]
+    assert str(retrieved_doc.doc_url) == str(expected_doc.doc_url)
+    assert retrieved_doc.doc_title == expected_doc.doc_title
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
retrieved_entry = retrieved_entries[0]
# Check all fields match
assert retrieved_entry.query == entry_with_docs.query
assert retrieved_entry.response == entry_with_docs.response
assert retrieved_entry.provider == entry_with_docs.provider
assert retrieved_entry.model == entry_with_docs.model
assert retrieved_entry.started_at == entry_with_docs.started_at
assert retrieved_entry.completed_at == entry_with_docs.completed_at
# Check referenced documents - URLs may be strings after serialization
assert retrieved_entry.referenced_documents is not None
assert entry_with_docs.referenced_documents is not None
assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents)
# pylint: disable-next=unsubscriptable-object
assert (
str(retrieved_entry.referenced_documents[0].doc_url)
== str(entry_with_docs.referenced_documents[0].doc_url) # pylint: disable=unsubscriptable-object
)
# pylint: disable-next=unsubscriptable-object
assert (
retrieved_entry.referenced_documents[0].doc_title
== entry_with_docs.referenced_documents[0].doc_title # pylint: disable=unsubscriptable-object
)
assert retrieved_entries[0].referenced_documents[0].doc_title == "Test Doc"
retrieved_entry = retrieved_entries[0]
# Check all fields match
assert retrieved_entry.query == entry_with_docs.query
assert retrieved_entry.response == entry_with_docs.response
assert retrieved_entry.provider == entry_with_docs.provider
assert retrieved_entry.model == entry_with_docs.model
assert retrieved_entry.started_at == entry_with_docs.started_at
assert retrieved_entry.completed_at == entry_with_docs.completed_at
# Check referenced documents - URLs may be strings after serialization
assert retrieved_entry.referenced_documents is not None
assert entry_with_docs.referenced_documents is not None
assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents)
retrieved_doc = retrieved_entry.referenced_documents[0]
expected_doc = entry_with_docs.referenced_documents[0]
assert str(retrieved_doc.doc_url) == str(expected_doc.doc_url)
assert retrieved_doc.doc_title == expected_doc.doc_title
assert retrieved_entries[0].referenced_documents[0].doc_title == "Test Doc"
🤖 Prompt for AI Agents
In tests/unit/cache/test_postgres_cache.py around lines 516 to 538, Black
formatting failed and the test uses repeated pylint disables for
unsubscriptable-object; run `uv tool run black .` to fix formatting, then
simplify the assertions by first assigning
retrieved_entry.referenced_documents[0] and
entry_with_docs.referenced_documents[0] to local variables (after the explicit
None checks) so type checkers can infer their types and you can remove the `#
pylint: disable-next=unsubscriptable-object` comments; keep the same
field-by-field asserts (query, response, provider, model, started_at,
completed_at, length check and doc fields) but reference the local variables for
doc_url/doc_title comparisons and the final doc_title assertion.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (7)
run.yaml (1)

140-143: Update provider_shield_id to reference a valid Llama-Guard model.

The provider_shield_id is set to "gpt-3.5-turbo", which is an OpenAI model, not a Llama-Guard model. For the llama-guard provider, provider_shield_id must be a Llama-Guard model identifier (e.g., "meta-llama/Llama-Guard-3-8B").
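
A hedged sketch of what the corrected entry could look like, assuming the shields block follows the usual Llama Stack run.yaml layout (the shield_id value is illustrative):

shields:
  - shield_id: llama-guard-shield        # illustrative registration name
    provider_id: llama-guard
    provider_shield_id: meta-llama/Llama-Guard-3-8B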

src/app/endpoints/query.py (3)

878-889: Add missing tool_results parameter to TurnSummary.

The TurnSummary constructor is missing the required tool_results parameter. This will cause a runtime TypeError.

     summary = TurnSummary(
         llm_response=(
             interleaved_content_as_str(response.output_message.content)
             if (
                 getattr(response, "output_message", None) is not None
                 and getattr(response.output_message, "content", None) is not None
             )
             else ""
         ),
         tool_calls=[],
+        tool_results=[],
         rag_chunks=rag_chunks,
     )

745-758: Fix type error: get_rag_toolgroups may return None.

Line 752 concatenates the result of get_rag_toolgroups(vector_db_ids) with mcp_toolgroups, but get_rag_toolgroups can return None. This will cause a TypeError at runtime.

         toolgroups = None
         if vector_db_ids:
-            toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
+            rag_toolgroups = get_rag_toolgroups(vector_db_ids)
+            toolgroups = (rag_toolgroups or []) + mcp_toolgroups
         elif mcp_toolgroups:
             toolgroups = mcp_toolgroups

798-799: Move imports to module level.

Importing RAGChunk and ReferencedDocument inside the function is inefficient. Move these to the module-level imports at the top of the file.

+from models.responses import (
+    ForbiddenResponse,
+    QueryResponse,
+    RAGChunk,
+    ReferencedDocument,
+    ToolCall,
+    UnauthorizedResponse,
+)
...
             if query_response.chunks:
-                from models.responses import RAGChunk, ReferencedDocument
-
                 retrieved_chunks = query_response.chunks
src/app/endpoints/streaming_query.py (3)

77-80: Extract OFFLINE flag to shared constants module.

This constant is duplicated from query.py. Extract it to a shared constants module (e.g., src/app/constants.py) and import from both files to ensure consistency and follow DRY principles.
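
A minimal sketch of the shared flag, assuming a new src/constants.py (or similar) module; the env var name mirrors the LIGHTSPEED_OFFLINE_MODE suggestion later in this review:

"""Shared runtime flags for the query endpoints (sketch)."""

import os

# Single source of truth; query.py and streaming_query.py import this instead
# of each defining their own module-level OFFLINE constant.
OFFLINE: bool = os.getenv("LIGHTSPEED_OFFLINE_MODE", "true").lower() == "true"

Both endpoint modules would then use from constants import OFFLINE, keeping the project's absolute-import convention.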


1045-1046: Extract hardcoded Mimir base URL to configuration.

The URL https://mimir.corp.redhat.com is hardcoded here (and duplicated at lines 1065, 1241). Extract this to a configuration value for flexibility across environments.
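
A hedged sketch of reading the base URL from the environment; the variable name and helper are illustrative, and the real home would be the project's configuration model:

"""Sketch: configurable base URL for building chunk source links."""

import os
from urllib.parse import urljoin

# Hypothetical env var; defaults to the value currently hardcoded in the endpoints
VECTOR_SOURCE_BASE_URL: str = os.getenv(
    "VECTOR_SOURCE_BASE_URL", "https://mimir.corp.redhat.com"
)


def build_source_url(parent_id: str) -> str:
    """Join a chunk's parent_id onto the configured base URL."""
    return urljoin(VECTOR_SOURCE_BASE_URL, parent_id)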


1208-1275: Significant code duplication: vector DB querying logic.

Lines 1208-1275 duplicate the vector DB query logic from query_vector_io_for_chunks (lines 979-1088). Both blocks:

  • Query vector_io with identical params setup
  • Process chunks with OFFLINE flag logic
  • Extract metadata and build RAGChunk objects

Consider consolidating by having retrieve_response call query_vector_io_for_chunks to get the chunks, then format the RAG context from those results.

-    # Get RAG chunks before sending to LLM (reuse logic from query_vector_io_for_chunks)
-    rag_chunks = []
-    try:
-        if vector_db_ids:
-            # ... 60+ lines of duplicate logic
-    except Exception as e:
-        logger.warning("Failed to query vector database for chunks: %s", e)
+    # Reuse query_vector_io_for_chunks for consistency
+    rag_chunks, _ = await query_vector_io_for_chunks(client, query_request)
     
     # Format RAG context for injection into user message
     rag_context = ""
🧹 Nitpick comments (1)
lightspeed-stack.yaml (1)

24-28: Consider persistent storage for production deployments.

The conversation cache uses /tmp/data/conversation-cache.db which won't survive container restarts. This is fine for development/testing as indicated by the comment, but production deployments should use a persistent volume mount or external database.
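
For reference, a hypothetical Kubernetes fragment that keeps the current /tmp/data path but backs it with a PersistentVolumeClaim (resource names are illustrative):

# Deployment spec fragment (sketch)
volumeMounts:
  - name: conversation-cache
    mountPath: /tmp/data
volumes:
  - name: conversation-cache
    persistentVolumeClaim:
      claimName: lightspeed-conversation-cache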

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c971439 and 4ca24a7.

📒 Files selected for processing (8)
  • lightspeed-stack.yaml (1 hunks)
  • run.yaml (3 hunks)
  • src/app/endpoints/query.py (8 hunks)
  • src/app/endpoints/streaming_query.py (14 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/models/requests.py
  • src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_conversations_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_conversations_v2.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (4)
src/models/responses.py (2)
  • RAGChunk (149-154)
  • ReferencedDocument (179-191)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/models/requests.py (1)
  • QueryRequest (73-233)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (951-978)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (5)
tests/unit/app/endpoints/test_conversations_v2.py (1)

103-103: LGTM! Test updated to match model type changes.

The assertion correctly uses str() to compare doc_url, accommodating the type change in ReferencedDocument where doc_url is now str | AnyUrl | None.

tests/unit/app/endpoints/test_query.py (1)

1014-1014: LGTM! Tests updated to match model type changes.

The test assertions correctly use str() for doc_url comparisons, aligning with the ReferencedDocument type change. The malformed URL test (lines 1041-1043) properly verifies that invalid URLs are stored as raw strings rather than being rejected.

Also applies to: 1041-1043, 1066-1066, 1093-1095

run.yaml (1)

155-160: LGTM! Granite embedding model configuration is consistent.

The granite-embedding model configuration correctly specifies the embedding dimension (384) which matches the vector_io configuration at line 125. The model identifier ibm-granite/granite-embedding-30m-english is properly formatted for the sentence-transformers provider.

src/app/endpoints/streaming_query.py (1)

174-246: LGTM! Streaming end event properly includes RAG data.

The stream_end_event function correctly includes both rag_chunks and vector_io_referenced_docs in the final streaming response. The implementation properly converts RAG chunks to dict format and combines metadata_map documents with vector_io documents.

src/app/endpoints/query.py (1)

856-867: LGTM! RAG context injection implementation.

The RAG context formatting and injection logic properly:

  • Limits to top 5 chunks
  • Formats each chunk with source attribution
  • Prepends relevant documentation to the user query
  • Logs the injection for debugging

This approach cleanly augments the user message with retrieved context before the LLM call.
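
A hedged sketch of that formatting step, with illustrative names; it assumes each chunk exposes content and source attributes, as RAGChunk does per this review:

from typing import Any


def build_augmented_query(user_query: str, rag_chunks: list[Any]) -> str:
    """Prepend up to five retrieved chunks, with source attribution, to the query."""
    top_chunks = rag_chunks[:5]
    if not top_chunks:
        return user_query
    context = "\n\n".join(
        f"[Source: {chunk.source or 'unknown'}]\n{chunk.content}"
        for chunk in top_chunks
    )
    return f"Relevant documentation:\n\n{context}\n\nQuestion: {user_query}"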

Comment on lines +118 to +130
- provider_id: solr-vector
provider_type: remote::solr_vector_io
config:
solr_url: "http://localhost:8983/solr"
collection_name: "portal-rag"
vector_field: "chunk_vector"
content_field: "chunk"
embedding_dimension: 384
inference_provider_id: sentence-transformers
persistence:
type: sqlite
db_path: .llama/distributions/ollama/portal_rag_kvstore.db
namespace: portal-rag

🛠️ Refactor suggestion | 🟠 Major

Hardcoded Solr URL should use environment variable.

The solr_url: "http://localhost:8983/solr" is hardcoded and won't work in containerized or distributed deployments. Consider using an environment variable reference similar to line 19's pattern.

       config:
-        solr_url: "http://localhost:8983/solr"
+        solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
         collection_name: "portal-rag"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- provider_id: solr-vector
provider_type: remote::solr_vector_io
config:
solr_url: "http://localhost:8983/solr"
collection_name: "portal-rag"
vector_field: "chunk_vector"
content_field: "chunk"
embedding_dimension: 384
inference_provider_id: sentence-transformers
persistence:
type: sqlite
db_path: .llama/distributions/ollama/portal_rag_kvstore.db
namespace: portal-rag
- provider_id: solr-vector
provider_type: remote::solr_vector_io
config:
solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
collection_name: "portal-rag"
vector_field: "chunk_vector"
content_field: "chunk"
embedding_dimension: 384
inference_provider_id: sentence-transformers
persistence:
type: sqlite
db_path: .llama/distributions/ollama/portal_rag_kvstore.db
namespace: portal-rag
🤖 Prompt for AI Agents
In run.yaml around lines 118 to 130, the solr_url is hardcoded to
"http://localhost:8983/solr"; replace that literal with an environment-variable
reference (e.g. follow the pattern used on line 19) so the URL is configurable
at runtime. Update the solr_url field to read from a SOLR_URL (or similarly
named) env var, preserve quoting/format, and optionally provide a sensible
default/fallback if your template system supports it; ensure documentation and
deployment manifests (docker-compose / k8s) set that env var.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
run.yaml (1)

29-35: Fix duplicate top‑level models and vector_dbs keys in YAML.

You now have two models: mappings (lines 29–35 and 140–150) and two vector_dbs: mappings (lines 134–139 and 155), which YAMLlint flags as duplicate keys. Most parsers will silently let the last one win, so the earlier definitions are ignored.

Consolidate these into single blocks (a sketch follows this comment), e.g.:

  • Move all model declarations into the bottom models: list and delete the earlier models: at lines 29–35 (or vice‑versa).
  • Keep only one vector_dbs: mapping (likely the new portal-rag block) and remove the trailing vector_dbs: [].

Until this is fixed, the effective config will be ambiguous and may not match your intent.

Also applies to: 134-150, 154-154
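
A sketch of the consolidated layout, using only the entries discussed in this review; the exact field set depends on what run.yaml currently defines, so treat this as illustrative:

models:
  - model_id: granite-embedding
    provider_id: sentence-transformers
    provider_model_id: ibm-granite/granite-embedding-30m-english
    model_type: embedding
    metadata:
      embedding_dimension: 384
  # ...move any LLM entries from the earlier models: block here as well
vector_dbs:
  - vector_db_id: portal-rag
    provider_id: solr-vector
    embedding_model: granite-embedding
    embedding_dimension: 384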

src/app/endpoints/streaming_query.py (3)

54-71: Resolve RAGChunk/ReferencedDocument import issues and redefinitions.

  • models.responses apparently does not export RAGChunk (Ruff reports “no name 'RAGChunk' in module 'models.responses'”), and you also import ReferencedDocument again from utils.endpoints. This leads to F811/F821 redefinition errors.
  • You only need a single canonical ReferencedDocument type (and likely only from models.responses).

Suggestion:

-from models.responses import (
-    ForbiddenResponse,
-    RAGChunk,
-    ReferencedDocument,
-    UnauthorizedResponse,
-    ...
-)
-from utils.endpoints import (
-    ReferencedDocument,
-    check_configuration_loaded,
-    ...
-)
+from models.responses import (
+    ForbiddenResponse,
+    UnauthorizedResponse,
+    RAGChunk,            # if actually defined there; otherwise import from its real module
+    ReferencedDocument,
+    ...
+)
+from utils.endpoints import (
+    check_configuration_loaded,
+    cleanup_after_streaming,
+    create_rag_chunks_dict,
+    get_agent,
+    get_system_prompt,
+    validate_model_provider_override,
+)

Adjust the RAGChunk import to match where it is really defined. This will clear the redefinition/undefined-name warnings.


902-1064: Broken context wiring and dead nested response_generator block.

Inside streaming_query_endpoint_handler_base:

  • You define a second async response_generator (lines 926–1047) that references many undefined names (interleaved_content_as_str, get_session, create_referenced_documents_with_metadata, CacheEntry) and a context variable that is never created.
  • The old ResponseGeneratorContext initialization is fully commented out (1049–1060), yet you still call create_response_generator_func(context) (line 1063), which will raise NameError at runtime.
  • The nested response_generator is never used; the actual generator comes from create_response_generator_func.

This block needs to be either completed or removed. Given you already have create_agent_response_generator, the simplest fix is to delete the unused nested generator and reinstate a valid ResponseGeneratorContext:

-        metadata_map: dict[str, dict[str, Any]] = {}
-
-        async def response_generator(...):
-            ...
-            cache_entry = CacheEntry(...)
-        # Create context object for response generator
-#         context = ResponseGeneratorContext(
-#             ...
-#         )
+        metadata_map: dict[str, dict[str, Any]] = {}
+
+        context = ResponseGeneratorContext(
+            conversation_id=conversation_id,
+            user_id=user_id,
+            skip_userid_check=_skip_userid_check,
+            model_id=model_id,
+            provider_id=provider_id,
+            llama_stack_model_id=llama_stack_model_id,
+            query_request=query_request,
+            started_at=started_at,
+            client=client,
+            metadata_map=metadata_map,
+        )

…and delete the entire nested async def response_generator(...) block if it is not intended to be used anymore.

As written, this function will not work and is the main source of the context/undefined-name errors from Ruff and mypy.


1357-1455: Normalize vector DB toolgroup handling and RAG chunk typing in retrieve_response.

In retrieve_response (streaming path):

  • toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups will fail if get_rag_toolgroups returns None (mypy error and potential TypeError).
  • params and urljoin suffer from the same typing issues as query_vector_io_for_chunks.
  • RAGChunk(content=chunk.content, source=source, ...) again passes non‑string content and potentially non‑string source.

You can mirror the patterns used in the earlier suggestion:

-        vector_dbs = await client.vector_dbs.list()
-        vector_db_ids = [vdb.identifier for vdb in vector_dbs]
-        mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]
-
-        toolgroups = None
-        if vector_db_ids:
-            toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
-        elif mcp_toolgroups:
-            toolgroups = mcp_toolgroups
+        vector_dbs = await client.vector_dbs.list()
+        vector_db_ids = [vdb.identifier for vdb in vector_dbs]
+        mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]
+
+        rag_toolgroups = get_rag_toolgroups(vector_db_ids) if vector_db_ids else None
+        toolgroups = (rag_toolgroups or []) + mcp_toolgroups if mcp_toolgroups else rag_toolgroups
@@
-            params = {"k": 5, "score_threshold": 0.0}
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
@@
-                            parent_id = chunk.metadata.get("parent_id")
-                            if parent_id:
-                                source = urljoin(
-                                    "https://mimir.corp.redhat.com", parent_id
-                                )
+                            parent_id = chunk.metadata.get("parent_id")
+                            if parent_id and isinstance(parent_id, str):
+                                source = urljoin(
+                                    "https://mimir.corp.redhat.com", parent_id
+                                )
@@
-                    rag_chunks.append(
-                        RAGChunk(
-                            content=chunk.content,
-                            source=source,
-                            score=score,
-                        )
-                    )
+                    rag_chunks.append(
+                        RAGChunk(
+                            content=str(chunk.content) if chunk.content else "",
+                            source=source if isinstance(source, str) else None,
+                            score=score,
+                        )
+                    )

This addresses the mypy errors and ensures RAGChunk instances have the expected types.

src/app/endpoints/query.py (1)

582-613: Fix return type and docstring of parse_metadata_from_text_item.

Two small issues:

  • The function is annotated as Optional[ReferencedDocument] but returns a list[ReferencedDocument] when text_item is not a TextContentItem:
docs: list[ReferencedDocument] = []
if not isinstance(text_item, TextContentItem):
    return docs  # wrong type
  • The docstring says it returns ReferencedDocument but doesn’t mention None.

Suggested fix:

-    docs: list[ReferencedDocument] = []
-    if not isinstance(text_item, TextContentItem):
-        return docs
+    if not isinstance(text_item, TextContentItem):
+        return None
@@
-            if url and title:
-                return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)
+            if url and title:
+                return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)

Optionally update the docstring “Returns” section to mention that None is returned when no valid metadata is found.

♻️ Duplicate comments (5)
src/app/endpoints/query.py (4)

78-82: Shared OFFLINE flag should be centralized (optional).

You now have an OFFLINE constant here and in streaming_query.py with identical semantics. Consider moving this to a shared config/constants module and reading from configuration/env rather than hard‑coding it in two places, to avoid divergence later.


874-885: Ensure TurnSummary is fully populated and referenced documents include RAG docs.

You now construct:

summary = TurnSummary(
    llm_response=...,
    tool_calls=[],
    rag_chunks=rag_chunks,
)
referenced_documents = parse_referenced_documents(response)
referenced_documents.extend(doc_ids_from_chunks)

Two points:

  • If TurnSummary also expects tool_results, confirm that its default is optional; otherwise add tool_results=[] explicitly (similar to other code paths in the repo).
  • Extending referenced_documents with doc_ids_from_chunks is correct, but ensure doc_ids_from_chunks elements are built with the same ReferencedDocument type (which they are) and that the “Mimir” base URL is eventually configurable rather than hard‑coded.

If tool_results is required, adjust:

 summary = TurnSummary(
     llm_response=...,
-    tool_calls=[],
-    rag_chunks=rag_chunks,
+    tool_calls=[],
+    tool_results=[],
+    rag_chunks=rag_chunks,
 )

Also applies to: 887-891


915-942: Fix “attatchment” typo in validation error messages.

The error messages in validate_attachments_metadata still contain the typo:

"Invalid attatchment type ..."
"Invalid attatchment content type ..."

This was previously flagged. Update to:

-            message = (
-                f"Invalid attatchment type {attachment.attachment_type}: "
+            message = (
+                f"Invalid attachment type {attachment.attachment_type}: "
@@
-            message = (
-                f"Invalid attatchment content type {attachment.content_type}: "
+            message = (
+                f"Invalid attachment content type {attachment.content_type}: "

This improves API error clarity and keeps tests/docs consistent.


766-826: Fix vector IO params typing, logging, and exception message formatting in retrieve_response.

Issues in this block:

  • params = {"k": 5, "score_threshold": 0.0} is inferred as dict[str, float], which doesn’t match the more general dict[str, bool | float | str | Iterable | object | None] expected by vector_io.query (mypy error).
  • Logging uses f-strings with potentially large query_response payloads at INFO level (may leak content and is noisy).
  • Exception logging uses f-strings inside logger methods (logger.info(f"...")) instead of %s formatting (style, not correctness).

Suggested cleanup:

-            params = {"k": 5, "score_threshold": 0.0}
-            logger.info(f"Initial params: {params}")
-            logger.info(f"query_request.solr: {query_request.solr}")
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
+            logger.info("Initial params: %s", params)
+            logger.info("query_request.solr: %s", query_request.solr)
@@
-                params["solr"] = query_request.solr
-                logger.info(f"Final params with solr filters: {params}")
+                params["solr"] = query_request.solr
+                logger.info("Final params with solr filters: %s", params)
@@
-            logger.info(f"Final params being sent to vector_io.query: {params}")
+            logger.info("Final params being sent to vector_io.query: %s", params)
@@
-            logger.info(f"The query response total payload: {query_response}")
+            logger.debug(
+                "Vector IO query returned %d chunks",
+                len(query_response.chunks) if query_response.chunks else 0,
+            )
@@
-    except Exception as e:
-        logger.warning(f"Failed to query vector database for chunks: {e}")
-        logger.debug(f"Vector DB query error details: {traceback.format_exc()}")
+    except Exception as e:
+        logger.warning("Failed to query vector database for chunks: %s", e)
+        logger.debug("Vector DB query error details: %s", traceback.format_exc())

This addresses the mypy error and improves logging hygiene.

src/app/endpoints/streaming_query.py (1)

1153-1262: Type and logging issues in query_vector_io_for_chunks.

Several problems here affect type‑checking and logging:

  • params = {"k": 5, "score_threshold": 0.0} is inferred as dict[str, float], which doesn’t match the vector_io params type.
  • parent_id = chunk.metadata.get("parent_id") is typed as object; calling urljoin(..., parent_id) without a guard causes mypy/pyright errors and could pass non‑strings at runtime.
  • RAGChunk(content=chunk.content, source=source, ...) passes an InterleavedContent (or similar) object into a str field, and source may not be a str.
  • logger.info("The query response total payload: %s", query_response) logs the entire payload at INFO, which is noisy and may leak content (similar to past comments).

Suggested fixes:

-            params = {"k": 5, "score_threshold": 0.0}
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
@@
-            logger.info("The query response total payload: %s", query_response)
+            logger.debug(
+                "Vector IO query returned %d chunks",
+                len(query_response.chunks) if query_response.chunks else 0,
+            )
@@
-                        if OFFLINE:
-                            parent_id = chunk.metadata.get("parent_id")
-                            if parent_id:
-                                source = urljoin(
-                                    "https://mimir.corp.redhat.com", parent_id
-                                )
+                        if OFFLINE:
+                            parent_id = chunk.metadata.get("parent_id")
+                            if parent_id and isinstance(parent_id, str):
+                                source = urljoin(
+                                    "https://mimir.corp.redhat.com", parent_id
+                                )
@@
-                    final_rag_chunks.append(
-                        RAGChunk(
-                            content=chunk.content,
-                            source=source,
-                            score=score,
-                        )
-                    )
+                    final_rag_chunks.append(
+                        RAGChunk(
+                            content=str(chunk.content) if chunk.content else "",
+                            source=source if isinstance(source, str) else None,
+                            score=score,
+                        )
+                    )

Also consider moving the hard‑coded Mimir base URL into configuration (as prior reviews suggested).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ca24a7 and a534ece.

📒 Files selected for processing (7)
  • run.yaml (2 hunks)
  • src/app/endpoints/query.py (8 hunks)
  • src/app/endpoints/streaming_query.py (10 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/models/requests.py
  • src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
🪛 GitHub Actions: Black
src/app/endpoints/streaming_query.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted.

🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py

[error] 1-1: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' while importing module 'app.endpoints.query'.

🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py

[warning] 54-54: Reimport 'ForbiddenResponse' (imported line 54) (reimported)


[warning] 54-54: Reimport 'UnauthorizedResponse' (imported line 54) (reimported)


[error] 54-54: No name 'RAGChunk' in module 'models.responses' (no-name-in-module)


[warning] 70-70: Reimport 'ReferencedDocument' (imported line 54) (reimported)


[error] 159-159: Too many arguments (7/5) (too-many-arguments)


[error] 159-159: Too many positional arguments (7/5) (too-many-positional-arguments)


[warning] 164-164: Unused argument 'referenced_documents' (unused-argument)


[error] 1016-1016: Undefined variable 'get_session' (undefined-variable)


[error] 1029-1029: Undefined variable 'create_referenced_documents_with_metadata' (undefined-variable)


[error] 1037-1037: Undefined variable 'CacheEntry' (undefined-variable)


[warning] 1015-1015: Unused variable 'topic_summary' (unused-variable)


[warning] 1037-1037: Unused variable 'cache_entry' (unused-variable)


[error] 1063-1063: Undefined variable 'context' (undefined-variable)


[warning] 1435-1435: Catching too general exception Exception (broad-exception-caught)


[error] 1265-1265: Too many statements (74/50) (too-many-statements)

🪛 GitHub Actions: Ruff
src/app/endpoints/streaming_query.py

[error] 55-55: F811: Redefinition of unused 'ForbiddenResponse'.


[error] 67-67: F811: Redefinition of unused 'UnauthorizedResponse'.


[error] 71-71: F821: Undefined or redefined name 'ReferencedDocument'.


[error] 960-960: F821: Undefined name 'interleaved_content_as_str'.


[error] 1016-1016: F821: Undefined name 'get_session'.


[error] 1023-1023: F841: Local variable 'topic_summary' assigned to but never used.


[error] 1029-1029: F821: Undefined name 'create_referenced_documents_with_metadata'.


[error] 1037-1037: F841: Local variable 'cache_entry' assigned to but never used.


[error] 1037-1037: F821: Undefined name 'CacheEntry'.


[error] 1063-1063: F811: Redefinition of unused 'response_generator'.


[error] 1063-1063: F821: Undefined name 'context'.

src/app/endpoints/query.py

[error] 22-22: F401: Remove unused import: 'Document' from llama_stack_client.types.agents.turn_create_params.

🪛 GitHub Actions: Type checks
tests/unit/app/endpoints/test_conversations_v2.py

[error] Process completed with exit code 1.

tests/unit/app/endpoints/test_query.py

[error] Process completed with exit code 1.

src/app/endpoints/streaming_query.py

[error] 809-809: mypy: Argument 4 to 'stream_end_event' has incompatible type 'Any | list[Any]'; expected 'dict[str, int]'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 960-960: mypy: Name 'interleaved_content_as_str' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1016-1016: mypy: Name 'get_session' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1029-1029: mypy: Name 'create_referenced_documents_with_metadata' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1037-1037: mypy: Name 'CacheEntry' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1063-1063: mypy: Name 'context' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1190-1190: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1238-1238: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1400-1400: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1400-1400: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1414-1414: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] Process completed with exit code 1.

src/app/endpoints/query.py

[error] 748-748: mypy: Unsupported left operand type for + ('None'). (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 788-788: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 835-835: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] Process completed with exit code 1.

run.yaml

[error] Process completed with exit code 1.

🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_query.py

[error] 22-22: ImportError while importing test module due to missing dependency ('llama_stack_client.types.agents').

src/app/endpoints/query.py

[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents'. This import error occurs while importing Turn from llama_stack_client.types.agents in app.endpoints.query, causing test collection to fail during the 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing' step.

🪛 YAMLlint (1.37.1)
run.yaml

[error] 140-140: duplication of key "models" in mapping

(key-duplicates)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: server mode / azure
🔇 Additional comments (4)
tests/unit/app/endpoints/test_conversations_v2.py (1)

93-97: Updated doc_url assertion matches widened type.

Using str(ref_docs[0]["doc_url"]) == "http://example.com" is consistent with ReferencedDocument.doc_url now being either a plain string or AnyUrl; the test is correct and resilient.

Also applies to: 111-116

tests/unit/app/endpoints/test_query.py (1)

1009-1023: Doc URL expectations correctly adapted to string/URL union.

The updated assertions (str(doc.doc_url) == ...) and the “malformed URL” test expecting a non‑None ReferencedDocument with raw doc_url align with the new ReferencedDocument.doc_url semantics. The test logic itself looks sound; remaining test failures stem from import errors in app.endpoints.query.

Also applies to: 1041-1051, 1070-1075, 1097-1103

src/app/endpoints/query.py (2)

299-310: query_endpoint_handler_base now depends on extended retrieve_response contract — ensure all call sites match.

You’ve updated the base handler to expect retrieve_response_func to return (summary, conversation_id, referenced_documents, token_usage) and to pass provider_id as a keyword argument. That’s correct for the new retrieve_response in this module, but any other callers (e.g., alternative backends) must maintain this contract.

Double‑check any other query_endpoint_handler_base(..., retrieve_response_func=...) usages to ensure they:

  • Accept provider_id as kwarg in their retrieve_response implementation.
  • Return the 4‑tuple so token_usage and referenced_documents are well‑defined.

Within this file, the wiring looks coherent.

Also applies to: 333-350
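
A hedged sketch of the contract such an implementation needs to satisfy; parameter names other than provider_id are illustrative, and Any stands in for the project's concrete types (TurnSummary, ReferencedDocument, TokenCounter):

from typing import Any


async def retrieve_response(
    client: Any,
    model_id: str,
    query_request: Any,
    token: str,
    *,
    provider_id: str = "",
    **kwargs: Any,
) -> tuple[Any, str, list[Any], Any]:
    """Return (summary, conversation_id, referenced_documents, token_usage)."""
    raise NotImplementedError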


7-25: The import paths in the current code are correct for llama-stack-client==0.3.0. According to official documentation, the imports should use llama_stack_client.types.agents (not the alpha namespace):

from llama_stack_client.types.agents import Turn, TurnCreateResponse

The current code's import structure aligns with this. The ModuleNotFoundError cited in the review comment is not due to incorrect import paths—reverting to types.alpha.agents would introduce the actual error. If CI is failing, the issue lies elsewhere (e.g., dependency installation, version mismatch, or a different module).

Regarding the unused Document import: if this is truly unused, it should be removed per code cleanliness standards, but this is a minor concern unrelated to the import path issue.

Likely an incorrect or invalid review comment.

Comment on lines +741 to 752
# Include RAG toolgroups when vector DBs are available
vector_dbs = await client.vector_dbs.list()
vector_db_ids = [vdb.identifier for vdb in vector_dbs]
mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]

toolgroups = None
if vector_db_ids:
toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
elif mcp_toolgroups:
toolgroups = mcp_toolgroups

# Convert empty list to None for consistency with existing behavior

⚠️ Potential issue | 🟠 Major

Guard get_rag_toolgroups result when building toolgroups.

Here:

vector_dbs = await client.vector_dbs.list()
vector_db_ids = [vdb.identifier for vdb in vector_dbs]
mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]

toolgroups = None
if vector_db_ids:
    toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
elif mcp_toolgroups:
    toolgroups = mcp_toolgroups

get_rag_toolgroups can return None, so None + mcp_toolgroups will raise a TypeError and mypy flags this.

Use an intermediate variable and coerce None to []:

-        toolgroups = None
-        if vector_db_ids:
-            toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
-        elif mcp_toolgroups:
-            toolgroups = mcp_toolgroups
+        toolgroups = None
+        if vector_db_ids:
+            rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+            toolgroups = rag_toolgroups + mcp_toolgroups
+        elif mcp_toolgroups:
+            toolgroups = mcp_toolgroups

This will satisfy mypy and avoid runtime failures when no RAG toolgroups are available.

🧰 Tools
🪛 GitHub Actions: Type checks

[error] 748-748: mypy: Unsupported left operand type for + ('None'). (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 741 to 752, the code adds the return
value of get_rag_toolgroups (which can be None) to mcp_toolgroups causing a
TypeError and mypy failure; call get_rag_toolgroups into an intermediate
variable, coerce None to an empty list (e.g., rag_toolgroups =
get_rag_toolgroups(vector_db_ids) or []), then set toolgroups = rag_toolgroups +
mcp_toolgroups (or toolgroups = mcp_toolgroups if no vector_db_ids), and finally
convert an empty toolgroups list to None to preserve existing behavior.

Comment on lines +828 to +851
for i, chunk in enumerate(retrieved_chunks):
# Extract source from chunk metadata based on OFFLINE flag
source = None
if chunk.metadata:
if OFFLINE:
parent_id = chunk.metadata.get("parent_id")
if parent_id:
source = urljoin("https://mimir.corp.redhat.com", parent_id)
else:
source = chunk.metadata.get("reference_url")

# Get score from retrieved_scores list if available
score = retrieved_scores[i] if i < len(retrieved_scores) else None

rag_chunks.append(
RAGChunk(
content=chunk.content,
source=source,
score=score,
)
)

logger.info(f"Retrieved {len(rag_chunks)} chunks from vector DB")

⚠️ Potential issue | 🟠 Major

Normalize urljoin parent_id type and RAGChunk field types.

Within the loop that builds rag_chunks:

if OFFLINE:
    parent_id = chunk.metadata.get("parent_id")
    if parent_id:
        source = urljoin("https://mimir.corp.redhat.com", parent_id)
...
rag_chunks.append(
    RAGChunk(
        content=chunk.content,
        source=source,
        score=score,
    )
)

Problems:

  • parent_id is typed as object; passing it directly to urljoin triggers mypy/pyright errors and might not be a string at runtime.
  • RAGChunk.content is presumably str, but chunk.content is an interleaved content object.

Fix:

-            if OFFLINE:
-                parent_id = chunk.metadata.get("parent_id")
-                if parent_id:
-                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
+            if OFFLINE:
+                parent_id = chunk.metadata.get("parent_id")
+                if parent_id and isinstance(parent_id, str):
+                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
@@
-        rag_chunks.append(
-            RAGChunk(
-                content=chunk.content,
-                source=source,
-                score=score,
-            )
-        )
+        rag_chunks.append(
+            RAGChunk(
+                content=str(chunk.content) if chunk.content else "",
+                source=source if isinstance(source, str) else None,
+                score=score,
+            )
+        )

This resolves the urljoin type error and ensures RAGChunk fields have the expected primitive types.

Also applies to: 835-835

🧰 Tools
🪛 GitHub Actions: Type checks

[error] 835-835: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 828-851, normalize types before
constructing RAGChunk: ensure parent_id is coerced to a string before calling
urljoin (e.g. parent_id = chunk.metadata.get("parent_id"); if parent_id is not
None: parent_id = str(parent_id); source =
urljoin("https://mimir.corp.redhat.com", parent_id)), ensure content is a
primitive string (extract a .text attribute if present or fallback to
str(chunk.content) so RAGChunk.content is str), and coerce score to a float or
None (e.g. score = float(retrieved_scores[i]) if i < len(retrieved_scores) and
retrieved_scores[i] is not None else None) before creating the RAGChunk.

Comment on lines 159 to 167
def stream_end_event(
metadata_map: dict,
summary: TurnSummary,
token_usage: TokenCounter,
available_quotas: dict[str, int],
referenced_documents: list[ReferencedDocument],
media_type: str = MEDIA_TYPE_JSON,
vector_io_referenced_docs: list[ReferencedDocument] | None = None,
) -> str:

⚠️ Potential issue | 🔴 Critical

Fix stream_end_event signature usage and argument ordering.

stream_end_event now expects:

def stream_end_event(
    metadata_map: dict,
    summary: TurnSummary,
    token_usage: TokenCounter,
    available_quotas: dict[str, int],
    referenced_documents: list[ReferencedDocument],
    media_type: str = MEDIA_TYPE_JSON,
    vector_io_referenced_docs: list[ReferencedDocument] | None = None,
) -> str:

But create_agent_response_generator calls it as:

yield stream_end_event(
    context.metadata_map,
    token_usage,
    available_quotas,
    referenced_documents,
    media_type,
)

So summary is never passed and the remaining args are shifted (mypy error plus wrong runtime behavior).

Update the call to match the signature:

-        yield stream_end_event(
-            context.metadata_map,
-            token_usage,
-            available_quotas,
-            referenced_documents,
-            media_type,
-        )
+        yield stream_end_event(
+            context.metadata_map,
+            summary,
+            token_usage,
+            available_quotas,
+            referenced_documents,
+            media_type,
+            vector_io_referenced_docs=None,
+        )

(or pass through any vector‑IO docs via context if you add them there).

Also applies to: 793-811

🧰 Tools
🪛 GitHub Actions: Python linter

[error] 159-159: Too many arguments (7/5) (too-many-arguments)


[error] 159-159: Too many positional arguments (7/5) (too-many-positional-arguments)


[warning] 164-164: Unused argument 'referenced_documents' (unused-argument)

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 159-167, the
stream_end_event signature expects (metadata_map, summary, token_usage,
available_quotas, referenced_documents, media_type=...,
vector_io_referenced_docs=None) but calls in create_agent_response_generator
(and also at lines ~793-811) pass arguments in the wrong order omitting summary;
update those calls to pass the TurnSummary value as the second positional
argument so the call becomes: stream_end_event(context.metadata_map, summary,
token_usage, available_quotas, referenced_documents, media_type,
vector_io_referenced_docs) — or if vector_io_referenced_docs lives on context,
pass context.vector_io_referenced_docs for that parameter; ensure all calls
match the signature and that mypy/type checks are satisfied.

Signed-off-by: Anxhela Coba <acoba@redhat.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (8)
src/app/endpoints/query.py (7)

817-818: Remove redundant local import.

RAGChunk and ReferencedDocument are imported locally inside the function, which is inefficient. Import them at module level instead. This was previously flagged.


897-908: Include tool_results in TurnSummary construction.

The TurnSummary construction is missing the required tool_results parameter. This was previously flagged and will cause a TypeError at runtime.

🔎 Previously suggested fix
     summary = TurnSummary(
         llm_response=(
             content_to_str(response.output_message.content)
             if (
                 getattr(response, "output_message", None) is not None
                 and getattr(response.output_message, "content", None) is not None
             )
             else ""
         ),
         tool_calls=[],
+        tool_results=[],
         rag_chunks=rag_chunks,
     )

948-949: Fix typo: "attatchment" should be "attachment".

There's a typo in the error message at line 948 (and also line 958). This was previously flagged but remains unresolved.


764-777: Fix type error: get_rag_toolgroups can return None.

At line 771, get_rag_toolgroups(vector_db_ids) can return None, causing a TypeError when concatenating with mcp_toolgroups. This issue was previously flagged but remains unresolved.

🔎 Previously suggested fix
-        toolgroups = None
-        if vector_db_ids:
-            toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
-        elif mcp_toolgroups:
-            toolgroups = mcp_toolgroups
+        toolgroups = None
+        if vector_db_ids:
+            rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+            toolgroups = rag_toolgroups + mcp_toolgroups
+        elif mcp_toolgroups:
+            toolgroups = mcp_toolgroups

814-814: Avoid logging entire query response payload at INFO level.

Logging the full query_response may expose sensitive document content. This was previously flagged but remains unresolved.


850-871: Fix type errors in RAGChunk construction.

Pipeline failures indicate:

  1. Line 858: parent_id is typed as object, causing a type error with urljoin which expects str.
  2. chunk.content may not be a primitive string type expected by RAGChunk.content.
🔎 Previously suggested fix
             if OFFLINE:
                 parent_id = chunk.metadata.get("parent_id")
-                if parent_id:
-                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
+                if parent_id and isinstance(parent_id, str):
+                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
@@
         rag_chunks.append(
             RAGChunk(
-                content=chunk.content,
-                source=source,
+                content=str(chunk.content) if chunk.content else "",
+                source=source if isinstance(source, str) else None,
                 score=score,
             )
         )

799-812: Fix type incompatibility for params dictionary.

Pipeline reports that params at line 811 has type dict[str, float] but vector_io.query expects dict[str, bool | float | str | Iterable[object] | object | None] | Omit. The issue arises because params["solr"] is assigned a complex object (line 804).

🔎 Proposed fix

Explicitly type params to satisfy the broader type requirement (this assumes Any is imported from typing; add that import if it is not already present):

-            params = {"k": 5, "score_threshold": 0.0}
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
run.yaml (1)

101-113: Hardcoded Solr URL should use environment variable.

The solr_url: "http://localhost:8983/solr" at line 104 remains hardcoded and won't work in containerized or distributed deployments. This was previously flagged but has not been addressed.

🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)

79-82: Move OFFLINE flag to configuration.

The module-level OFFLINE = True constant should be sourced from configuration or environment variables rather than hardcoded. This will allow runtime control without code changes.

🔎 Suggested approach

Add to your configuration class or read from environment:

# In configuration module
import os

OFFLINE_MODE = os.getenv("LIGHTSPEED_OFFLINE_MODE", "true").lower() == "true"

Then import and use in this module:

from configuration import configuration

# Later in code
if configuration.offline_mode:
    # ...
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a534ece and 35182dc.

📒 Files selected for processing (5)
  • run.yaml
  • src/app/endpoints/query.py
  • src/models/requests.py
  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/models/requests.py
  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.

Applied to files:

  • run.yaml
🧬 Code graph analysis (1)
src/app/endpoints/query.py (2)
src/models/responses.py (1)
  • ReferencedDocument (328-340)
src/utils/types.py (1)
  • RAGChunk (128-133)
🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py

[error] 20-20: ImportError: No module named 'llama_stack_client.types.agents'.

🪛 GitHub Actions: Ruff
src/app/endpoints/query.py

[error] 22-22: F401: Unused import Document from llama_stack_client.types.agents.turn_create_params. Remove unused import.

🪛 GitHub Actions: Type checks
src/app/endpoints/query.py

[error] 771-771: mypy: Unsupported left operand type for + ("None").


[error] 811-811: mypy: Argument "params" to "query" of "AsyncVectorIoResource" has incompatible type "dict[str, float]"; expected "dict[str, bool | float | str | Iterable[object] | object | None] | Omit".


[error] 811-811: mypy: Note: "dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance.


[error] 858-858: mypy: Value of type variable "AnyStr" of "urljoin" cannot be "object".

🪛 GitHub Actions: Unit tests
src/app/endpoints/query.py

[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' during pytest collection. Command: 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / vertexai
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: server mode / azure
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (1)
src/app/endpoints/query.py (1)

875-887: RAG context injection implementation looks good.

The logic correctly formats the top 5 RAG chunks and injects them into the user message. The approach of appending context to the query is appropriate for providing relevant documentation to the LLM.
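
A minimal sketch of the kind of injection described (the function name, prompt wording, and import path are assumptions; only the "append the top 5 chunks to the user message" behaviour comes from the change under review):

from utils.types import RAGChunk  # location per the code graph above


def build_query_with_context(user_query: str, rag_chunks: list[RAGChunk]) -> str:
    """Append the top retrieved chunks to the user query before the LLM call."""
    if not rag_chunks:
        return user_query
    # Keep only the top five chunks, matching the behaviour reviewed here.
    context = "\n\n".join(str(chunk.content) for chunk in rag_chunks[:5])
    return f"{user_query}\n\nRelevant documentation:\n{context}"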

Comment on lines +20 to 25
from llama_stack_client.types.agents.turn import Turn
from llama_stack_client.types.agents.turn_create_params import (
    Document,
    Toolgroup,
    ToolgroupAgentToolGroupWithArgs,
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n src/app/endpoints/query.py | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 208


🏁 Script executed:

rg "Document" src/app/endpoints/query.py -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 916


🏁 Script executed:

fd "llama_stack" --type f | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 971


🏁 Script executed:

rg "from llama_stack_client" src/ -n | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 1789


🏁 Script executed:

cat -n src/app/endpoints/query.py | head -80

Repository: lightspeed-core/lightspeed-stack

Length of output: 3331


🏁 Script executed:

# Check the correct import paths more thoroughly
rg "Turn|Document" src/app/endpoints/query.py -B2 -A2 -n | head -60

Repository: lightspeed-core/lightspeed-stack

Length of output: 2104


🏁 Script executed:

# Verify the actual import path for Turn and Document in llama_stack_client
rg "from llama_stack_client.types" src/ --type py | grep -E "(turn|agents)" | sort -u

Repository: lightspeed-core/lightspeed-stack

Length of output: 730


🏁 Script executed:

# Check if Document is actually used anywhere in query.py
rg -n "Document" src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 916


🏁 Script executed:

# Also check Toolgroup and ToolgroupAgentToolGroupWithArgs usage
rg -n "Toolgroup|ToolgroupAgentToolGroupWithArgs" src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 289


🏁 Script executed:

# Verify the full file length to see if Document is used elsewhere
wc -l src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 106


🏁 Script executed:

# Double-check that Document isn't used anywhere else in the file besides import and comments
rg -n "Document[^a-zA-Z]" src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 838


🏁 Script executed:

# Verify the correct import path by checking what llama_stack_client actually exports
python3 << 'EOF'
# Check if we can find documentation or inspect the actual module structure
import subprocess
result = subprocess.run(
    ['rg', '-n', 'class Turn|class Document', 'src/', '--type', 'py', '-A', '1'],
    capture_output=True,
    text=True
)
print("Local class definitions:")
print(result.stdout[:2000] if result.stdout else "No local definitions found")
EOF

Repository: lightspeed-core/lightspeed-stack

Length of output: 221


Fix import paths for llama_stack_client types.

The import paths at lines 20-25 use incorrect module paths. They should use llama_stack_client.types.alpha.agents instead of llama_stack_client.types.agents. This is consistent with other files in the codebase (src/models/requests.py, src/utils/token_counter.py, src/metrics/utils.py, etc.).

Additionally, remove the unused Document import on line 22—it is not used anywhere in the file except in a commented line.

Corrected imports should be:

  • from llama_stack_client.types.alpha.agents.turn import Turn
  • from llama_stack_client.types.alpha.agents.turn_create_params import Toolgroup, ToolgroupAgentToolGroupWithArgs
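
Put together, the corrected import block would read:

from llama_stack_client.types.alpha.agents.turn import Turn
from llama_stack_client.types.alpha.agents.turn_create_params import (
    Toolgroup,
    ToolgroupAgentToolGroupWithArgs,
)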
🧰 Tools
🪛 GitHub Actions: Integration tests

[error] 20-20: ImportError: No module named 'llama_stack_client.types.agents'.

🪛 GitHub Actions: Ruff

[error] 22-22: F401: Unused import Document from llama_stack_client.types.agents.turn_create_params. Remove unused import.

🪛 GitHub Actions: Unit tests

[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' during pytest collection. Command: 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing'.

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 20 to 25, the imports use the wrong
module path and include an unused Document import. Update the two imports so
they read: from llama_stack_client.types.alpha.agents.turn import Turn, and
from llama_stack_client.types.alpha.agents.turn_create_params import Toolgroup,
ToolgroupAgentToolGroupWithArgs. Remove the unused Document name from the
import list.

title = data.get("title")
if url and title:
    return ReferencedDocument(doc_url=url, doc_title=title)
return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find ReferencedDocument class definition
rg "class ReferencedDocument" -A 15 --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 938


🏁 Script executed:

# Check line 631 in context
head -650 src/app/endpoints/query.py | tail -30

Repository: lightspeed-core/lightspeed-stack

Length of output: 1195


🏁 Script executed:

# Search for the typo "attatchment" mentioned in the scratchpad
rg "attatchment" src/app/endpoints/query.py -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 238


🏁 Script executed:

# Check if ReferencedDocument has any ConfigDict or extra field configuration
rg "class ReferencedDocument" -B 5 -A 20 src/models/responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 807


🏁 Script executed:

# Check if BaseModel is configured to allow extra fields globally
rg "ConfigDict\|config_dict\|extra.*allow" src/models/responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Check context around lines 912-913 for doc_ids_from_chunks
sed -n '900,920p' src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 793


🏁 Script executed:

# Check more context around the ReferencedDocument instantiation and see where doc_ids_from_chunks is used
sed -n '920,960p' src/app/endpoints/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 1750


🏁 Script executed:

# Check if doc_id is defined anywhere in ReferencedDocument or if there's a mismatch
rg "doc_id" src/models/responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


Remove doc_id=None parameter from ReferencedDocument instantiation at line 631.

The ReferencedDocument model in src/models/responses.py defines only doc_url and doc_title fields. Passing doc_id=None at line 631 will cause a Pydantic ValidationError at runtime.
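
A minimal corrected sketch (the import path and placeholder values are assumptions; only the doc_url and doc_title fields come from the model definition):

from models.responses import ReferencedDocument

url = "https://docs.example.com/page"  # placeholder value for illustration
title = "Example document"             # placeholder value for illustration
referenced_doc = ReferencedDocument(doc_url=url, doc_title=title)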

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around line 631, the instantiation of
ReferencedDocument passes an extraneous doc_id=None which does not exist on the
model and will cause a Pydantic ValidationError; remove the doc_id=None argument
so you only pass doc_url=url and doc_title=title when constructing
ReferencedDocument.

