
Conversation


@ZhengKai91 ZhengKai91 commented Dec 7, 2025

This pull request introduces a robust mechanism for structured outputs (Pydantic BaseModel enforcement) with LLMs that do not natively support the response_format parameter (e.g., Deepseek Chat and certain open-source models).

Why

Currently, when attempting structured output via the response_format argument (in client.py) with models that don't support it (such as deepseek/deepseek-chat), the call to litellm.acompletion fails with a litellm.BadRequestError:

```
litellm.BadRequestError: DeepseekException - {"error":{"message":"This response_format type is unavailable now","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}
```
This issue severely limits the range of models compatible with structured output tasks.

What Changed

The core logic has been refactored into a new class, StructuredOutputHandler, adopting a proven strategy implemented by projects like CrewAI (Kudos to the CrewAI team for pioneering this pattern! 👏).

  1. New Class Structure & Schema Injection
     The logic is encapsulated in the StructuredOutputHandler class. We stop passing the response_format argument to litellm.acompletion. Instead, the Pydantic model's schema is parsed, optimized, and injected as a strict system-prompt instruction that guides the LLM's output format (the format_messages function).

  2. Robust Post-processing and Retries
     A Converter class manages the post-processing workflow: it first attempts to parse the raw text output into the target Pydantic BaseModel; if parsing fails (due to partial or invalid JSON), it tries to extract the JSON robustly (handle_partial_json); if necessary, it retries (up to 3 attempts) by calling the LLM again and asking it to fix the improperly formatted output.

  3. Output Normalization
     The successfully validated Pydantic BaseModel instance is converted to a Python dictionary (model.model_dump()) and replaces the original text content in the final response object.
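The three steps above can be sketched in plain Python. This is an illustrative, stdlib-only sketch, not the PR's actual code: Pydantic validation is replaced by `json.loads`, and the names `format_messages`, `handle_partial_json`, and `convert` stand in for the real handler's methods.

```python
import json
import re


def format_messages(messages, schema):
    """Inject the target JSON schema as a strict system instruction
    (the schema-injection idea from step 1; names are illustrative)."""
    instruction = (
        "Respond ONLY with a JSON object matching this schema:\n"
        + json.dumps(schema)
    )
    return [{"role": "system", "content": instruction}, *messages]


def handle_partial_json(text):
    """Best-effort extraction of a JSON object from raw LLM text (step 2)."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def convert(raw_text, retry_llm, max_retries=3):
    """Parse, fall back to extraction, then ask the LLM to repair its own
    output, up to max_retries times (steps 2 and 3)."""
    for _ in range(max_retries + 1):
        try:
            return json.loads(raw_text)
        except json.JSONDecodeError:
            extracted = handle_partial_json(raw_text)
            if extracted is not None:
                return extracted
            # Ask the model to fix the improperly formatted output.
            raw_text = retry_llm(raw_text)
    raise ValueError("could not coerce output into the target schema")
```

In the real handler the final dict would come from `model.model_dump()` after Pydantic validation; here `json.loads` plays that role.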

Test Plan

The easiest way to verify this fix is to test with a model that lacks native response_format support, such as Deepseek:

  1. Set the environment variable DEEPSEEK_API_KEY.
  2. Change the model parameter in examples/quickstart.py from the current default to deepseek/deepseek-chat.
  3. Run the example.

Expected Result: The example now successfully executes the structured inference task and returns a dictionary validated against the Pydantic model, without hitting the BadRequestError tied to the response_format parameter.


Summary by cubic

Adds a generic structured output path that enforces Pydantic models for LLMs without native response_format support (e.g., Deepseek), and updates the client to fall back to this handler on errors. Structured inference now works across more models without BadRequestError.

  • New Features

    • Introduced StructuredOutputHandler to enforce Pydantic BaseModel outputs via system-prompt schema injection.
    • Normalizes schemas (resolves $refs, converts oneOf→anyOf, marks all properties required) for strict output.
    • Adds robust parsing with partial JSON extraction and up to 3 retries; final response content is a model_dump() dict.
  • Bug Fixes

    • client.py catches litellm.BadRequestError and routes the call through StructuredOutputHandler.
    • Enables structured outputs on deepseek/deepseek-chat and other non-supporting models.
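The schema normalization mentioned above (resolving `$refs`, converting `oneOf` to `anyOf`, marking all properties required) might look roughly like this. This is an illustrative sketch, not the PR's actual implementation; it only handles local `#/$defs/...` references.

```python
import copy


def normalize_schema(schema):
    """Sketch of strict-output schema normalization: resolve local $refs,
    rewrite oneOf as anyOf, and mark every property required."""
    defs = schema.get("$defs", {})

    def walk(node):
        if isinstance(node, dict):
            if "$ref" in node:
                # Resolve local refs of the form "#/$defs/Name".
                name = node["$ref"].rsplit("/", 1)[-1]
                return walk(copy.deepcopy(defs[name]))
            if "oneOf" in node:
                node["anyOf"] = node.pop("oneOf")
            if node.get("type") == "object" and "properties" in node:
                node["required"] = list(node["properties"])
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node

    out = walk(copy.deepcopy(schema))
    out.pop("$defs", None)  # inlined, so the definitions are no longer needed
    return out
```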

Written for commit 49f7054. Summary will update automatically on new commits.

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files

Prompt for AI agents (all 1 issue)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="stagehand/llm/client.py">

<violation number="1" location="stagehand/llm/client.py:140">
P1: Catching all `BadRequestError` is too broad. This fallback to `StructuredOutputHandler` will fail if the error was unrelated to `response_format` (e.g., invalid model name, rate limits) or if `response_format` wasn't in the request. Consider checking if `response_format` is present in `filtered_params` before attempting the fallback, or catching a more specific error condition.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

```python
        self.metrics_callback(response, inference_time_ms, function_name)

    return response
except litellm.BadRequestError as e:
```

@cubic-dev-ai cubic-dev-ai bot Dec 7, 2025


P1: Catching all BadRequestError is too broad. This fallback to StructuredOutputHandler will fail if the error was unrelated to response_format (e.g., invalid model name, rate limits) or if response_format wasn't in the request. Consider checking if response_format is present in filtered_params before attempting the fallback, or catching a more specific error condition.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At stagehand/llm/client.py, line 140:

<comment>Catching all `BadRequestError` is too broad. This fallback to `StructuredOutputHandler` will fail if the error was unrelated to `response_format` (e.g., invalid model name, rate limits) or if `response_format` wasn't in the request. Consider checking if `response_format` is present in `filtered_params` before attempting the fallback, or catching a more specific error condition.</comment>

<file context>

```diff
@@ -134,6 +137,13 @@ async def create_response(
                 self.metrics_callback(response, inference_time_ms, function_name)

             return response
+        except litellm.BadRequestError as e:
+            handler = StructuredOutputHandler(litellm)
+            response = await handler.handle_structured_inference(**filtered_params)
```

</file context>
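The narrower fallback the review suggests can be sketched as follows. This is a hypothetical, synchronous stand-in (the real method is async): `BadRequestError` substitutes for `litellm.BadRequestError`, and `call_llm` / `fallback_handler` substitute for `litellm.acompletion` and `StructuredOutputHandler.handle_structured_inference`.

```python
class BadRequestError(Exception):
    """Stand-in for litellm.BadRequestError."""


def create_response(call_llm, fallback_handler, **filtered_params):
    """Only fall back to the structured-output handler when the failed
    request actually asked for structured output."""
    try:
        return call_llm(**filtered_params)
    except BadRequestError:
        # If response_format wasn't requested, the error is unrelated
        # (bad model name, malformed request, ...) and must propagate.
        if "response_format" not in filtered_params:
            raise
        # The handler still needs response_format to build the schema prompt.
        return fallback_handler(**filtered_params)
```

This keeps the fallback path scoped to the `response_format` failure mode while letting all other bad requests surface normally.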

@ZhengKai91
Author

@filip-michalsky Would you be able to review this PR? It’d be great to get your feedback before merging.
