Introduce Generic Structured Output Handling (CrewAI Inspired) #244
This pull request introduces a robust mechanism to handle Structured Outputs (Pydantic BaseModel enforcement) for LLMs that do not natively support the response_format parameter (e.g., Deepseek Chat, certain open-source models).
Why
Currently, when structured output is requested via the response_format argument (in client.py) with a model that does not support it (such as deepseek/deepseek-chat), the call to litellm.acompletion fails with a litellm.BadRequestError:
```
litellm.BadRequestError: DeepseekException - {"error":{"message":"This response_format type is unavailable now","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}
```
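For reference, a minimal reproduction of the failure looks roughly like this (MovieReview and the prompt are placeholders for illustration; the real call site lives in client.py):

```python
import asyncio

import litellm
from pydantic import BaseModel


class MovieReview(BaseModel):  # placeholder schema for illustration
    title: str
    rating: int


async def main() -> None:
    # Deepseek rejects the json_schema response_format type, so this call
    # raises the litellm.BadRequestError quoted above.
    await litellm.acompletion(
        model="deepseek/deepseek-chat",
        messages=[{"role": "user", "content": "Review the movie Alien."}],
        response_format=MovieReview,
    )


asyncio.run(main())
```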
This issue severely limits the range of models compatible with structured output tasks.
What Changed
The core logic has been refactored into a new StructuredOutputHandler class, adopting a proven strategy used by projects like CrewAI (kudos to the CrewAI team for pioneering this pattern! 👏). The handler works as follows:
We stop passing the response_format argument to litellm.acompletion.
Instead, the Pydantic model's JSON schema is parsed, optimized, and injected as a strict system-prompt instruction that guides the LLM's output format (the format_messages function); see the sketch after this list.
We introduce a Converter class that manages the post-processing workflow:
It attempts to parse the raw text output into the target Pydantic BaseModel.
If parsing fails (due to partial or invalid JSON), it attempts to extract the JSON robustly (handle_partial_json).
If necessary, it retries (up to 3 attempts) by calling the LLM again and asking it to fix the malformed output.
The successfully validated Pydantic BaseModel instance is converted to a Python dictionary (model.model_dump()) and replaces the original text content in the final response object.
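To make the flow concrete, here is a condensed sketch of how these pieces could fit together. The names format_messages, handle_partial_json, and Converter come from this PR, but the bodies, signatures, prompt wording, and retry structure below are illustrative assumptions, not the actual diff:

```python
import json
import re
from typing import Type

import litellm
from pydantic import BaseModel, ValidationError


def format_messages(messages: list[dict], model_cls: Type[BaseModel]) -> list[dict]:
    """Inject the Pydantic JSON schema as a strict system-prompt instruction."""
    schema = json.dumps(model_cls.model_json_schema(), indent=2)
    instruction = (
        "Respond ONLY with a JSON object that validates against this schema, "
        "with no surrounding text or markdown:\n" + schema
    )
    return [{"role": "system", "content": instruction}, *messages]


def handle_partial_json(text: str, model_cls: Type[BaseModel]) -> BaseModel:
    """Best-effort extraction of a JSON object embedded in free-form text."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return model_cls.model_validate_json(match.group(0))


class Converter:
    """Post-processing workflow: parse, extract, then retry via the LLM."""

    MAX_RETRIES = 3

    def __init__(self, model: str, model_cls: Type[BaseModel]):
        self.model = model
        self.model_cls = model_cls

    async def convert(self, text: str) -> BaseModel:
        for _ in range(self.MAX_RETRIES):
            try:
                # Happy path: the output is already valid JSON for the schema.
                return self.model_cls.model_validate_json(text)
            except ValidationError:
                pass
            try:
                # Fallback: dig a JSON object out of partial/noisy output.
                return handle_partial_json(text, self.model_cls)
            except (ValueError, ValidationError):
                # Last resort: ask the LLM to repair its own output.
                response = await litellm.acompletion(
                    model=self.model,
                    messages=format_messages(
                        [{"role": "user",
                          "content": "Fix this output so it matches the schema:\n"
                                     + text}],
                        self.model_cls,
                    ),
                )
                text = response.choices[0].message.content
        raise ValueError(f"could not coerce output after {self.MAX_RETRIES} attempts")
```

Once convert succeeds, the validated instance is turned into a dictionary via model_dump() and swapped into the response object, as described above.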
Test Plan
The easiest way to verify this fix is to test with a model that lacks native response_format support, such as Deepseek.
Set the environment variable DEEPSEEK_API_KEY.
Modify the model parameter in examples/quickstart.py from the current default to deepseek/deepseek-chat.
Run the example.
Expected Result: The program should now successfully execute the structured inference task and return a dictionary validated against the Pydantic model, without hitting the BadRequestError tied to the response_format parameter.
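The steps above boil down to something like the following (the variable name and the shape of quickstart.py are assumptions; only the model string comes from the steps above):

```python
# Condensed, hypothetical version of the manual test steps.
import os

# Step 1: the key must be present in the environment.
assert os.environ.get("DEEPSEEK_API_KEY"), "export DEEPSEEK_API_KEY first"

# Step 2: in examples/quickstart.py, point the model at Deepseek
# (the actual variable/parameter name in quickstart.py may differ).
model = "deepseek/deepseek-chat"

# Step 3: run the example:
#   python examples/quickstart.py
```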
Summary by cubic
Adds a generic structured output path that enforces Pydantic models for LLMs without native response_format support (e.g., Deepseek), and updates the client to fall back to this handler on errors. Structured inference now works across more models without BadRequestError.
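The error-triggered fallback mentioned here is not spelled out above; a minimal sketch of its assumed shape (reusing format_messages and Converter from the earlier sketch; the real entry point and signature may differ):

```python
from typing import Type

import litellm
from pydantic import BaseModel


async def complete_structured(model: str, messages: list[dict],
                              model_cls: Type[BaseModel]):
    """Assumed shape of the client-side fallback; not the actual diff."""
    try:
        # Fast path: providers with native response_format support.
        return await litellm.acompletion(
            model=model, messages=messages, response_format=model_cls
        )
    except litellm.BadRequestError:
        # Fallback path from this PR: schema-as-system-prompt plus
        # parse/extract/retry post-processing.
        response = await litellm.acompletion(
            model=model, messages=format_messages(messages, model_cls)
        )
        parsed = await Converter(model, model_cls).convert(
            response.choices[0].message.content
        )
        # Replace the raw text with the validated dictionary (see above).
        response.choices[0].message.content = parsed.model_dump()
        return response
```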