Introduce Generic Structured Output Handling (CrewAI Inspired) #244
This pull request introduces a robust mechanism to handle Structured Outputs (Pydantic BaseModel enforcement) for LLMs that do not natively support the response_format parameter (e.g., Deepseek Chat, certain open-source models).
Why
Currently, when structured output is requested via the response_format argument (in client.py) with a model that does not support it (such as deepseek/deepseek-chat), the call to litellm.acompletion fails with a litellm.BadRequestError:
```
litellm.BadRequestError: DeepseekException - {"error":{"message":"This response_format type is unavailable now","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}
```
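For reference, a minimal reproduction of the failure looks roughly like this (MovieReview and the prompt are placeholders for illustration; the real call site lives in client.py):

```python
import asyncio

import litellm
from pydantic import BaseModel


class MovieReview(BaseModel):  # placeholder schema for illustration
    title: str
    rating: int


async def main() -> None:
    # Deepseek rejects the json_schema response_format type, so this call
    # raises the litellm.BadRequestError quoted above.
    await litellm.acompletion(
        model="deepseek/deepseek-chat",
        messages=[{"role": "user", "content": "Review the movie Alien."}],
        response_format=MovieReview,
    )


asyncio.run(main())
```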
This issue severely limits the range of models compatible with structured output tasks.
What Changed
The core logic has been refactored into a new StructuredOutputHandler class, adopting a proven strategy used by projects like CrewAI (kudos to the CrewAI team for pioneering this pattern! 👏). The handler works as follows:
We stop passing the response_format argument to litellm.acompletion.
Instead, the Pydantic model's JSON schema is parsed, optimized, and injected as a strict system-prompt instruction that guides the LLM's output format (the format_messages function); see the sketch after this list.
We introduce a Converter class that manages the post-processing workflow:
It attempts to parse the raw text output into the target Pydantic BaseModel.
If parsing fails (due to partial or invalid JSON), it attempts to extract the JSON robustly (handle_partial_json).
If necessary, it retries (up to 3 attempts) by calling the LLM again and asking it to fix the malformed output.
The successfully validated Pydantic BaseModel instance is converted to a Python dictionary (model.model_dump()) and replaces the original text content in the final response object.
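To make the flow concrete, here is a condensed sketch of how these pieces could fit together. The names format_messages, handle_partial_json, and Converter come from this PR, but the bodies, signatures, prompt wording, and retry structure below are illustrative assumptions, not the actual diff:

```python
import json
import re
from typing import Type

import litellm
from pydantic import BaseModel, ValidationError


def format_messages(messages: list[dict], model_cls: Type[BaseModel]) -> list[dict]:
    """Inject the Pydantic JSON schema as a strict system-prompt instruction."""
    schema = json.dumps(model_cls.model_json_schema(), indent=2)
    instruction = (
        "Respond ONLY with a JSON object that validates against this schema, "
        "with no surrounding text or markdown:\n" + schema
    )
    return [{"role": "system", "content": instruction}, *messages]


def handle_partial_json(text: str, model_cls: Type[BaseModel]) -> BaseModel:
    """Best-effort extraction of a JSON object embedded in free-form text."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return model_cls.model_validate_json(match.group(0))


class Converter:
    """Post-processing workflow: parse, extract, then retry via the LLM."""

    MAX_RETRIES = 3

    def __init__(self, model: str, model_cls: Type[BaseModel]):
        self.model = model
        self.model_cls = model_cls

    async def convert(self, text: str) -> BaseModel:
        for _ in range(self.MAX_RETRIES):
            try:
                # Happy path: the output is already valid JSON for the schema.
                return self.model_cls.model_validate_json(text)
            except ValidationError:
                pass
            try:
                # Fallback: dig a JSON object out of partial/noisy output.
                return handle_partial_json(text, self.model_cls)
            except (ValueError, ValidationError):
                # Last resort: ask the LLM to repair its own output.
                response = await litellm.acompletion(
                    model=self.model,
                    messages=format_messages(
                        [{"role": "user",
                          "content": "Fix this output so it matches the schema:\n"
                                     + text}],
                        self.model_cls,
                    ),
                )
                text = response.choices[0].message.content
        raise ValueError(f"could not coerce output after {self.MAX_RETRIES} attempts")
```

Once convert succeeds, the validated instance is turned into a dictionary via model_dump() and swapped into the response object, as described above.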
Test Plan
The easiest way to verify this fix is to test with a model that lacks native response_format support, such as Deepseek.
Set the environment variable DEEPSEEK_API_KEY.
Modify the model parameter in examples/quickstart.py from the current default to deepseek/deepseek-chat.
Run the example.
Expected Result: The program should now successfully execute the structured inference task and return a dictionary validated against the Pydantic model, without hitting the BadRequestError tied to the response_format parameter.
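The steps above boil down to something like the following (the variable name and the shape of quickstart.py are assumptions; only the model string comes from the steps above):

```python
# Condensed, hypothetical version of the manual test steps.
import os

# Step 1: the key must be present in the environment.
assert os.environ.get("DEEPSEEK_API_KEY"), "export DEEPSEEK_API_KEY first"

# Step 2: in examples/quickstart.py, point the model at Deepseek
# (the actual variable/parameter name in quickstart.py may differ).
model = "deepseek/deepseek-chat"

# Step 3: run the example:
#   python examples/quickstart.py
```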
Summary by cubic
Adds a generic structured output path that enforces Pydantic models for LLMs without native response_format support (e.g., Deepseek), and updates the client to fall back to this handler on errors. Structured inference now works across more models without BadRequestError.
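The error-triggered fallback mentioned here is not spelled out above; a minimal sketch of its assumed shape (reusing format_messages and Converter from the earlier sketch; the real entry point and signature may differ):

```python
from typing import Type

import litellm
from pydantic import BaseModel


async def complete_structured(model: str, messages: list[dict],
                              model_cls: Type[BaseModel]):
    """Assumed shape of the client-side fallback; not the actual diff."""
    try:
        # Fast path: providers with native response_format support.
        return await litellm.acompletion(
            model=model, messages=messages, response_format=model_cls
        )
    except litellm.BadRequestError:
        # Fallback path from this PR: schema-as-system-prompt plus
        # parse/extract/retry post-processing.
        response = await litellm.acompletion(
            model=model, messages=format_messages(messages, model_cls)
        )
        parsed = await Converter(model, model_cls).convert(
            response.choices[0].message.content
        )
        # Replace the raw text with the validated dictionary (see above).
        response.choices[0].message.content = parsed.model_dump()
        return response
```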