Gemini 3 Pro support and cross-model conversation compatibility #2158
Resolves:
## Summary
This PR does two main things:

1. Supports Gemini 3 Pro `thought_signatures` in function calling.
2. Enables cross-model conversations.

The goal is to make different providers interoperable: allowing them to safely share the same `to_input_list()` items, while each provider only receives the metadata it understands.

## Examples
Besides unit tests, I performed live tests for all the following scenarios:
- LiteLLM + Gemini
- Gemini ChatCompletions (OpenAI-compatible endpoint)
- Cross-model conversations (same raw items handled by different models)
- Handoffs (with `nest_handoff_history` disabled)

## 1. Gemini 3 Pro function calling (`thought_signatures`)

Gemini 3 Pro now requires a `thought_signature` attached to each function call in the same turn.
Docs: https://ai.google.dev/gemini-api/docs/thought-signatures
This PR supports both integration paths (LiteLLM and ChatCompletions), in both non-streaming and streaming modes.
The conversation flow is: LiteLLM ↔ ChatCompletions ↔ our raw items.
### LiteLLM layer

LiteLLM places Gemini's `thought_signature` inside `provider_specific_fields`. This PR handles the conversion between:

- LiteLLM's `provider_specific_fields["thought_signature"]`, and
- the Google ChatCompletions format `extra_content={"google": {"thought_signature": ...}}`
### ChatCompletions layer

This PR handles the conversion between:

- the Google ChatCompletions format `extra_content={"google": {"thought_signature": ...}}`, and
- our raw item's new internal field `provider_data["thought_signature"]`
### Cleaning up LiteLLM's `__thought__` suffix

LiteLLM adds a `__thought__` suffix to Gemini tool call ids (see BerriAI/litellm#16895). This suffix is not needed since we have `thought_signature`, and it causes `call_id` validation problems when the items are passed to other models. Therefore, this PR removes it.
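A minimal sketch of that cleanup (the helper name is made up for illustration):

```python
THOUGHT_SUFFIX = "__thought__"


def normalize_gemini_call_id(call_id: str) -> str:
    """Strip LiteLLM's __thought__ suffix so the call_id stays valid when the
    same item is later sent to another provider (sketch only)."""
    if call_id.endswith(THOUGHT_SUFFIX):
        return call_id[: -len(THOUGHT_SUFFIX)]
    return call_id


assert normalize_gemini_call_id("call_abc123__thought__") == "call_abc123"
assert normalize_gemini_call_id("call_abc123") == "call_abc123"
```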
## 2. Enables cross-model conversations
To support cross-model conversations, this PR introduces a new `provider_data` field on raw response items. This field holds metadata that is not compatible with the OpenAI Responses API, so provider-specific details can travel with an item while only the provider that understands them receives them.

For non-OpenAI Responses API models, we now store this metadata directly on the raw item.
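For example, a function call item produced through LiteLLM + Gemini might end up looking roughly like this (illustrative values; the exact serialized shape may differ):

```python
{
    "type": "function_call",
    "call_id": "call_abc123",
    "name": "get_weather",
    "arguments": '{"city": "Tokyo"}',
    "provider_data": {
        "thought_signature": "<base64 signature from Gemini>",
    },
}
```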
This design is similar to PydanticAI, which uses a comparable structure. The difference: PydanticAI stores metadata for all models, whereas this PR stores `provider_data` only for non-OpenAI providers.

With `provider_data` and the model name passed into the converters, agents can now safely switch models while reusing the same raw items from `to_input_list()`. It also works with handoffs when `nest_handoff_history=False`.

## Implementation Details
Because items in a conversation can come from different providers, and each provider has different requirements, this PR passes the target model name into several conversion helpers:
- `Converter.items_to_messages(..., model=...)`
- `LitellmConverter.convert_message_to_openai(..., model=...)`
- `ChatCmplStreamHandler.handle_stream(..., model=...)`
- `Converter.message_to_output_items(..., provider_data=...)`

This lets us branch on behavior for different providers in a controlled way and avoid regressions by handling provider-specific cases. This is especially important for reasoning models, where each provider handles encrypted tokens differently.
Libraries like PydanticAI and LangChain define their own internal standard formats to enable cross-model conversations.

By contrast, LiteLLM has not fully abstracted away these differences. It focuses on making each model call work with provider-specific workarounds, without defining a normalized history format for cross-model conversations. Therefore, we need explicit model awareness at this layer to make cross-model conversations possible.
For example, when we store Claude's `thinking_blocks` signature inside our reasoning item's `encrypted_content` field, we also need to know that it came from a Claude model. Otherwise, we would send this Claude-only encrypted content to another provider, which cannot safely interpret it.
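To illustrate the kind of model-aware branching this enables (the helper name, the `provider_data["provider"]` marker, and the model-name checks are all assumptions made for this sketch, not the PR's actual logic):

```python
from typing import Any


def should_forward_encrypted_content(reasoning_item: dict[str, Any], target_model: str) -> bool:
    """Only forward encrypted reasoning content to the provider that produced it."""
    source = (reasoning_item.get("provider_data") or {}).get("provider")
    if source == "anthropic":
        # Claude-only signatures must not be sent to Gemini or OpenAI models.
        return "claude" in target_model.lower()
    if source is None:
        # No marker: rough heuristic, treat it as OpenAI-native encrypted content.
        return not target_model.lower().startswith(("claude", "gemini"))
    return False
```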
The guiding principle in this PR is to treat OpenAI Responses API items as the baseline format, and to use `provider_data` to extend them with provider-specific metadata when needed.

### For the OpenAI Responses API
When sending items to the OpenAI Responses API, we must not send provider-specific metadata or fake ids.
This PR adds:
- `OpenAIResponsesModel._remove_openai_responses_api_incompatible_fields(...)`, which:
  - removes an item's `id` when it equals `FAKE_RESPONSES_ID`,
  - removes metadata stored in `provider_data` (these fields are provider-specific), and
  - removes the `provider_data` field itself from all items.

This keeps the payload clean and compatible with the Responses API, even if the items previously flowed through non-OpenAI providers.
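A simplified sketch of that cleanup step (items are shown as plain dicts; the real method operates on the SDK's item types, and the constant value here is a placeholder):

```python
from typing import Any

FAKE_RESPONSES_ID = "__fake_id__"  # placeholder for this sketch; the SDK defines the real constant


def remove_responses_api_incompatible_fields(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop fake ids and provider-specific metadata before calling the
    OpenAI Responses API (sketch of the idea, not the PR's exact code)."""
    cleaned: list[dict[str, Any]] = []
    for item in items:
        item = dict(item)  # shallow copy; leave the caller's history untouched
        if item.get("id") == FAKE_RESPONSES_ID:
            item.pop("id")
        item.pop("provider_data", None)  # unknown to the Responses API
        cleaned.append(item)
    return cleaned
```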
## Design notes: reasoning items vs `provider_data`
This PR does not introduce a separate reasoning item (as the Claude `thinking_blocks` handling does) for Gemini function calls' `thought_signatures`. Instead, it stores the signatures in `provider_data` on the function call item.
This design is again similar to PydanticAI’s approach and also mirrors the underlying Gemini parts structure: signatures are attached to the parts they describe instead of creating an extra reasoning item with no text.
I also studied the Gemini API raw format; there are four raw part structures related to thought signatures:

1. `functionCall: {...}` with `thought_signature: "xxx"` → handled in this PR: keep the `thought_signature` with the function call.
2. `text: "...."` with `thought_signature: "xxx"` → could attach to the output item (no extra reasoning item needed).
3. `text: ""` with `thought_signature: "xxx"` → empty text; this is a case where a standalone reasoning item makes sense.
4. `text: "summary..."` with `thought: true` → a thinking summary; another case where a standalone reasoning item makes sense.

This PR implements case (1), which is sufficient for Gemini's current function calling requirement.
Other cases can be added later if needed.
This PR should have no side effects on projects that only use the OpenAI Responses API, and I believe it lays better groundwork for handling various provider-specific cases.