Gemini 3 Pro support and cross-model conversation compatibility #2158
Resolves:
## Summary
This PR does two main things:

1. Supports Gemini 3 Pro `thought_signatures` in function calling.
2. Enables cross-model conversations.

The goal is to make different providers interoperable: allowing them to safely share the same `to_input_list()` items, while each provider only receives the metadata it understands.

## Examples
Besides unit tests, I performed live tests for all the following scenarios:
- LiteLLM + Gemini
- Gemini ChatCompletions (OpenAI-compatible endpoint)
- Cross-model conversations (same raw items handled by different models)
- Handoffs (with `nest_handoff_history` disabled)

## 1. Gemini 3 Pro function calling (`thought_signatures`)

Gemini 3 Pro now requires a `thought_signature` attached to each function call in the same turn.
Docs: https://ai.google.dev/gemini-api/docs/thought-signatures
This PR supports both integration paths (LiteLLM and ChatCompletions), in both non-streaming and streaming modes.
The conversation flow is: LiteLLM ↔ ChatCompletions ↔ our raw items.
### LiteLLM layer

LiteLLM places Gemini's `thought_signature` inside `provider_specific_fields`. This PR handles the conversion between:

- LiteLLM's `provider_specific_fields["thought_signature"]`, and
- the Google ChatCompletions format `extra_content={"google": {"thought_signature": ...}}`
### ChatCompletions layer

This PR handles the conversion between:

- the Google ChatCompletions format `extra_content={"google": {"thought_signature": ...}}`, and
- our raw item's new internal field `provider_data["thought_signature"]`
### Cleaning up LiteLLM's `__thought__` suffix

LiteLLM adds a `__thought__` suffix to Gemini tool call ids (see BerriAI/litellm#16895). This suffix is not needed since we have `thought_signature`, and it causes `call_id` validation problems when the items are passed to other models. Therefore, this PR removes it.
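A minimal sketch of that cleanup (the helper name is made up for illustration):

```python
THOUGHT_SUFFIX = "__thought__"


def normalize_gemini_call_id(call_id: str) -> str:
    """Strip LiteLLM's __thought__ suffix so the call_id stays valid when the
    same item is later sent to another provider (sketch only)."""
    if call_id.endswith(THOUGHT_SUFFIX):
        return call_id[: -len(THOUGHT_SUFFIX)]
    return call_id


assert normalize_gemini_call_id("call_abc123__thought__") == "call_abc123"
assert normalize_gemini_call_id("call_abc123") == "call_abc123"
```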
## 2. Enables cross-model conversations
To support cross-model conversations, this PR introduces a new `provider_data` field on raw response items. This field holds metadata that is not compatible with the OpenAI Responses API, so provider-specific details can travel with an item while only the provider that understands them receives them.

For non-OpenAI Responses API models, we now store this metadata directly on the raw item.
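For example, a function call item produced through LiteLLM + Gemini might end up looking roughly like this (illustrative values; the exact serialized shape may differ):

```python
{
    "type": "function_call",
    "call_id": "call_abc123",
    "name": "get_weather",
    "arguments": '{"city": "Tokyo"}',
    "provider_data": {
        "thought_signature": "<base64 signature from Gemini>",
    },
}
```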
This design is similar to PydanticAI, which uses a comparable structure. The difference: PydanticAI stores metadata for all models, whereas this PR stores `provider_data` only for non-OpenAI providers.

With `provider_data` and the model name passed into the converters, agents can now safely switch models while reusing the same raw items from `to_input_list()`. It also works with handoffs when `nest_handoff_history=False`.

## Implementation Details
Because items in a conversation can come from different providers, and each provider has different requirements, this PR passes the target model name into several conversion helpers:
- `Converter.items_to_messages(..., model=...)`
- `LitellmConverter.convert_message_to_openai(..., model=...)`
- `ChatCmplStreamHandler.handle_stream(..., model=...)`
- `Converter.message_to_output_items(..., provider_data=...)`

This lets us branch on behavior for different providers in a controlled way and avoid regressions by handling provider-specific cases. This is especially important for reasoning models, where each provider handles encrypted tokens differently.
Libraries like PydanticAI and LangChain define their own internal standard formats to enable cross-model conversations.

By contrast, LiteLLM has not fully abstracted away these differences. It focuses on making each model call work with provider-specific workarounds, without defining a normalized history format for cross-model conversations. Therefore, we need explicit model awareness at this layer to make cross-model conversations possible.
For example, when we store Claude's `thinking_blocks` signature inside our reasoning item's `encrypted_content` field, we also need to know that it came from a Claude model. Otherwise, we would send this Claude-only encrypted content to another provider, which cannot safely interpret it.
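To illustrate the kind of model-aware branching this enables (the helper name, the `provider_data["provider"]` marker, and the model-name checks are all assumptions made for this sketch, not the PR's actual logic):

```python
from typing import Any


def should_forward_encrypted_content(reasoning_item: dict[str, Any], target_model: str) -> bool:
    """Only forward encrypted reasoning content to the provider that produced it."""
    source = (reasoning_item.get("provider_data") or {}).get("provider")
    if source == "anthropic":
        # Claude-only signatures must not be sent to Gemini or OpenAI models.
        return "claude" in target_model.lower()
    if source is None:
        # No marker: rough heuristic, treat it as OpenAI-native encrypted content.
        return not target_model.lower().startswith(("claude", "gemini"))
    return False
```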
The guiding principle in this PR is to treat OpenAI Responses API items as the baseline format, and to use `provider_data` to extend them with provider-specific metadata when needed.

### For the OpenAI Responses API
When sending items to the OpenAI Responses API, we must not send provider-specific metadata or fake ids.
This PR adds:
- `OpenAIResponsesModel._remove_openai_responses_api_incompatible_fields(...)`, which:
  - removes an item's `id` when it equals `FAKE_RESPONSES_ID`,
  - removes metadata stored in `provider_data` (these fields are provider-specific), and
  - removes the `provider_data` field itself from all items.

This keeps the payload clean and compatible with the Responses API, even if the items previously flowed through non-OpenAI providers.
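A simplified sketch of that cleanup step (items are shown as plain dicts; the real method operates on the SDK's item types, and the constant value here is a placeholder):

```python
from typing import Any

FAKE_RESPONSES_ID = "__fake_id__"  # placeholder for this sketch; the SDK defines the real constant


def remove_responses_api_incompatible_fields(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop fake ids and provider-specific metadata before calling the
    OpenAI Responses API (sketch of the idea, not the PR's exact code)."""
    cleaned: list[dict[str, Any]] = []
    for item in items:
        item = dict(item)  # shallow copy; leave the caller's history untouched
        if item.get("id") == FAKE_RESPONSES_ID:
            item.pop("id")
        item.pop("provider_data", None)  # unknown to the Responses API
        cleaned.append(item)
    return cleaned
```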
## Design notes: reasoning items vs `provider_data`
This PR does not introduce a separate reasoning item (as the Claude `thinking_blocks` handling does) for Gemini function calls' `thought_signatures`. Instead, it stores the signatures in `provider_data` on the function call item.
This design is again similar to PydanticAI’s approach and also mirrors the underlying Gemini parts structure: signatures are attached to the parts they describe instead of creating an extra reasoning item with no text.
I also studied the Gemini API raw format; there are four raw part structures related to thought signatures:

1. `functionCall: {...}` with `thought_signature: "xxx"` → handled in this PR: keep the `thought_signature` with the function call.
2. `text: "...."` with `thought_signature: "xxx"` → could attach to the output item (no extra reasoning item needed).
3. `text: ""` with `thought_signature: "xxx"` → empty text; this is a case where a standalone reasoning item makes sense.
4. `text: "summary..."` with `thought: true` → a thinking summary; another case where a standalone reasoning item makes sense.

This PR implements case (1), which is sufficient for Gemini's current function calling requirement.
Other cases can be added later if needed.
This PR should have no side effects on projects that only use the OpenAI Responses API, and I believe it lays better groundwork for handling various provider-specific cases.