
Conversation

@allenanie
Member

Adding multi-modal support. Also introducing a context section.

For context, the design intention is that if the user provides context, it appears in the user message; if no context is provided, the section is omitted entirely (a minimal sketch follows).
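
A minimal sketch of that behavior, assuming a hypothetical `build_user_message` helper (the function name and section layout are illustrative, not the implementation in this PR):

```python
from typing import Optional

def build_user_message(query: str, context: Optional[str] = None) -> str:
    """Assemble the user message; the context section appears only when context is given."""
    sections = []
    if context:
        # Context provided by the user goes into its own section of the user message.
        sections.append(f"# Context\n{context}")
    sections.append(f"# Query\n{query}")
    return "\n\n".join(sections)

# With context the message starts with a "# Context" section; without it, only the query appears.
print(build_user_message("Improve the code.", context="It times out on large inputs."))
print(build_user_message("Improve the code."))
```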

@allenanie allenanie requested a review from Copilot October 13, 2025 17:48

Copilot AI left a comment


Pull Request Overview

This PR implements multi-modal support for optimizers and introduces a context section to provide additional information during optimization. The changes enable image input handling, context passing, and improved structure for optimization prompts.

Key changes include:

  • Multi-modal payload support for handling images alongside text queries
  • Context section implementation for passing additional optimization context
  • Optimizer API enhancements to support image and context inputs

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Summary per file:

  • tests/unit_tests/test_priority_search.py — Added multi-modal message handling for test compatibility
  • opto/optimizers/utils.py — Added image encoding utility for base64 conversion
  • opto/optimizers/optoprime_v2.py — Main multi-modal and context implementation with API changes
  • opto/optimizers/opro_v2.py — Extended OPRO optimizer with context support
  • opto/features/flows/types.py — Added multi-modal payload types and query normalization
  • opto/features/flows/compose.py — Updated TracedLLM to handle multi-modal payloads
  • docs/tutorials/minibatch.ipynb — Updated escape sequences in notebook output
  • .github/workflows/ci.yml — Commented out optimizer test suite


@chinganc chinganc self-assigned this Oct 13, 2025
@allenanie
Member Author

allenanie commented Oct 22, 2025

TODO:

  1. Support loading images directly from a URL (Adith)
  2. Add support for in-memory images (e.g., RGB/NumPy arrays) (Ching-An)
  3. Let nodes carry images (multi-modal): when we traverse the graph, we add the image as a payload for the optimizer. Add a function to the node to determine whether it holds an image (see the sketch below).
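
A rough sketch of item 3, assuming a hypothetical `Node.is_image()` check and payload-collection helper (these names are illustrative, not the actual Trace node API):

```python
import base64
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Node:
    """Illustrative stand-in for a traced node that may hold an image."""
    data: Any
    children: List["Node"] = field(default_factory=list)

    def is_image(self) -> bool:
        # For this sketch, treat raw bytes as image content.
        return isinstance(self.data, (bytes, bytearray))

def collect_image_payloads(root: Node) -> List[str]:
    """Traverse the graph and gather base64-encoded images to attach to the optimizer payload."""
    payloads, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.is_image():
            payloads.append(base64.b64encode(bytes(node.data)).decode("utf-8"))
        stack.extend(node.children)
    return payloads
```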

@allenanie
Member Author

@copilot open a new pull request to apply changes based on the comments in this thread


Copilot AI commented Nov 10, 2025

@allenanie I've opened a new pull request, #54, to work on those changes. Once the pull request is ready, I'll request review from you.

allenanie and others added 4 commits November 10, 2025 17:17
[WIP] Add multi-modal optimizer and context support
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.

Comments suppressed due to low confidence (1)

opto/optimizers/optoprime_v2.py:236

        return OptoPrime.extract_llm_suggestion(response)


@AgentOpt AgentOpt deleted a comment from Copilot AI Nov 26, 2025
@allenanie
Member Author

allenanie commented Dec 12, 2025

TODO:

  1. Add API usage / common usage patterns (documentation)
  2. The image context API is broken; migrate it to the current backbone
  3. Support feedback given as an image (basically ready)
  4. Optimizer extraction of image output (replace the parameter node if it is an image)

@allenanie
Member Author

Documenting some decisions here:

OpenAI released the Responses API and announced a migration path away from (and eventual retirement of) the Completions API. This triggered changes across the industry: LiteLLM introduced a beta version of the Responses API, and it is unclear whether Google/Anthropic will follow.

Although LiteLLM's Responses API is usable, its support for multi-modality (image generation) is quite poor, at least for Gemini. We are making Gemini a first-party backend for OpenTrace going forward, so for this PR we are staying with LiteLLM's completion API, with the option to upgrade to the Responses API in the future.

…ause these meanings are shifting (the "premium" model of 2025 will be the "cheap" model of 2027, which causes confusion and unreliability for the users).
…(automatically generated to increase coverage)
@allenanie
Member Author

allenanie commented Dec 27, 2025

For backward compatibility, llm.py is designed as follows (mm_beta means the multi-modal beta version):

When mm_beta (multi-modal) is enabled, we use:

  1. LiteLLM's Responses API (most compatible with OpenAI models, but it can also work with others)

When mm_beta is disabled, for backward compatibility, we use:

  1. LiteLLM's completion API (the default)

For any Google model (model name starting with gemini), we use Google's generate_content API (LiteLLM's Gemini support is insufficient for our use case). The dispatch is sketched below.
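
A rough sketch of this dispatch, assuming LiteLLM's responses()/completion() entry points and the Google GenAI client; the wrapper name and exact arguments are illustrative, not the code in llm.py:

```python
import litellm
from google import genai  # Google GenAI SDK

def call_llm(model: str, messages: list, mm_beta: bool = False):
    """Route a chat request according to the rules above (message-format conversion elided)."""
    if model.startswith("gemini"):
        # Google models: call generate_content directly, since LiteLLM's Gemini
        # support is insufficient for our multi-modal use case.
        client = genai.Client()
        return client.models.generate_content(model=model, contents=messages)
    if mm_beta:
        # Multi-modal beta path: LiteLLM's Responses API.
        return litellm.responses(model=model, input=messages)
    # Default, backward-compatible path: LiteLLM's completion API.
    return litellm.completion(model=model, messages=messages)
```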

Even with this small change, a lot of details had to be handled (a small normalization sketch follows the list):

  1. OpenAI returns images as base64 strings; the Google GenAI library returns raw bytes.
  2. The Google GenAI library expects system_instruction to be passed explicitly; OpenAI uses the message role role="system".
  3. (and other small quality-of-life fixes)
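
A small sketch of the normalization implied by points 1 and 2 (the helper names are hypothetical):

```python
import base64
from typing import List, Optional, Tuple, Union

def to_base64(image_data: Union[str, bytes]) -> str:
    """Normalize image data to a base64 string: OpenAI already returns base64 text,
    while the Google GenAI library returns raw bytes."""
    if isinstance(image_data, (bytes, bytearray)):
        return base64.b64encode(image_data).decode("utf-8")
    return image_data

def split_system_prompt(messages: List[dict]) -> Tuple[Optional[str], List[dict]]:
    """Pull the role="system" message out of an OpenAI-style message list so it can be
    passed separately as system_instruction to the Google GenAI library."""
    system, rest = None, []
    for message in messages:
        if message.get("role") == "system":
            system = message.get("content")
        else:
            rest.append(message)
    return system, rest
```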

In addition to the llm.py changes, we updated the AssistantTurn construction. It can now take a raw response from the LLM API call and map the returned result directly into our class.

This is not strictly necessary, but it simplifies the optimizer's design, since it no longer needs to interact with the raw LLM API response object.
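
For illustration, the mapping might look roughly like this; the real AssistantTurn lives in this PR, but the fields and the from_response classmethod below are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class AssistantTurnSketch:
    """Illustrative only: a structured record of one assistant reply."""
    model: str
    text: Optional[str] = None
    images_b64: List[str] = field(default_factory=list)
    raw: Any = None  # keep the raw provider response around for debugging

    @classmethod
    def from_response(cls, model: str, response: Any) -> "AssistantTurnSketch":
        # The real constructor maps a provider-specific response object
        # (LiteLLM or Google GenAI) into the structured turn; only the idea is shown here.
        text = getattr(response, "output_text", None) or str(response)
        return cls(model=model, text=text, raw=response)
```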

@allenanie
Member Author

Multi-turn conversation is tested.

See the test test_real_google_genai_multi_turn_with_images_updated in test_optimizer_backbone.py.

We store conversation history as structured data in AssistantTurn and UserTurn objects, which are added to a ConversationHistory object. When we need to pass them back into an LLM API call, we call history.to_messages() to get the input automatically, or explicitly call history.to_gemini_format() or history.to_litellm_format().

to_messages() checks which model was used by the last AssistantTurn and automatically determines which format function to use. However, this is not 100% reliable: for example, if the CustomLLM backend is used but a Gemini model is called, the automatic conversion will fail, because CustomLLM expects an OpenAI-compatible server.
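
A usage sketch of this flow; the ConversationHistory, UserTurn, and AssistantTurn names and the to_messages()/to_gemini_format()/to_litellm_format() methods come from this PR, but the constructors, fields, and append call below are assumptions:

```python
# Hypothetical usage, not copied from the tests.
history = ConversationHistory()
history.append(UserTurn(content="Here is the traced code and the feedback..."))
history.append(AssistantTurn(model="gemini-2.0-flash", text="Suggested update: ..."))

# Let to_messages() inspect the last AssistantTurn's model and pick the format...
messages = history.to_messages()

# ...or be explicit when the automatic check cannot be trusted, e.g. a CustomLLM
# backend serving a Gemini model behind an OpenAI-compatible server:
messages = history.to_litellm_format()
```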

@allenanie
Member Author

So far, all supporting functions for multi-modal capabilities are finished:

  1. backbone.py
  2. llm.py

Tests are finished:

  1. test_llm.py
  2. test_optimizer_backbone.py

Remaining TODOs:

  1. Integrate this into the optimizer class (i.e., support image parameter extraction)
  2. Write a notebook demonstrating usage of the backbone as well as the new optimizer.

Refactor: moved the Gemini input message history conversion to `ConversationHistory`