[WIP] Multi-modal Optimizer + Context for Optimization #50
base: experimental
Conversation
…h`'s mock test to expect different kind of input
…into features/multimodal_opt
Pull Request Overview
This PR implements multi-modal support for optimizers and introduces a context section to provide additional information during optimization. The changes enable image input handling, context passing, and improved structure for optimization prompts.
Key changes include:
- Multi-modal payload support for handling images alongside text queries
- Context section implementation for passing additional optimization context
- Optimizer API enhancements to support image and context inputs
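As a rough illustration of the first two points, here is a sketch of what a multi-modal payload and its normalization could look like; the type and function names below are illustrative assumptions, not the exact definitions in `opto/features/flows/types.py`:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class MultiModalQuery:
    """Illustrative payload: a text query plus optional images (e.g. file paths or base64 data URLs)."""
    text: str
    images: List[str] = field(default_factory=list)

def normalize_query(query: Union[str, MultiModalQuery]) -> MultiModalQuery:
    """Accept either a plain string or a multi-modal payload and return one uniform type."""
    if isinstance(query, str):
        return MultiModalQuery(text=query)
    return query
```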
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/unit_tests/test_priority_search.py | Added multi-modal message handling for test compatibility |
| opto/optimizers/utils.py | Added image encoding utility for base64 conversion (see the sketch after this table) |
| opto/optimizers/optoprime_v2.py | Main multi-modal and context implementation with API changes |
| opto/optimizers/opro_v2.py | Extended OPRO optimizer with context support |
| opto/features/flows/types.py | Added multi-modal payload types and query normalization |
| opto/features/flows/compose.py | Updated TracedLLM to handle multi-modal payloads |
| docs/tutorials/minibatch.ipynb | Updated escape sequences in notebook output |
| .github/workflows/ci.yml | Commented out optimizer test suite |
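A minimal sketch of a base64 image-encoding helper of the kind described for `opto/optimizers/utils.py`; the function name and signature here are assumptions, not the actual utility:

```python
import base64
import mimetypes

def encode_image_base64(path: str) -> str:
    """Read a local image and return a data-URL string usable in multi-modal chat payloads."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/png"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{data}"
```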
TODO:
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@copilot open a new pull request to apply changes based on the comments in this thread
@allenanie I've opened a new pull request, #54, to work on those changes. Once the pull request is ready, I'll request review from you.
[WIP] Add multi-modal optimizer and context support
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.
Comments suppressed due to low confidence (1)
opto/optimizers/optoprime_v2.py:236
- Call to method OptoPrime.extract_llm_suggestion with too few arguments; should be no fewer than 2.
return OptoPrime.extract_llm_suggestion(response)
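For readers outside the PR: calling the unbound method `OptoPrime.extract_llm_suggestion(response)` binds `response` to `self`, so the actual response argument is never supplied. A hedged sketch of the general pattern and its fix, assuming `extract_llm_suggestion` is an ordinary two-argument instance method (the class names below are illustrative, not the real code):

```python
class Base:
    def extract_llm_suggestion(self, response):
        ...

class Derived(Base):
    def parse(self, response):
        # Flagged pattern: `Base.extract_llm_suggestion(response)` binds `response`
        # to `self`, leaving the real `response` parameter unfilled.
        # Supplying both arguments (or calling through the instance) fixes it:
        return Base.extract_llm_suggestion(self, response)
        # equivalently: return self.extract_llm_suggestion(response)
```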
… pillow. Add history manager.
TODO:
…age/text insertion. TODO: need to re-divide system prompt and user prompt because multi-modal content cannot go into system prompt.
TODO: documentation, writing test cases.
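A minimal sketch of the split described in the commit message above: the system prompt stays plain text, while image parts are attached only to the user message. The helper name and argument names are illustrative assumptions:

```python
from typing import List

def build_messages(system_prompt: str, user_text: str, image_data_urls: List[str]) -> list:
    """Keep the system prompt plain text; attach image parts only to the user turn."""
    user_content = [{"type": "text", "text": user_text}]
    for url in image_data_urls:
        user_content.append({"type": "image_url", "image_url": {"url": url}})
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]
```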
Documenting some decisions here: OpenAI released the Responses API and announced a migration path and eventual sunset of the Completions API. This triggered changes across the broader industry. LiteLLM introduced a beta version of the Responses API; it's unclear whether Google/Anthropic will follow. Although LiteLLM's Responses API is usable, its support for multi-modality (image generation) is quite poor (at least for Gemini). We are making Gemini a first-party supported model for OpenTrace going forward; therefore, for this PR, we are staying with LiteLLM's completion API, with the option to upgrade to the Responses API in the future.
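Since the PR stays on LiteLLM's completion API, a multi-modal call might look roughly like the sketch below; the model identifier and prompt strings are only examples, and provider prefixes vary by LiteLLM version:

```python
import litellm

response = litellm.completion(
    model="gemini/gemini-1.5-pro",  # example identifier; check LiteLLM's provider docs
    messages=[
        {"role": "system", "content": "You are an optimizer that proposes improved variables."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Improve the prompt given this screenshot."},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,<...>"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```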
…ause these meanings are shifting (the "premium" model of 2025 will be the "cheap" model of 2027, which causes confusion and unreliability for the users).
…(automatically generated to increase coverage)
To increase backward compatibility, the … When …
When …
For any Google models (starts with …), … Even with this small change, a lot of details were handled: …
In addition to …, this is not strictly necessary, but it helps us simplify the Optimizer's design since it no longer needs to interact with the raw LLM API response object.
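One way to read the last point: the LLM call returns a thin, library-agnostic result instead of the provider's raw response object. A hedged sketch of that idea; the class name, fields, and helper are assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMResult:
    """Illustrative wrapper: the optimizer only sees plain fields, never the provider's raw response."""
    text: str
    model: Optional[str] = None

def call_llm(messages, model: str = "gpt-4o") -> LLMResult:  # model string is just an example
    import litellm
    raw = litellm.completion(model=model, messages=messages)
    return LLMResult(text=raw.choices[0].message.content, model=model)
```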
… Gemini-compatible history.
Multiturn conversation is tested. See test …. We store conversation history as structured data in ….
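A rough sketch of storing history as structured messages and converting it into a Gemini-compatible history; the role mapping ('assistant' becomes 'model') and the 'parts' layout follow Google's generative AI message format, while the function name and storage shape are illustrative assumptions:

```python
from typing import Dict, List

def to_gemini_history(messages: List[Dict[str, str]]) -> List[dict]:
    """Convert OpenAI-style {'role', 'content'} messages into Gemini-style turns."""
    converted = []
    for m in messages:
        role = "model" if m["role"] == "assistant" else "user"
        converted.append({"role": role, "parts": [{"text": m["content"]}]})
    return converted

history = [
    {"role": "user", "content": "Optimize this prompt."},
    {"role": "assistant", "content": "Here is a revised version."},
]
gemini_history = to_gemini_history(history)
```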
So far, all supporting functions for multi-modal capabilities are finished: … Tests are finished: … Remaining todos: …
Force-pushed a62c202 to c171201.
Force-pushed ef542aa to c0a0282.
Adding multi-modal support. Also introducing a context section.
For the context section, the design intention is that if the user provides context, it appears in the user message; if no context is provided, the section is omitted entirely.
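A minimal sketch of that behavior; the section headers and function name are illustrative assumptions:

```python
from typing import Optional

def build_user_prompt(problem: str, context: Optional[str] = None) -> str:
    """Include a context section only when the caller actually provides one."""
    sections = [f"# Problem\n{problem}"]
    if context:
        sections.append(f"# Context\n{context}")
    return "\n\n".join(sections)

# With context -> a "# Context" section appears in the user message;
# without context -> the section is omitted entirely.
```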