diff --git a/README.md b/README.md
index 2483c17c0..3e5b92e44 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ For full documentation, guides, and API references, see the official [OpenAI Age
 **NOTE:** For a version that does not use the OpenAI Agents SDK, see the [branch without-agents-sdk](https://github.com/openai/openai-realtime-agents/tree/without-agents-sdk).
 
 There are two main patterns demonstrated:
-1. **Chat-Supervisor:** A realtime-based chat agent interacts with the user and handles basic tasks, while a more intelligent, text-based supervisor model (e.g., `gpt-4.1`) is used extensively for tool calls and more complex responses. This approach provides an easy onramp and high-quality answers, with a small increase in latency.
+1. **Chat-Supervisor:** A realtime-based chat agent interacts with the user and handles basic tasks, while a more intelligent, text-based supervisor model (e.g., `gpt-5.2`) is used extensively for tool calls and more complex responses. This approach provides an easy onramp and high-quality answers, with a small increase in latency.
 2. **Sequential Handoff:** Specialized agents (powered by realtime api) transfer the user between them to handle specific user intents. This is great for customer service, where user intents can be handled sequentially by specialist models that excel in a specific domains. This helps avoid the model having all instructions and tools in a single agent, which can degrade performance.
 
 ## Setup
@@ -29,7 +29,7 @@ There are two main patterns demonstrated:
 
 # Agentic Pattern 1: Chat-Supervisor
 
-This is demonstrated in the [chatSupervisor](src/app/agentConfigs/chatSupervisor/index.ts) Agent Config. The chat agent uses the realtime model to converse with the user and handle basic tasks, like greeting the user, casual conversation, and collecting information, and a more intelligent, text-based supervisor model (e.g. `gpt-4.1`) is used extensively to handle tool calls and more challenging responses. You can control the decision boundary by "opting in" specific tasks to the chat agent as desired.
+This is demonstrated in the [chatSupervisor](src/app/agentConfigs/chatSupervisor/index.ts) Agent Config. The chat agent uses the realtime model to converse with the user and handle basic tasks, like greeting the user, casual conversation, and collecting information, and a more intelligent, text-based supervisor model (e.g. `gpt-5.2`) is used extensively to handle tool calls and more challenging responses. You can control the decision boundary by "opting in" specific tasks to the chat agent as desired.
 
 Video walkthrough: [https://x.com/noahmacca/status/1927014156152058075](https://x.com/noahmacca/status/1927014156152058075)
 
@@ -42,7 +42,7 @@ Video walkthrough: [https://x.com/noahmacca/status/1927014156152058075](https://
 sequenceDiagram
     participant User
     participant ChatAgent as Chat Agent<br/>(gpt-4o-realtime-mini)
-    participant Supervisor as Supervisor Agent<br/>(gpt-4.1)
+    participant Supervisor as Supervisor Agent<br/>(gpt-5.2)
     participant Tool as Tool
 
     alt Basic chat or info collection
@@ -64,7 +64,7 @@ sequenceDiagram
 ## Benefits
 - **Simpler onboarding.** If you already have a performant text-based chat agent, you can give that same prompt and set of tools to the supervisor agent, and make some tweaks to the chat agent prompt, you'll have a natural voice agent that will perform on par with your text agent.
 - **Simple ramp to a full realtime agent**: Rather than switching your whole agent to the realtime api, you can move one task at a time, taking time to validate and build trust for each before deploying to production.
-- **High intelligence**: You benefit from the high intelligence, excellent tool calling and instruction following of models like `gpt-4.1` in your voice agents.
+- **High intelligence**: You benefit from the high intelligence, excellent tool calling and instruction following of models like `gpt-5.2` in your voice agents.
 - **Lower cost**: If your chat agent is only being used for basic tasks, you can use the realtime-mini model, which, even when combined with GPT-4.1, should be cheaper than using the full 4o-realtime model.
 - **User experience**: It's a more natural conversational experience than using a stitched model architecture, where response latency is often 1.5s or longer after a user has finished speaking. In this architecture, the model responds to the user right away, even if it has to lean on the supervisor agent.
   - However, more assistant responses will start with "Let me think", rather than responding immediately with the full response.
diff --git a/src/app/agentConfigs/chatSupervisor/supervisorAgent.ts b/src/app/agentConfigs/chatSupervisor/supervisorAgent.ts
index 3a3c2df43..0d560a7e0 100644
--- a/src/app/agentConfigs/chatSupervisor/supervisorAgent.ts
+++ b/src/app/agentConfigs/chatSupervisor/supervisorAgent.ts
@@ -31,6 +31,7 @@ You are a helpful customer service agent working for NewTelco, helping a user ef
 - The message is for a voice conversation, so be very concise, use prose, and never create bulleted lists. Prioritize brevity and clarity over completeness.
 - Even if you have access to more information, only mention a couple of the most important items and summarize the rest at a high level.
 - Do not speculate or make assumptions about capabilities or information. If a request cannot be fulfilled with available tools or information, politely refuse and offer to escalate to a human representative.
+- If the request is ambiguous or underspecified, ask one concise clarifying question before proceeding.
 - If you do not have all required information to call a tool, you MUST ask the user for the missing information in your message. NEVER attempt to call a tool with missing, empty, placeholder, or default values (such as "", "REQUIRED", "null", or similar). Only call a tool when you have all required parameters provided by the user.
 - Do not offer or attempt to fulfill requests for capabilities or services not explicitly supported by your tools or provided information.
 - Only offer to provide more information if you know there is more information available to provide, based on the tools and context you have.
@@ -282,7 +283,7 @@ export const getNextResponseFromSupervisor = tool({
     const filteredLogs = history.filter((log) => log.type === 'message');
 
     const body: any = {
-      model: 'gpt-4.1',
+      model: 'gpt-5.2',
       input: [
         {
           type: 'message',
@@ -316,4 +317,4 @@ export const getNextResponseFromSupervisor = tool({
     return { nextResponse: finalText as string };
   },
 });
- 
\ No newline at end of file
+
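---

The `supervisorAgent.ts` hunk above centers on one delegation step: the realtime chat agent filters its conversation history down to `message` items and hands them to the text-based supervisor model. The following is a minimal sketch of that request-building step only (it does not call the API); the function name and item shapes are illustrative, not the repo's actual exports:

```typescript
// Hypothetical stand-in for the history items a realtime session accumulates.
// Only `type: "message"` entries are conversational; tool calls, audio deltas,
// etc. carry other `type` values and are dropped before delegating.
type HistoryItem = { type: string; role?: string; content?: string };

// Build the Responses-API-style request body the supervisor would receive,
// mirroring the `filteredLogs` + `body` construction shown in the diff.
function buildSupervisorRequest(history: HistoryItem[], model = "gpt-5.2") {
  // Keep only conversational messages, as in the diffed code.
  const filteredLogs = history.filter((log) => log.type === "message");
  return {
    model,
    input: filteredLogs.map((m) => ({
      type: "message",
      role: m.role ?? "user",
      content: m.content ?? "",
    })),
  };
}

const body = buildSupervisorRequest([
  { type: "message", role: "user", content: "What's my data balance?" },
  { type: "function_call", content: "lookupAccount" }, // filtered out
]);
console.log(body.input.length); // → 1: only the message survives the filter
```

The design choice this illustrates is the decision boundary from the README: the chat agent answers trivial turns itself and invokes `getNextResponseFromSupervisor` for anything tool-related, then speaks the returned `nextResponse`, which is why the user hears an immediate "Let me think" rather than silence.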