Conversation Items
Conversation items represent messages and interactions in your conversation. Each item has:- ID: Unique identifier
- Type:
message,function_call,function_call_output - Role:
user,assistant, ortool - Content: The actual content of the item (array of content parts)
Content Types
Conversation items support different content types depending on direction: Input Content Types (for user messages):input_text- Plain text input from the userinput_audio- Base64-encoded audio input from the user
text- Text output from the assistantaudio- Audio output from the assistant
Creating Conversation Items
Text Messages
Audio Messages
There are two ways to send audio input: Method 1: Streaming Audio (Real-time) Useinput_audio_buffer.append for streaming real-time audio from a microphone:
conversation.item.create with input_audio for pre-recorded audio chunks:
- Streaming (
input_audio_buffer.append): Use for real-time microphone input, voice conversations, live audio streaming - Pre-recorded (
conversation.item.createwithinput_audio): Use for pre-recorded audio files, batch processing, or when you have complete audio chunks ready
Mixed Content
You can combine multiple content types in a single conversation item:Receiving Conversation Items
When items are added to the conversation, you’ll receive events:Retrieving Conversation Items
Retrieve specific conversation items:Deleting Conversation Items
Remove items from the conversation:Function Calling
The Realtime API supports function calling, allowing the assistant to invoke tools you define. Configure functions insession.update and handle function call events.
Defining Functions
Handling Function Calls
Voice Activity Detection
Voice Activity Detection (VAD) automatically detects when speech starts and stops, enabling natural turn-taking in conversations. Configure VAD throughsession.update.
Configuring VAD
VAD Types
semantic_vad: Uses conversational awareness to detect natural speech boundaries. Adjusteagerness(low,medium,high) to control responsiveness.
VAD Events
Error Handling
The Realtime API emitserror events for various failure scenarios. Handle these events to provide robust error recovery and user feedback.
Error Event Structure
Error Types
invalid_request_error: Invalid parameters or malformed requests. Checkerror.paramfor the specific field.server_error: Transient server-side failures. Implement retry logic with exponential backoff.rate_limit_error: Rate limit exceeded. Throttle requests and retry with exponential backoff.
Interruption Handling
Interrupt active responses when new user input arrives.Interrupting Responses
Cancel an in-progress response when the user starts speaking again:interrupt_response: true is set in VAD configuration, the server automatically cancels responses when new speech is detected.
Managing Context
Session Instructions
Update session instructions to guide the conversation:Conversation History
The API automatically maintains conversation history. You can:- Keep full history: Let the conversation grow naturally
- Selective deletion: Remove specific items that aren’t needed
- Session resets: Start a new session when you need a clean context window
Example: Conversation Manager
Here’s a complete example of managing conversations:Best Practices
- Monitor Context Length: Keep track of conversation length to avoid exceeding limits
- Strategic Deletion: Remove old context that’s no longer relevant
- Item Tracking: Maintain a local map of conversation items for quick access
- Error Handling: Handle cases where items might not exist when deleting/retrieving
- Context Management: Use session instructions to guide conversation behavior
Use Cases
- Long Conversations: Delete old context to maintain performance
- Error Recovery: Delete incorrect items and resend
- Context Switching: Clear conversation context when changing topics
- Memory Management: Remove items that are no longer needed