# Managing Conversations

> Learn how to manage conversation history and context with the Realtime API

## Conversation Items

Conversation items represent messages and interactions in your conversation. Each item has:

* **ID**: Unique identifier
* **Type**: `message`, `function_call`, or `function_call_output`
* **Role**: `user`, `assistant`, or `tool`
* **Content**: The actual content of the item (array of content parts)

### Content Types

Conversation items support different content types depending on direction:

**Input Content Types** (for user messages):

* `input_text` - Plain text input from the user
* `input_audio` - Base64-encoded audio input from the user

**Output Content Types** (for assistant responses):

* `text` - Text output from the assistant
* `audio` - Audio output from the assistant

You can mix multiple content parts in a single conversation item. For example, you can combine text and audio in the same message.

## Creating Conversation Items

### Text Messages

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [
      {
        type: 'input_text',
        text: 'Hello, how are you?'
      }
    ]
  }
}));
```

### Audio Messages

There are two ways to send audio input:

**Method 1: Streaming Audio (Real-time)**
Use `input_audio_buffer.append` for streaming real-time audio from a microphone:

```javascript theme={"system"}
// Stream audio chunks in real-time
ws.send(JSON.stringify({
  type: 'input_audio_buffer.append',
  audio: base64AudioData
}));
// With VAD enabled, the server detects speech boundaries and commits the buffer
```

**Method 2: Pre-recorded Audio Chunks**
Use `conversation.item.create` with `input_audio` for pre-recorded audio chunks:

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [{
      type: 'input_audio',
      audio: base64AudioData  // Base64-encoded PCM16 or OPUS audio
    }]
  }
}));
```

**When to use each method:**

* **Streaming (`input_audio_buffer.append`)**: Use for real-time microphone input, voice conversations, live audio streaming
* **Pre-recorded (`conversation.item.create` with `input_audio`)**: Use for pre-recorded audio files, batch processing, or when you have complete audio chunks ready
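
Audio captured in Node.js usually arrives as a binary `Buffer`, which has to be base64-encoded (and split, if it is large) before it can be appended. The following is a minimal sketch of that step. `toAppendEvents` and its default chunk size are hypothetical and chosen for illustration; they are not part of the API.

```javascript
// Hypothetical helper (not part of the API): splits a Buffer of raw audio
// into input_audio_buffer.append events of at most `chunkBytes` bytes each.
// The default chunk size is an arbitrary illustrative value, not an API limit.
// For PCM16, keep chunkBytes even so a 16-bit sample is never split in two.
function toAppendEvents(audioBuffer, chunkBytes = 4800) {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError('chunkBytes must be a positive integer');
  }

  const events = [];
  for (let offset = 0; offset < audioBuffer.length; offset += chunkBytes) {
    // subarray returns a view, so no audio data is copied here
    const chunk = audioBuffer.subarray(offset, offset + chunkBytes);
    events.push({
      type: 'input_audio_buffer.append',
      audio: chunk.toString('base64')
    });
  }
  return events;
}

// Usage (assuming `ws` is an open WebSocket and `pcm` is a Buffer):
// for (const event of toAppendEvents(pcm)) ws.send(JSON.stringify(event));
```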

### Mixed Content

You can combine multiple content types in a single conversation item:

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [
      {
        type: 'input_text',
        text: 'Here is some context about the audio:'
      },
      {
        type: 'input_audio',
        audio: base64AudioData
      },
      {
        type: 'input_text',
        text: 'And here is additional context.'
      }
    ]
  }
}));
```
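
The three message shapes above differ only in their `content` array, so a small builder can keep call sites short. This is a sketch under that assumption; `buildUserMessage` and its `{ text }` / `{ audio }` part format are hypothetical conveniences, not API features.

```javascript
// Hypothetical helper (not part of the API): builds a conversation.item.create
// event for a user message from an ordered list of parts.
// Each part is either { text: '...' } or { audio: '<base64 string>' }.
function buildUserMessage(parts) {
  const content = parts.map((part) => {
    if (typeof part.text === 'string') {
      return { type: 'input_text', text: part.text };
    }
    if (typeof part.audio === 'string') {
      return { type: 'input_audio', audio: part.audio };
    }
    throw new TypeError('Each part needs a string `text` or `audio` field');
  });

  return {
    type: 'conversation.item.create',
    item: { type: 'message', role: 'user', content }
  };
}

// Usage (assuming `ws` is an open WebSocket):
// ws.send(JSON.stringify(buildUserMessage([
//   { text: 'Here is some context about the audio:' },
//   { audio: base64AudioData }
// ])));
```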

## Receiving Conversation Items

The server emits `conversation.item.added` when an item is added to the conversation and `conversation.item.done` when the item has finished processing:

```javascript theme={"system"}
ws.on('message', (data) => {
  const event = JSON.parse(data);
  
  if (event.type === 'conversation.item.added') {
    console.log('Item added:', event.item.id);
    console.log('Content:', event.item.content);
  }
  
  if (event.type === 'conversation.item.done') {
    console.log('Item processing complete:', event.item.id);
  }
});
```
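
Because `content` is an array of parts, displaying an item usually means picking out the text parts. A minimal sketch follows; `extractText` is a hypothetical helper, and it assumes that output `text` parts carry their string in a `text` field the same way `input_text` parts do.

```javascript
// Hypothetical helper (not part of the API): joins the text parts of an item.
// Assumption: output parts of type 'text' expose a `text` field, mirroring
// the `input_text` parts shown earlier. Audio parts are skipped.
function extractText(item) {
  return (item.content || [])
    .filter((part) => part.type === 'input_text' || part.type === 'text')
    .map((part) => part.text)
    .join(' ');
}

// Usage inside the message handler:
// if (event.type === 'conversation.item.done') {
//   console.log(extractText(event.item));
// }
```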

## Retrieving Conversation Items

Retrieve a specific conversation item by its ID:

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'conversation.item.retrieve',
  item_id: 'item-id-here'
}));
```

The server will respond with:

```javascript theme={"system"}
{
  type: 'conversation.item.retrieved',
  item: {
    id: 'item-id-here',
    type: 'message',
    role: 'user',
    content: [...]
  }
}
```
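
Since the request and the response travel as separate events, it can be convenient to pair them behind a promise. The sketch below is one way to do that. `retrieveItem` and its timeout are hypothetical; it assumes `ws` exposes `on`, `off`, and `send` as Node.js event emitters do.

```javascript
// Hypothetical helper (not part of the API): sends conversation.item.retrieve
// and resolves with the item from the matching conversation.item.retrieved
// event. Assumes `ws` supports on/off/send; the timeout value is arbitrary.
function retrieveItem(ws, itemId, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const onMessage = (data) => {
      let event;
      try {
        event = JSON.parse(data);
      } catch {
        return; // Ignore frames that are not JSON
      }
      if (
        event.type === 'conversation.item.retrieved' &&
        event.item &&
        event.item.id === itemId
      ) {
        cleanup();
        resolve(event.item);
      }
    };

    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Timed out waiting for item ${itemId}`));
    }, timeoutMs);

    // Stop the timer and detach the listener so neither outlives the request
    function cleanup() {
      clearTimeout(timer);
      ws.off('message', onMessage);
    }

    ws.on('message', onMessage);
    ws.send(JSON.stringify({
      type: 'conversation.item.retrieve',
      item_id: itemId
    }));
  });
}

// Usage: retrieveItem(ws, 'item-id-here').then((item) => console.log(item));
```

If the server reports a missing item through an `error` event instead, this sketch rejects only when the timeout expires.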

## Deleting Conversation Items

Remove items from the conversation:

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'conversation.item.delete',
  item_id: 'item-id-here'
}));
```

You'll receive a confirmation:

```javascript theme={"system"}
{
  type: 'conversation.item.deleted',
  item_id: 'item-id-here'
}
```

## Function Calling

The Realtime API supports function calling, allowing the assistant to invoke tools you define. Configure functions in `session.update` and handle function call events.

### Defining Functions

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    tools: [{
      type: 'function',
      name: 'get_weather',
      description: 'Get the weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'The city and state, e.g. San Francisco, CA'
          }
        },
        required: ['location']
      }
    }],
    tool_choice: 'auto'
  }
}));
```

### Handling Function Calls

```javascript theme={"system"}
ws.on('message', (data) => {
  const event = JSON.parse(data);
  
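  if (event.type === 'response.function_call_arguments.done') {
    // executeFunction is your own dispatcher: it maps the function name to local code
    const result = executeFunction(event.name, JSON.parse(event.arguments));
    
    // Return the result to the conversation, linked to the call by call_id
    ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: event.call_id,
        output: JSON.stringify(result)
      }
    }));
    
    // Ask the assistant to continue now that the function output is available
    ws.send(JSON.stringify({
      type: 'response.create'
    }));
  }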
});
```
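
The handler above throws if the arguments are not valid JSON or the function name is unknown. One way to guard against that is to catch the failure and report it as the function output. The sketch below does this; `buildFunctionCallOutput`, the `handlers` registry, and the `{ error: ... }` output shape are all hypothetical conventions, not something the API prescribes.

```javascript
// Hypothetical helper (not part of the API): runs the handler registered for
// a function call event and wraps the result, or the failure, in a
// function_call_output item. The { error } payload shape is an assumption.
function buildFunctionCallOutput(event, handlers) {
  let result;
  try {
    // Own-property check so names like "toString" never resolve to built-ins
    if (!Object.prototype.hasOwnProperty.call(handlers, event.name)) {
      throw new Error(`Unknown function: ${event.name}`);
    }
    result = handlers[event.name](JSON.parse(event.arguments));
  } catch (err) {
    // Report the failure as output instead of throwing inside the socket handler
    result = { error: err.message };
  }

  return {
    type: 'conversation.item.create',
    item: {
      type: 'function_call_output',
      call_id: event.call_id,
      output: JSON.stringify(result)
    }
  };
}

// Usage inside the 'response.function_call_arguments.done' branch:
// ws.send(JSON.stringify(buildFunctionCallOutput(event, handlers)));
// ws.send(JSON.stringify({ type: 'response.create' }));
```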

## Voice Activity Detection

Voice Activity Detection (VAD) automatically detects when speech starts and stops, enabling natural turn-taking in conversations. Configure VAD through `session.update`.

### Configuring VAD

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    audio: {
      input: {
        turn_detection: {
          type: 'semantic_vad',
          eagerness: 'medium',
          create_response: true,
          interrupt_response: true
        }
      }
    }
  }
}));
```

### VAD Types

* **`semantic_vad`**: Uses conversational awareness to detect natural speech boundaries. Adjust `eagerness` (`low`, `medium`, `high`) to control responsiveness.
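
Since `eagerness` accepts only three values, validating it before sending avoids a round trip that ends in an error. A minimal sketch, where `buildSemanticVadUpdate` and its camelCase option names are hypothetical:

```javascript
// Hypothetical helper (not part of the API): builds a session.update event
// for semantic_vad and rejects eagerness values the docs do not list.
const EAGERNESS_LEVELS = ['low', 'medium', 'high'];

function buildSemanticVadUpdate(eagerness = 'medium', options = {}) {
  if (!EAGERNESS_LEVELS.includes(eagerness)) {
    throw new RangeError(`eagerness must be one of: ${EAGERNESS_LEVELS.join(', ')}`);
  }
  const { createResponse = true, interruptResponse = true } = options;

  return {
    type: 'session.update',
    session: {
      type: 'realtime',
      audio: {
        input: {
          turn_detection: {
            type: 'semantic_vad',
            eagerness,
            create_response: createResponse,
            interrupt_response: interruptResponse
          }
        }
      }
    }
  };
}

// Usage: ws.send(JSON.stringify(buildSemanticVadUpdate('high')));
```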

### VAD Events

```javascript theme={"system"}
ws.on('message', (data) => {
  const event = JSON.parse(data);
  
  if (event.type === 'input_audio_buffer.speech_started') {
    console.log('Speech detected');
    // Update UI to show user is speaking
  }
  
  if (event.type === 'input_audio_buffer.speech_stopped') {
    console.log('Speech ended');
    // Update UI, prepare for response
  }
});
```

## Error Handling

The Realtime API emits `error` events for various failure scenarios. Handle these events to provide robust error recovery and user feedback.

### Error Event Structure

```javascript theme={"system"}
ws.on('message', (data) => {
  const event = JSON.parse(data);
  
  if (event.type === 'error') {
    const error = event.error;
    
    switch (error.type) {
      case 'invalid_request_error':
        console.error('Invalid request:', error.message);
        if (error.param) {
          console.error('Parameter:', error.param);
        }
        break;
      case 'server_error':
        console.error('Server error:', error.message);
        // Implement retry logic
        break;
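      case 'rate_limit_error':
        console.error('Rate limit exceeded');
        // Pause requests, implement backoff
        break;
      default:
        // Log anything unexpected rather than dropping it silently
        console.error('Unhandled error type:', error.type, error.message);
    }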
  }
});
```

### Error Types

* **`invalid_request_error`**: Invalid parameters or malformed requests. Check `error.param` for the specific field.
* **`server_error`**: Transient server-side failures. Implement retry logic with exponential backoff.
* **`rate_limit_error`**: Rate limit exceeded. Throttle requests and retry with exponential backoff.
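
The exponential backoff recommended above can be sketched as follows. The helper names, base delay, cap, and the use of full jitter are illustrative assumptions; the API does not specify retry timings.

```javascript
// Hypothetical helpers (not part of the API). Timing values are illustrative.

// Only the two transient error types listed above are worth retrying.
function isRetryable(error) {
  return error.type === 'server_error' || error.type === 'rate_limit_error';
}

// Exponential backoff with full jitter: the ceiling doubles with each attempt
// (starting at baseMs for attempt 0) up to maxMs, and the actual delay is a
// random value below that ceiling so clients do not retry in lockstep.
// `random` is injectable to make the function deterministic in tests.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Usage inside the error handler (resend is your own retry function):
// if (isRetryable(event.error)) setTimeout(resend, backoffDelay(attempt++));
```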

## Interruption Handling

Interrupt active responses when new user input arrives.

### Interrupting Responses

Cancel an in-progress response when the user starts speaking again:

```javascript theme={"system"}
ws.on('message', (data) => {
  const event = JSON.parse(data);
  
  if (event.type === 'input_audio_buffer.speech_started') {
    // User started speaking, cancel current response
    ws.send(JSON.stringify({
      type: 'response.cancel'
    }));
  }
});
```

When `interrupt_response: true` is set in the VAD configuration, the server cancels the active response automatically when it detects new speech, so you do not need to send `response.cancel` yourself for speech interruptions.

## Managing Context

### Session Instructions

Update session instructions to guide the conversation:

```javascript theme={"system"}
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    instructions: 'You are a helpful assistant. Be concise and friendly.'
  }
}));
```

### Conversation History

The API automatically maintains conversation history. You can:

1. **Keep full history**: Let the conversation grow naturally
2. **Selective deletion**: Remove specific items that aren't needed
3. **Session resets**: Start a new session when you need a clean context window
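
Selective deletion can be automated with a simple cap on the number of items. The sketch below shows one possible policy: `buildPruneEvents` is a hypothetical helper that assumes you track item IDs locally in oldest-first order.

```javascript
// Hypothetical helper (not part of the API): given item IDs ordered oldest
// first, returns the conversation.item.delete events needed to keep at most
// `maxItems` of the most recent items.
function buildPruneEvents(itemIds, maxItems) {
  if (!Number.isInteger(maxItems) || maxItems < 0) {
    throw new RangeError('maxItems must be a non-negative integer');
  }
  const excess = Math.max(0, itemIds.length - maxItems);
  return itemIds.slice(0, excess).map((id) => ({
    type: 'conversation.item.delete',
    item_id: id
  }));
}

// Usage (assuming `ws` is an open WebSocket and `ids` is your local list):
// for (const event of buildPruneEvents(ids, 50)) ws.send(JSON.stringify(event));
```

A count-based cap is the simplest policy. Whether deleting a function call item without its matching output is safe depends on server behavior, so you may want to prune those in pairs.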

## Example: Conversation Manager

The following class is a minimal example that mirrors the server's item list locally and wraps item creation and deletion:

```javascript theme={"system"}
class ConversationManager {
  constructor(ws) {
    this.ws = ws;
    this.items = new Map();
    this.setupListeners();
  }
  
  setupListeners() {
    this.ws.on('message', (data) => {
      const event = JSON.parse(data);
      
      switch (event.type) {
        case 'conversation.item.added':
          this.items.set(event.item.id, event.item);
          break;
        case 'conversation.item.deleted':
          this.items.delete(event.item_id);
          break;
      }
    });
  }
  
  sendMessage(text) {
    this.ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{
          type: 'input_text',
          text: text
        }]
      }
    }));
  }
  
  deleteItem(itemId) {
    this.ws.send(JSON.stringify({
      type: 'conversation.item.delete',
      item_id: itemId
    }));
  }
  
  getConversationHistory() {
    return Array.from(this.items.values());
  }
}
```

## Best Practices

1. **Monitor Context Length**: Track the number and size of conversation items so the history stays within context limits
2. **Strategic Deletion**: Delete older items once they are no longer relevant to the current exchange
3. **Item Tracking**: Maintain a local map of conversation items, keyed by item ID, for quick access
4. **Error Handling**: Handle the case where an item no longer exists when you delete or retrieve it
5. **Context Management**: Use session instructions to guide conversation behavior

## Use Cases

* **Long Conversations**: Delete old context to maintain performance
* **Error Recovery**: Delete incorrect items and resend
* **Context Switching**: Clear conversation context when changing topics
* **Memory Management**: Remove items that are no longer needed
