# Back-channel responses

> Emit short, low-latency interjections ("uh-huh", "right", "I see") while the user is mid-utterance, so the agent feels like an active listener.

Back-channel responses are short audio interjections — `"uh-huh"`, `"right"`, `"I see"` — that the server emits while the user is **still speaking**. They sit out-of-band from the main response stream and give the agent the cadence of an active listener without interrupting the user's turn.

## Enabling back-channel

```javascript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    providerData: {
      backchannel: {
        enabled: true,
        eval_interval_ms: 800,
        min_speech_ms: 800,
        min_gap_ms: 4000,
        max_per_turn: 3,
        hard_deadline_ms: 1500,
        history_tail_items: 4,
        temperature: 0.7,
        max_tokens: 6,
        volume_gain: 0.6,
        require_pause: false,
        decider_kind: 'llm'
      }
    }
  }
}));
```

| Field                   | Type                | Default        | Description                                                                                                                                                                                                                                                                                                |
| ----------------------- | ------------------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`               | boolean             | `false`        | Per-session opt-in. Sessions that don't send this field never receive back-channels even when server prerequisites are met.                                                                                                                                                                                |
| `small_model`           | string              | server default | Override the decider LLM model identifier. Empty string inherits the server default.                                                                                                                                                                                                                       |
| `eval_interval_ms`      | integer             | `800`          | How often the manager evaluates eligibility while the user is producing partial transcripts.                                                                                                                                                                                                               |
| `min_speech_ms`         | integer             | `800`          | Minimum time after speech onset before any back-channel can fire. Suppresses interjections on micro-utterances.                                                                                                                                                                                            |
| `min_gap_ms`            | integer             | `4000`         | Minimum spacing between two back-channels in the same user turn.                                                                                                                                                                                                                                           |
| `max_per_turn`          | integer             | `3`            | Cap on back-channels emitted within a single user turn.                                                                                                                                                                                                                                                    |
| `hard_deadline_ms`      | integer             | `1500`         | Combined small-LLM + TTS deadline per attempt. Misses are dropped.                                                                                                                                                                                                                                         |
| `history_tail_items`    | integer             | `4`            | Recent conversation items the small LLM sees as context.                                                                                                                                                                                                                                                   |
| `temperature`           | number              | `0.7`          | Sampling temperature for the small LLM.                                                                                                                                                                                                                                                                    |
| `max_tokens`            | integer             | `6`            | Max tokens for the small LLM's reply.                                                                                                                                                                                                                                                                      |
| `volume_gain`           | number              | `0.6`          | Linear gain multiplier applied to synthesized back-channel audio before it is sent to the client. `0.0` mutes back-channels (synthesis still runs but no audio is delivered); `1.0` keeps the synthesized volume; values >1.0 amplify. Useful when back-channels feel louder than the main response audio. |
| `require_pause`         | boolean             | `false`        | When `true`, only fire after a smart-turn pause signal (the `input_audio_buffer.turn_suggestion` event). When `false`, the periodic ticker fires regardless of speech state.                                                                                                                               |
| `allowed_phrases`       | string\[]           | server default | Restrict the phrase bank. `null` / field omitted inherits the default; an explicit empty array disables back-channel for the session (no phrase can be picked); a populated array replaces the bank.                                                                                                       |
| `prompt_template`       | string              | server default | Override the decider prompt. Empty string inherits the server default. Supports the Go `text/template` tokens `{{.PhrasesList}}`, `{{.History}}`, `{{.Partial}}`.                                                                                                                                          |
| `decider_kind`          | `"llm"` \| `"rule"` | `"llm"`        | `llm` uses a small LLM. `rule` picks phrases from the bank with per-tick probability `rule_fire_probability` — useful for load tests or low-cost production.                                                                                                                                               |
| `rule_fire_probability` | number              | `1.0`          | Per-tick fire probability for the rule decider (`0.0`–`1.0`). Values outside `[0,1]` are clamped. Ignored when `decider_kind != "rule"`. The default of `1.0` matches legacy / test behavior; production rule-decider deployments typically set this lower (e.g. `0.3`) for a natural cadence.             |

Sending `providerData.backchannel: {}` (empty object) clears all overrides; the server falls back to its compiled-in defaults.
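
As a minimal sketch of how a client might assemble this payload, the helper below merges caller overrides onto `{ enabled: true }` and mirrors the documented clamping of `rule_fire_probability`. The function name `buildBackchannelUpdate` is hypothetical and not part of any SDK; only the field names come from the table above.

```javascript
// Hypothetical helper, not part of any SDK. Field names follow the table above.
function buildBackchannelUpdate(overrides = {}) {
  const backchannel = { enabled: true, ...overrides };

  // The server is documented to clamp out-of-range values; clamping locally
  // just keeps the payload you log equal to what takes effect.
  if (typeof backchannel.rule_fire_probability === 'number') {
    backchannel.rule_fire_probability =
      Math.min(1, Math.max(0, backchannel.rule_fire_probability));
  }

  // An explicit empty array disables back-channel for the session, which is
  // easy to do by accident, so surface it.
  if (Array.isArray(backchannel.allowed_phrases) &&
      backchannel.allowed_phrases.length === 0) {
    console.warn('allowed_phrases is empty: no phrase can be picked');
  }

  return { type: 'session.update', session: { providerData: { backchannel } } };
}

// Usage, assuming `ws` is an already-open WebSocket to the realtime endpoint:
// ws.send(JSON.stringify(buildBackchannelUpdate({ min_gap_ms: 6000 })));
```

Fields left out of `overrides` are simply omitted from the payload, so the server defaults in the table still apply to them.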

## Handling back-channel audio

```javascript
session.on('transport_event', (event) => {
  switch (event.type) {
    case 'response.backchannel.audio.delta': {
      // event.backchannel_id groups deltas + done for one interjection.
      // event.delta is base64-encoded PCM16 (or whatever output format you
      // configured on session.audio.output.format).
      audioHandler.playAudio(event.delta, `backchannel:${event.backchannel_id}`);
      break;
    }
    case 'response.backchannel.audio.done': {
      // event.phrase (optional) is the chosen utterance, e.g. "uh-huh".
      // No teardown needed — playAudio queues until exhausted.
      break;
    }
  }
});
```

Use `backchannel_id` as the playback bucket key so chunks of one interjection don't collide with the active assistant response item.
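
If you also want the complete audio of each interjection (for logging or debugging, say) rather than only streaming it to the speaker, one possible approach is to accumulate deltas per `backchannel_id` and flush on `done`. The sketch below assumes Node.js (`Buffer`); the class name `BackchannelBuckets` is hypothetical, and only the event fields come from this page.

```javascript
// Hypothetical client-side store. Buffer is Node.js; in a browser, swap the
// decode step for atob() plus a Uint8Array copy.
class BackchannelBuckets {
  constructor() {
    this.chunks = new Map(); // backchannel_id -> Buffer[]
  }

  // Returns null while streaming, and { id, phrase, audio } once done arrives.
  handle(event) {
    if (event.type === 'response.backchannel.audio.delta') {
      const list = this.chunks.get(event.backchannel_id) ?? [];
      list.push(Buffer.from(event.delta, 'base64'));
      this.chunks.set(event.backchannel_id, list);
      return null;
    }
    if (event.type === 'response.backchannel.audio.done') {
      const list = this.chunks.get(event.backchannel_id) ?? [];
      this.chunks.delete(event.backchannel_id);
      return {
        id: event.backchannel_id,
        phrase: event.phrase ?? null, // optional on the wire
        audio: Buffer.concat(list),
      };
    }
    return null; // not a back-channel audio event
  }
}
```

Because each interjection has its own key, deltas from two overlapping interjections never interleave in the same buffer.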

### Client integration: preserve audio during user speech

A natural-feeling back-channel is **audible to the user while they are still talking**. The default audio-ducking behavior in many client integrations attenuates *all* output channels when VAD reports user activity — this also silences the back-channel, defeating its purpose. When you wire back-channel into your client, exempt the back-channel playback bucket (keyed by `backchannel_id`) from the duck-on-user-speech rule so interjections remain audible while the user holds the floor.
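
One way to express that exemption is a small gain policy keyed on the playback bucket. This is only a sketch: `playbackGain` and `DUCKED_GAIN` are hypothetical names, and the `backchannel:` prefix assumes you key buckets the way the handler above does.

```javascript
// Hypothetical mixer policy. DUCKED_GAIN is an illustrative value, and the
// "backchannel:" prefix matches the bucket key used in the handler above.
const DUCKED_GAIN = 0.2;

function playbackGain(bucketKey, userIsSpeaking) {
  // Back-channel buckets are exempt: they must stay audible mid-utterance.
  if (bucketKey.startsWith('backchannel:')) return 1.0;
  // Everything else follows the usual duck-on-user-speech rule.
  return userIsSpeaking ? DUCKED_GAIN : 1.0;
}
```

Calling `playbackGain('backchannel:bc_1', true)` returns `1.0`, while any other bucket key returns `DUCKED_GAIN` for as long as VAD reports user speech.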

## Event reference

| Event                              | Direction       | Description                                                                                                                                                                                                                                                                                                                                                            |
| ---------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `response.backchannel.audio.delta` | server → client | Streaming PCM audio chunk for one interjection. Carries `backchannel_id`, base64 `delta`.                                                                                                                                                                                                                                                                              |
| `response.backchannel.audio.done`  | server → client | Indicates all audio for the back-channel interjection identified by `backchannel_id` has been streamed. Carries an optional `phrase` field with the chosen utterance (e.g. `"uh-huh"`) when the decider surfaces it; omitted when the decider doesn't expose the phrase (configuration-dependent). The wire shape is `omitempty`, so absent and `null` are equivalent. |
| `response.backchannel.skipped`     | server → client | The decider chose not to fire on an evaluation tick. Carries a `reason` string (e.g. `"deadline_missed"`, `"no_phrase"`) for client-side telemetry. Safe to ignore.                                                                                                                                                                                                    |

Example `response.backchannel.skipped` payload:

```json
{
  "event_id": "evt_5f7d2",
  "type": "response.backchannel.skipped",
  "reason": "min_gap_not_elapsed"
}
```
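
For client-side telemetry, a tally of skip reasons is often enough to see which gate is suppressing back-channels. The following is a minimal sketch; `createSkipTally` is a hypothetical helper, and the only fields it relies on are `type` and `reason` from the payload above.

```javascript
// Hypothetical telemetry helper: counts skipped events per reason string.
function createSkipTally() {
  const counts = new Map();
  return {
    record(event) {
      if (event.type !== 'response.backchannel.skipped') return;
      const reason = event.reason ?? 'unknown'; // defensive default
      counts.set(reason, (counts.get(reason) ?? 0) + 1);
    },
    snapshot() {
      return Object.fromEntries(counts);
    },
  };
}
```

Feed every incoming server event to `record`; non-skip events are ignored, so it can sit in the same `switch` as the audio handlers.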

See the [WebSocket API reference](/api-reference/realtimeAPI/realtime/realtime-websocket) for the full schemas.

## Example: Spanish back-channel with rule decider

For a low-cost back-channel with no small-LLM call per evaluation (just a random pick from a fixed Spanish phrase bank), use `decider_kind: "rule"` with a populated `allowed_phrases` and a per-tick fire probability tuned for natural cadence. The TTS voice and language come from your normal `audio.output` and `providerData.tts.language` settings, so the phrases are spoken in the Spanish accent you've already configured.

```javascript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    audio: { output: { voice: 'Olivia', model: 'inworld-tts-2' } },
    providerData: {
      tts: { language: 'es-ES' },
      backchannel: {
        enabled: true,
        decider_kind: 'rule',
        rule_fire_probability: 0.3,
        allowed_phrases: ['ajá', 'claro', 'sí', 'mhm', 'entiendo', 'vale'],
        min_speech_ms: 800,
        min_gap_ms: 4000,
        max_per_turn: 3
      }
    }
  }
}));
```

What this does:

* **No LLM in the hot path.** Every evaluation tick is a coin flip against `rule_fire_probability`; if it fires, the server picks a random phrase from `allowed_phrases`. Latency is effectively just TTS synthesis.
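* **`rule_fire_probability: 0.3`** keeps the cadence natural: each eligible tick fires with probability 0.3, so on average there is one fire every \~3.3 ticks (about 2.7 s at the default 800 ms `eval_interval_ms`) once the gating thresholds pass. `min_gap_ms` still enforces at least 4 s between consecutive fires. Tune up for more eager back-channels, down for sparser ones.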
* **`allowed_phrases`** replaces the compiled-in English bank with Spanish utterances. With the LLM decider you'd instead append a language directive to `prompt_template`; with the rule decider, the phrase bank is the only thing controlling what gets said, so list them explicitly.
* **`providerData.tts.language: 'es-ES'`** anchors the TTS accent. Without it, TTS-2 may infer a different accent from the audio or text context.
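
To get a feel for how these settings interact before trying them on a live session, you can model the rule decider locally. The sketch below is a client-side approximation only: the function `simulateRuleDecider` is hypothetical, and the order in which the gates are checked (cap, then `min_speech_ms`, then `min_gap_ms`, then the coin flip) is an assumption, not documented server behavior.

```javascript
// Client-side model only. The server's real gating order is not documented
// on this page, so treat every rule in this function as an assumption.
function simulateRuleDecider({
  turnMs,
  evalIntervalMs = 800,
  minSpeechMs = 800,
  minGapMs = 4000,
  maxPerTurn = 3,
  fireProbability = 0.3,
  rng = Math.random, // injectable so runs can be made repeatable
}) {
  const fires = []; // timestamps (ms since speech onset) where a fire occurs
  for (let t = evalIntervalMs; t <= turnMs; t += evalIntervalMs) {
    if (fires.length >= maxPerTurn) break;            // per-turn cap reached
    if (t < minSpeechMs) continue;                     // too early in the turn
    const last = fires[fires.length - 1];
    if (last !== undefined && t - last < minGapMs) continue; // too soon after last fire
    if (rng() < fireProbability) fires.push(t);        // per-tick coin flip
  }
  return fires;
}

// With an rng that always fires, the gates alone determine the spacing:
console.log(simulateRuleDecider({ turnMs: 10000, rng: () => 0 }));
// → [ 800, 4800, 8800 ]
```

Running it with the default `Math.random` over many simulated turns gives a rough picture of how often a 10-second utterance ends up with zero, one, or several interjections under a given configuration.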

## Tuning tips

* Start with the server defaults (just `{ enabled: true }`). Adjust `min_speech_ms` and `min_gap_ms` first if back-channels feel too eager or too sparse.
* Pair with `turn_detection.eagerness: 'low'` so the main response model gives the user space to continue — back-channel fills the perceived silence.
* If back-channels feel louder than the main assistant response, lower `volume_gain` (default `0.6`); if they feel inaudible, raise it toward `1.0`. Setting `volume_gain: 0.0` mutes back-channels entirely without disabling synthesis — useful for A/B tests of the decider in isolation.
* For multilingual sessions, append a language directive to `prompt_template`. The compiled-in default lists English example phrases; without a language hint the small model echoes them in English regardless of the conversation language.
* For load tests, switch `decider_kind` to `rule` with `rule_fire_probability: 0.3` to remove the LLM call from the hot path while still exercising the audio pipeline.
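
For intuition about what a linear `volume_gain` does to the audio, the sketch below scales PCM16 samples and clamps them to the 16-bit range. It is illustrative only: the server applies the gain before sending, `applyLinearGain` is a hypothetical name, and the clamping behavior is an assumption about how a gain above `1.0` would typically be kept from overflowing.

```javascript
// Illustrative only: the server applies volume_gain before sending audio.
// This shows what a linear gain means for signed 16-bit PCM samples.
function applyLinearGain(samples, gain) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const scaled = Math.round(samples[i] * gain);
    // Clamp so gains above 1.0 saturate instead of wrapping around.
    out[i] = Math.max(-32768, Math.min(32767, scaled));
  }
  return out;
}

console.log(Array.from(applyLinearGain(Int16Array.from([1000, -1000, 30000]), 0.6)));
// → [ 600, -600, 18000 ]
```

A gain of `0.6` lowers the amplitude by roughly 4.4 dB, which is why the default makes interjections sit slightly under the main response.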
