# Core Concepts

A **router** is a reusable configuration that you use with our [Chat Completions](/api-reference/routerAPI/chat-completions) API to define how requests are routed to models. Routers let you set up fallbacks and conditional routing, A/B test across models, attach prompt templates, and configure generation parameters, all without changing your application code.

You can create a router via Portal or [API](/api-reference/routerAPI/routerservice/create-router).

## How it all fits together

A router is made up of [**routes**](#routes) and [**variants**](#variants). Here's the full flow when a request hits a router:

1. **Route evaluation** — Conditional routes are checked in order. The first match is selected, and if none match, the default route is used.
2. **Variant selection** — Within the matched route, a variant is chosen based on weights.
3. **Model called** — The request is sent to a model based on the [variant configuration](#variant-configuration). If the variant uses `auto`, the best model is dynamically selected based on the provided criteria.

<Note>If a `user` is specified in the [Chat Completions](/api-reference/routerAPI/chat-completions) request, that user will consistently receive the same variant across requests (sticky routing).</Note>
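
The variant selection step and the sticky behavior described above can be sketched in a few lines. This page does not document how the router implements stickiness, so the hash-based bucketing below is an assumption for illustration only, and `pick_variant` is a hypothetical helper, not part of the API.

```python
import hashlib
import random


def pick_variant(variants, user=None):
    """Pick a variant_id from [(variant_id, weight), ...]; weights must sum to 100."""
    if user is None:
        # No user supplied: plain weighted random choice.
        bucket = random.randrange(100)
    else:
        # Assumed scheme: hash the user ID into a stable bucket in [0, 100).
        digest = hashlib.sha256(user.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100

    cumulative = 0
    for variant_id, weight in variants:
        cumulative += weight
        if bucket < cumulative:
            return variant_id
    raise ValueError("weights must sum to 100")


variants = [("gpt5", 50), ("claude", 50)]
first = pick_variant(variants, user="user-123")
# Repeated calls for the same user land in the same bucket.
same_again = pick_variant(variants, user="user-123")
```

In this sketch a user's assignment depends on the weights and on the order of the variants, so changing either can move existing users to a different variant.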

## Routes

A **route** is a specific path within a router. There are two types of routes you can configure:

1. **Default route** - Used when no conditional routes exist or none of their conditions match. If no default route is configured and no conditions match, the API returns an error.
   ```json theme={"system"}
   {
     "defaultRoute": {
       "route_id": "default",
       "variants": [...]
     }
   }
   ```
2. **Conditional route** - Conditional routes let you route requests based on runtime context (e.g., user tier). This can be useful if you want to segment users.

   Each conditional route includes a [CEL expression](https://github.com/google/cel-spec/blob/master/doc/langdef.md) that is evaluated against the request metadata (passed via `extra_body.metadata` in the Chat Completions request). Routes are evaluated **in order**, and the **first route** whose condition evaluates to `true` is selected.

   ```json theme={"system"}
   {
     "route": {
       "route_id": "premium",
       "variants": [...]
     },
     "condition": {
       "cel_expression": "tier == \"premium\""
     }
   }
   ```

   To trigger this route, pass the matching metadata in your Chat Completions request:

   ```json theme={"system"}
   {
     "model": "inworld/my-router",
     "messages": [{ "role": "user", "content": "Hello!" }],
     "extra_body": {
       "metadata": { "tier": "premium" }
     }
   }
   ```

   <Tip>
     Since the first matching route wins, place more specific conditional routes before general ones. For example, put `tier == "premium" && region == "us"` before `tier == "premium"` — otherwise the general condition would match first and the specific one would never be reached.
   </Tip>
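
To make the first-match rule concrete, here is a minimal sketch of route evaluation. Plain Python functions stand in for CEL expressions, and `select_route` and the `premium-us` route ID are hypothetical names used only for illustration. Missing metadata keys are treated as non-matching here, which may differ from how the router evaluates CEL.

```python
def select_route(conditional_routes, metadata, default_route=None):
    """Return the route_id of the first route whose condition is true."""
    for route_id, condition in conditional_routes:
        if condition(metadata):
            return route_id
    if default_route is None:
        # Mirrors the documented behavior: no match and no default is an error.
        raise LookupError("no route matched and no default route is configured")
    return default_route


# Specific condition first, general condition second.
specific_first = [
    ("premium-us", lambda m: m.get("tier") == "premium" and m.get("region") == "us"),
    ("premium", lambda m: m.get("tier") == "premium"),
]
# Same routes in the wrong order: the general route shadows the specific one.
general_first = list(reversed(specific_first))

request_metadata = {"tier": "premium", "region": "us"}
good = select_route(specific_first, request_metadata, "default")
shadowed = select_route(general_first, request_metadata, "default")
```

With the specific route listed first, the request resolves to `premium-us`. With the order reversed, the same request resolves to `premium` and the specific route is never reached.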

## Variants

A **variant** is a specific configuration within a route. Each variant specifies which model to use and can optionally include its own text generation parameters and prompt templates.

When a route has multiple variants, traffic is distributed based on **weights**. Weights must sum to 100 within each route.

This is useful for A/B testing — for example, splitting traffic 50/50 between two models to compare performance:

```json theme={"system"}
{
  "route_id": "experiment",
  "variants": [
    {
      "variant": { "variant_id": "gpt5", "model_id": "openai/gpt-5.2" },
      "weight": 50
    },
    {
      "variant": { "variant_id": "claude", "model_id": "anthropic/claude-opus-4-6" },
      "weight": 50
    }
  ]
}
```
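
Because weights must sum to 100 within each route, it can help to check a route locally before creating or updating a router. The following is a minimal client-side sketch; `validate_weights` is a hypothetical helper, and the API may perform its own validation with different error messages.

```python
def validate_weights(route):
    """Raise ValueError unless the route's variant weights sum to 100."""
    total = sum(entry["weight"] for entry in route["variants"])
    if total != 100:
        raise ValueError(
            f"route {route['route_id']!r}: weights sum to {total}, expected 100"
        )
    return total


route = {
    "route_id": "experiment",
    "variants": [
        {"variant": {"variant_id": "gpt5", "model_id": "openai/gpt-5.2"}, "weight": 50},
        {"variant": {"variant_id": "claude", "model_id": "anthropic/claude-opus-4-6"}, "weight": 50},
    ],
}
total = validate_weights(route)
```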

### Variant configuration

Each variant supports the following fields:

| Field                    | Description                                                                                                                                                                                                                                |
| :----------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `variant_id`             | Unique identifier for this variant within its route.                                                                                                                                                                                       |
| `model_id`               | The model to use. Can be a provider-prefixed model (e.g., `openai/gpt-5.2`), a model without provider for [provider routing](#provider-routing) (e.g., `gpt-oss-120b`), or `auto` for [dynamic model selection](#auto-selection).          |
| `model_selection`        | Configures [auto selection](#auto-selection) criteria (when `model_id` is `auto`), [fallback models](#fallbacks) (when `model_id` is a specific model), or [provider routing](#provider-routing) (when `model_id` has no provider prefix). |
| `message_templates`      | Prompt templates for this variant. Useful for variant-specific system prompts. Supports [prompt variables](/router/usage/prompt-variables).                                                                                                |
| `text_generation_config` | Generation parameters such as `temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`, `seed`, and `stop_sequences`. If set, this **entirely replaces** the router-level defaults.                                   |

Here's a fully configured variant example:

```json theme={"system"}
{
  "variant_id": "gpt5",
  "model_id": "openai/gpt-5.2",
  "text_generation_config": {
    "temperature": 0.7,
    "max_tokens": 1024
  },
  "message_templates": [
    { "role": "system", "content": "You are a helpful assistant specialized in {{topic}}." }
  ],
  "model_selection": {
    "models": ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"],
    "sort": [{ "metric": "SORT_METRIC_LATENCY" }]
  }
}
```

In this example, requests routed to this variant will use `openai/gpt-5.2` as the primary model with the specified temperature and prompt. If the primary model fails, the fallback models in `model_selection.models` are tried in order of lowest latency (determined by the `sort` criteria).
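
The table above states that a variant's `text_generation_config` entirely replaces the router-level defaults. One reading of that rule, sketched below, is that there is no per-field merge: any default the variant does not restate is dropped. The function name and the router default values are made up for illustration, and treating `None` as "not set" is an assumption.

```python
def effective_generation_config(router_defaults, variant_config=None):
    """Return the config that applies: the variant's if set, else the router's."""
    if variant_config is not None:
        # Wholesale replacement: router defaults are not merged in.
        return dict(variant_config)
    return dict(router_defaults)


router_defaults = {"temperature": 0.2, "max_tokens": 2048, "top_p": 0.9}
variant_config = {"temperature": 0.7, "max_tokens": 1024}

effective = effective_generation_config(router_defaults, variant_config)
# `top_p` from the router defaults is not inherited by the variant.
```

Under this reading, a variant that overrides one parameter should restate every other parameter it still needs.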

#### Fallbacks

When a variant has a specific `model_id` (e.g., `openai/gpt-5.2`), you can configure fallback models via `model_selection.models`. If the primary model fails, the router automatically retries with the fallback models.

By default, fallbacks are tried in the order listed. Add `sort` to control the order — for example, trying the cheapest fallback first.

<CodeGroup>
  ```json In order listed theme={"system"}
  // If gpt-5.2 fails, try Claude Opus 4.6, then if that fails, try Gemini 2.5 Pro
  {
    "variant_id": "gpt-5.2-with-fallbacks",
    "model_id": "openai/gpt-5.2",
    "model_selection": {
      "models": ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"]
    }
  }
  ```

  ```json Sorted by latency theme={"system"}
  // If gpt-5.2 fails, try Claude Opus 4.6 and Gemini 2.5 Pro in order of lowest latency
  {
    "variant_id": "gpt-5.2-with-fallbacks",
    "model_id": "openai/gpt-5.2",
    "model_selection": {
      "models": ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"],
      "sort": [{ "metric": "SORT_METRIC_LATENCY" }]
    }
  }
  ```
</CodeGroup>

You can inspect which models were attempted in the `metadata.attempts` array of the response.
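
The retry behavior can be pictured as a loop over the primary model followed by its fallbacks. The sketch below simulates it locally. `call_model` is a hypothetical stand-in for an upstream call, and the fields in each attempt record are invented for illustration, since this page does not specify the shape of `metadata.attempts`.

```python
def call_with_fallbacks(primary, fallbacks, call_model):
    """Try the primary model, then each fallback in order, recording attempts."""
    attempts = []
    for model_id in [primary, *fallbacks]:
        try:
            reply = call_model(model_id)
        except Exception as exc:
            attempts.append({"model": model_id, "ok": False, "error": str(exc)})
            continue
        attempts.append({"model": model_id, "ok": True})
        return reply, attempts
    raise RuntimeError(f"all models failed: {attempts}")


def fake_call(model_id):
    # Hypothetical stand-in: pretend the primary model is unavailable.
    if model_id == "openai/gpt-5.2":
        raise TimeoutError("simulated outage")
    return f"reply from {model_id}"


reply, attempts = call_with_fallbacks(
    "openai/gpt-5.2",
    ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"],
    fake_call,
)
```

Here the first fallback answers, so the second fallback is never called and `attempts` holds two entries.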

#### Provider routing

Instead of specifying a provider-prefixed model (e.g., `openai/gpt-5.2`), you can specify just the model name (e.g., `gpt-oss-120b`) and let the router select the best provider automatically. Optionally, use `model_selection.provider` to control which providers are tried and in what order.

By default, the provider with the lowest latency is selected, and if it fails, the next-best provider is tried. You can change the metric used to sort providers, or set an explicit order in which providers are tried.

<CodeGroup>
  ```json Default provider selection theme={"system"}
  // Automatically selects the lowest-latency provider for gpt-oss-120b
  {
    "variant_id": "lowest-latency-provider",
    "model_id": "gpt-oss-120b"
  }
  ```

  ```json Order by throughput theme={"system"}
  // Select the highest throughput provider for gpt-oss-120b
  {
    "variant_id": "highest-throughput-provider",
    "model_id": "gpt-oss-120b",
    "model_selection": {
      "sort": [{"metric": "SORT_METRIC_THROUGHPUT"}]
    }
  }
  ```

  ```json Explicit provider order theme={"system"}
  // Try groq first, then fireworks
  {
    "variant_id": "groq-preferred",
    "model_id": "gpt-oss-120b",
    "model_selection": {
      "provider": {
        "order": ["groq", "fireworks"]
      }
    }
  }
  ```
</CodeGroup>
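
As a rough mental model, provider routing turns a bare model name into an ordered list of providers to try. The sketch below covers only two of the cases above: an explicit `provider.order`, and the default lowest-latency ordering. The latency figures are made up, `provider_candidates` is a hypothetical helper, and the `provider/model` strings it returns are illustrative labels rather than confirmed model IDs.

```python
def provider_candidates(model_id, model_selection, latency_ms):
    """Return 'provider/model' labels in the order providers would be tried."""
    order = (model_selection or {}).get("provider", {}).get("order")
    if order:
        # Explicit order wins over any measured metric.
        providers = list(order)
    else:
        # Default in this sketch: lowest latency first.
        providers = sorted(latency_ms, key=latency_ms.get)
    return [f"{provider}/{model_id}" for provider in providers]


# Made-up latency measurements in milliseconds.
latency_ms = {"groq": 240, "fireworks": 180}

by_latency = provider_candidates("gpt-oss-120b", None, latency_ms)
explicit = provider_candidates(
    "gpt-oss-120b", {"provider": {"order": ["groq", "fireworks"]}}, latency_ms
)
```

With these numbers the default ordering tries `fireworks` first, while the explicit order tries `groq` first regardless of latency.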

#### Auto selection

Instead of specifying a fixed model (e.g., `openai/gpt-5.2`), you can set `model_id` to `auto` and use `model_selection` to let the router dynamically pick the best model based on the sort criteria:

```json theme={"system"}
{
  "variant_id": "auto-variant",
  "model_id": "auto",
  "model_selection": {
    "sort": [{ "metric": "SORT_METRIC_LATENCY"}, {"metric": "SORT_METRIC_PRICE"}]
  }
}
```

In this example, the model with the lowest latency will be selected, using price as a tie-breaker.

The available sort metrics are:

| **Metric**                 | **Description**                                                                |
| :------------------------- | :----------------------------------------------------------------------------- |
| `SORT_METRIC_PRICE`        | Publicly listed token price, computed as input price plus output price.        |
| `SORT_METRIC_LATENCY`      | Median time to first token.                                                    |
| `SORT_METRIC_THROUGHPUT`   | Median output tokens per second.                                               |
| `SORT_METRIC_INTELLIGENCE` | Overall intelligence based on the Artificial Analysis Intelligence Index.      |
| `SORT_METRIC_MATH`         | Math capabilities based on the MATH-500 benchmark.                             |
| `SORT_METRIC_CODING`       | Coding capabilities based on the LiveCodeBench benchmark.                      |
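
A `sort` list with more than one metric can be read as a multi-key sort, where each later metric only breaks ties left by the earlier ones. The sketch below assumes that lower is better for price and latency and higher is better for every other metric. The model names and numbers are invented, and the router's actual ranking logic may differ.

```python
# Assumed direction per metric: lower is better for these two,
# higher is better for all others.
LOWER_IS_BETTER = {"SORT_METRIC_PRICE", "SORT_METRIC_LATENCY"}


def rank_models(stats, sort):
    """Order model IDs by each metric in `sort`; later metrics break ties."""
    def key(model_id):
        values = []
        for criterion in sort:
            metric = criterion["metric"]
            value = stats[model_id][metric]
            # Negate higher-is-better metrics so ascending order ranks them first.
            values.append(value if metric in LOWER_IS_BETTER else -value)
        return tuple(values)

    return sorted(stats, key=key)


# Hypothetical models with made-up measurements.
stats = {
    "provider-a/model-x": {"SORT_METRIC_LATENCY": 300, "SORT_METRIC_PRICE": 12.0, "SORT_METRIC_THROUGHPUT": 80},
    "provider-b/model-y": {"SORT_METRIC_LATENCY": 300, "SORT_METRIC_PRICE": 4.0, "SORT_METRIC_THROUGHPUT": 120},
    "provider-c/model-z": {"SORT_METRIC_LATENCY": 450, "SORT_METRIC_PRICE": 1.0, "SORT_METRIC_THROUGHPUT": 200},
}

ranked = rank_models(
    stats, [{"metric": "SORT_METRIC_LATENCY"}, {"metric": "SORT_METRIC_PRICE"}]
)
```

Here `model-x` and `model-y` tie on latency, so price decides between them and `model-y` ranks first. `model-z` is the cheapest but ranks last because latency is compared first.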

You can also narrow the set of candidate models with `models` (entries to include) and `ignore` (entries to exclude). Both fields accept provider-prefixed models (e.g., `openai/gpt-5.2`) or a provider name alone (e.g., `openai`), which includes or excludes every model from that provider.

<CodeGroup>
  ```json Specify models theme={"system"}
  // Select the lowest-latency model among gpt-5.2, Opus 4.6, and Gemini 2.5 Pro
  {
    "variant_id": "auto-variant",
    "model_id": "auto",
    "model_selection": {
      "models": ["openai/gpt-5.2", "anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"],
      "sort": [{ "metric": "SORT_METRIC_LATENCY" }]
    }
  }
  ```

  ```json Restrict to a provider theme={"system"}
  // Only consider OpenAI models, pick the lowest latency one
  {
    "variant_id": "auto-variant",
    "model_id": "auto",
    "model_selection": {
      "models": ["openai"],
      "sort": [{ "metric": "SORT_METRIC_LATENCY" }]
    }
  }
  ```

  ```json Ignore models theme={"system"}
  // Select the lowest-latency model, excluding all OpenAI models and Claude Opus 4.6
  {
    "variant_id": "auto-variant",
    "model_id": "auto",
    "model_selection": {
      "ignore": ["openai", "anthropic/claude-opus-4-6"],
      "sort": [{ "metric": "SORT_METRIC_LATENCY" }]
    }
  }
  ```
</CodeGroup>
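
The matching rule for `models` and `ignore` entries (an exact provider-prefixed model, or a bare provider name that covers all of its models) can be sketched as a simple filter. The helper names and the three-model catalog below are illustrative. Applying `ignore` after `models` when both are present is an assumption, since this page does not say how the two combine.

```python
def matches(entry, model_id):
    """True if `entry` names this exact model or just its provider."""
    provider = model_id.split("/", 1)[0]
    return entry == model_id or entry == provider


def candidate_models(catalog, models=None, ignore=None):
    """Keep models allowed by `models` (if given), then drop those in `ignore`."""
    selected = [
        m for m in catalog
        if not models or any(matches(entry, m) for entry in models)
    ]
    return [
        m for m in selected
        if not any(matches(entry, m) for entry in (ignore or []))
    ]


catalog = [
    "openai/gpt-5.2",
    "anthropic/claude-opus-4-6",
    "google-ai-studio/gemini-2.5-pro",
]

only_openai = candidate_models(catalog, models=["openai"])
without_some = candidate_models(catalog, ignore=["openai", "anthropic/claude-opus-4-6"])
```

With this catalog, `only_openai` contains just `openai/gpt-5.2`, and `without_some` contains just `google-ai-studio/gemini-2.5-pro`.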