How it all fits together
A router is made up of routes and variants. Here’s the full flow when a request hits a router:

- Route evaluation — Conditional routes are checked in order. The first match is selected, and if none match, the default route is used.
- Variant selection — Within the matched route, a variant is chosen based on weights.
- Model call — The request is sent to a model based on the variant configuration. If the variant uses auto, the best model is dynamically selected based on the provided criteria.

If a user is specified in the Chat Completions request, that user will consistently receive the same variant across requests (sticky routing).

Routes
A route is a specific path within a router. There are two types of routes you can configure:

- Default route — Used if no conditional routes exist or match. If no default route is configured and no conditions match, the API returns an error.
- Conditional route — Routes requests based on runtime context (e.g., user tier). This can be useful if you want to segment users.

Each conditional route includes a CEL expression that is evaluated against the request metadata (passed via extra_body.metadata in the Chat Completions request). Routes are evaluated in order, and the first route whose condition evaluates to true is selected. To trigger a conditional route, pass the matching metadata in your Chat Completions request.
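For instance, a request that would match a route whose condition checks a user tier might carry a body like the following. This is a hedged sketch: the router identifier, the metadata key and value, and the CEL condition they would match are all illustrative assumptions.

```python
import json

# Hypothetical Chat Completions request body targeting a router.
# The "metadata" object is evaluated by each conditional route's CEL
# expression (e.g., a route with condition: metadata.tier == "premium").
# The "user" field enables sticky routing: the same user consistently
# receives the same variant across requests.
request_body = {
    "model": "my-router",  # assumed router identifier
    "messages": [{"role": "user", "content": "Hello!"}],
    "metadata": {"tier": "premium"},  # matched against route conditions
    "user": "user-1234",              # sticky variant assignment
}

print(json.dumps(request_body, indent=2))
```

With the OpenAI Python SDK, non-standard fields such as metadata are forwarded through the extra_body parameter of client.chat.completions.create(...).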
Variants
A variant is a specific configuration within a route. Each variant specifies which model to use and can optionally include its own text generation parameters and prompt templates. When a route has multiple variants, traffic is distributed based on weights, which must sum to 100 within each route. This is useful for A/B testing — for example, splitting traffic 50/50 between two models to compare performance.

Variant configuration
Each variant supports the following fields:

| Field | Description |
|---|---|
| variant_id | Unique identifier for this variant within its route. |
| model_id | The model to use (e.g., openai/gpt-5.2). Set to auto for dynamic model selection. |
| model_selection | Configures auto selection criteria (when model_id is auto) or fallback models (when model_id is a specific model). |
| message_templates | Prompt templates for this variant. Useful for variant-specific system prompts. Supports prompt variables. |
| text_generation_config | Generation parameters such as temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, and stop_sequences. If set, this entirely replaces the router-level defaults. |
For example, a variant might use openai/gpt-5.2 as its primary model with its own temperature and prompt. If the primary model fails, the fallback models in model_selection.models are tried in order of lowest latency (as determined by the sort criteria).
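As an illustration, a two-variant A/B split using the fields above might look like this. This is a hedged sketch in Python-dict form: the field names come from the table, but the surrounding schema, the weight field name, and the model IDs are assumptions.

```python
# Hypothetical route with two variants splitting traffic 50/50.
# Field names follow the variant configuration table; the enclosing
# structure and "weight" key are assumed for illustration.
route = {
    "variants": [
        {
            "variant_id": "baseline",
            "model_id": "openai/gpt-5.2",
            "weight": 50,  # weights must sum to 100 within the route
            "text_generation_config": {"temperature": 0.2, "max_tokens": 512},
        },
        {
            "variant_id": "challenger",
            "model_id": "anthropic/claude-sonnet-4-5",  # assumed model ID
            "weight": 50,
            "text_generation_config": {"temperature": 0.2, "max_tokens": 512},
        },
    ]
}

total_weight = sum(v["weight"] for v in route["variants"])
print(total_weight)  # → 100
```

Splitting traffic evenly like this lets you compare the two variants on the same live traffic before promoting a winner.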
Fallbacks
When a variant has a specific model_id (e.g., openai/gpt-5.2), you can configure fallback models via model_selection.models. If the primary model fails, the router automatically retries with the fallback models.
By default, fallbacks are tried in the order listed. Add sort to control the order — for example, trying the cheapest fallback first.
You can see which models were attempted for a request in the metadata.attempts array of the response.
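A hedged sketch of a fallback configuration, again in dict form: the field names and the sort metric come from this page, while the enclosing schema and the specific model IDs are assumptions.

```python
# Hypothetical variant: a fixed primary model with price-sorted fallbacks.
variant = {
    "variant_id": "primary-with-fallbacks",
    "model_id": "openai/gpt-5.2",          # primary model
    "model_selection": {
        "models": [                         # tried only if the primary fails
            "anthropic/claude-sonnet-4-5",  # assumed model IDs
            "google/gemini-2.5-flash",
        ],
        "sort": "SORT_METRIC_PRICE",        # try the cheapest fallback first
    },
}

print(variant["model_selection"]["sort"])  # → SORT_METRIC_PRICE
```

Without the sort field, the fallbacks would be tried in the order listed.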
Auto selection
Instead of specifying a fixed model (e.g., openai/gpt-5.2), you can set model_id to auto and use model_selection to let the router dynamically pick the best model based on the sort criteria. The following sort metrics are available:
| Metric | Description |
|---|---|
| SORT_METRIC_PRICE | Publicly listed token pricing, based on adding input and output token pricing. |
| SORT_METRIC_LATENCY | Median time to first token. |
| SORT_METRIC_THROUGHPUT | Median output tokens per second. |
| SORT_METRIC_INTELLIGENCE | Overall intelligence based on the Artificial Analysis Intelligence Index. |
| SORT_METRIC_MATH | Math capabilities based on the MATH-500 benchmark. |
| SORT_METRIC_CODING | Coding capabilities based on the LiveCodeBench benchmark. |
You can also limit which models are considered using the models and ignore fields.
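Putting it together, an auto-selection variant might look like this. A sketch only: model_id, model_selection, sort, models, and ignore are documented names, but the enclosing schema and the candidate model IDs are assumptions.

```python
# Hypothetical variant using dynamic model selection.
variant = {
    "variant_id": "auto-cheapest",
    "model_id": "auto",               # let the router pick the model
    "model_selection": {
        "sort": "SORT_METRIC_PRICE",  # choose the cheapest candidate
        "models": [                   # assumed allow-list of candidates
            "openai/gpt-5.2",
            "anthropic/claude-sonnet-4-5",
        ],
        "ignore": [                   # assumed deny-list
            "google/gemini-2.5-flash",
        ],
    },
}

print(variant["model_id"])  # → auto
```

Swapping the sort metric (e.g., SORT_METRIC_LATENCY) changes which candidate the router prefers without touching the rest of the variant.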