A router is a reusable configuration for our Chat Completions API that defines how requests are routed to models. Routers let you configure fallbacks, conditional routing, A/B tests across models, prompt templates, and generation parameters, all without changing your application code. You can create a router via the Portal or the API.

How it all fits together

A router is made up of routes and variants. Here’s the full flow when a request hits a router:
  1. Route evaluation — Conditional routes are checked in order. The first match is selected, and if none match, the default route is used.
  2. Variant selection — Within the matched route, a variant is chosen based on weights.
  3. Model called — The request is sent to a model based on the variant configuration. If the variant uses auto, the best model is dynamically selected based on the provided criteria.
If a user is specified in the Chat Completions request, that user will consistently receive the same variant across requests (sticky routing).
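The three steps above can be sketched in Python. This is a simplified, hypothetical model of the routing logic: the condition check is injected as a plain predicate standing in for CEL evaluation, and sticky routing is approximated by hashing the user id into a weight bucket. The actual service implementation may differ.

```python
import hashlib
import random

def pick_route(conditional_routes, default_route, metadata, evaluate):
    # Step 1: conditional routes are checked in order; the first match wins.
    for entry in conditional_routes:
        if evaluate(entry["condition"]["cel_expression"], metadata):
            return entry["route"]
    return default_route  # no condition matched

def pick_variant(route, user=None):
    # Step 2: choose a variant by weight (weights sum to 100 within a route).
    if user is not None:
        # Sticky routing: the same user always hashes to the same bucket,
        # so they consistently receive the same variant.
        bucket = int(hashlib.sha256(user.encode()).hexdigest(), 16) % 100
    else:
        bucket = random.randrange(100)
    cumulative = 0
    for v in route["variants"]:
        cumulative += v["weight"]
        if bucket < cumulative:
            return v["variant"]
    return route["variants"][-1]["variant"]
```

Step 3 (the model call itself) then uses the selected variant's configuration.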

Routes

A route is a specific path within a router. There are two types of routes you can configure:
  1. Default route - Used when no conditional routes exist or none match. If no default route is configured and no condition matches, the API returns an error.
    {
      "defaultRoute": {
        "route_id": "default",
        "variants": [...]
      }
    }
    
  2. Conditional route - Routes requests based on runtime context (e.g., user tier), which is useful for segmenting users. Each conditional route includes a CEL expression that is evaluated against the request metadata (passed via extra_body.metadata in the Chat Completions request). Routes are evaluated in order, and the first route whose condition evaluates to true is selected.
    {
      "route": {
        "route_id": "premium",
        "variants": [...]
      },
      "condition": {
        "cel_expression": "tier == \"premium\""
      }
    }
    
    To trigger this route, pass the matching metadata in your Chat Completions request:
    {
      "model": "inworld/my-router",
      "messages": [{ "role": "user", "content": "Hello!" }],
      "extra_body": {
        "metadata": { "tier": "premium" }
      }
    }
    
    Since the first matching route wins, place more specific conditional routes before general ones. For example, put tier == "premium" && region == "us" before tier == "premium" — otherwise the general condition would match first and the specific one would never be reached.
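The first-match rule can be illustrated with a small sketch, where plain Python predicates stand in for the CEL expressions and the route ids are hypothetical:

```python
def select_route(conditional_routes, metadata):
    # Evaluated top to bottom; the first condition that holds wins.
    for route_id, condition in conditional_routes:
        if condition(metadata):
            return route_id
    return "default"

# Specific condition first, general condition second.
routes = [
    ("premium-us", lambda m: m.get("tier") == "premium" and m.get("region") == "us"),
    ("premium", lambda m: m.get("tier") == "premium"),
]
```

With this ordering, a premium US user lands on premium-us; if the two routes were swapped, the general premium condition would always match first and premium-us would be unreachable.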

Variants

A variant is a specific configuration within a route. Each variant specifies which model to use and can optionally include its own text generation parameters and prompt templates. When a route has multiple variants, traffic is distributed based on weights. Weights must sum to 100 within each route. This is useful for A/B testing — for example, splitting traffic 50/50 between two models to compare performance:
{
  "route_id": "experiment",
  "variants": [
    {
      "variant": { "variant_id": "gpt5", "model_id": "openai/gpt-5.2" },
      "weight": 50
    },
    {
      "variant": { "variant_id": "claude", "model_id": "anthropic/claude-opus-4-6" },
      "weight": 50
    }
  ]
}
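One way to see why weights must sum to 100: if variant selection maps each request into a bucket from 0 to 99, each variant owns a contiguous span of buckets equal to its weight. A quick sketch of this (hypothetical selection logic, mirroring the JSON shape above):

```python
def bucket_counts(variants):
    # Weights must sum to 100; each variant owns `weight` of the 100 buckets.
    assert sum(v["weight"] for v in variants) == 100
    counts = {}
    for bucket in range(100):
        cumulative = 0
        for v in variants:
            cumulative += v["weight"]
            if bucket < cumulative:
                vid = v["variant"]["variant_id"]
                counts[vid] = counts.get(vid, 0) + 1
                break
    return counts
```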

Variant configuration

Each variant supports the following fields:
  variant_id - Unique identifier for this variant within its route.
  model_id - The model to use (e.g., openai/gpt-5.2). Set to auto for dynamic model selection.
  model_selection - Configures auto selection criteria (when model_id is auto) or fallback models (when model_id is a specific model).
  message_templates - Prompt templates for this variant. Useful for variant-specific system prompts. Supports prompt variables.
  text_generation_config - Generation parameters such as temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, and stop_sequences. If set, this entirely replaces the router-level defaults.
Here’s a fully configured variant example:
{
  "variant_id": "gpt5",
  "model_id": "openai/gpt-5.2",
  "text_generation_config": {
    "temperature": 0.7,
    "max_tokens": 1024
  },
  "message_templates": [
    { "role": "system", "content": "You are a helpful assistant specialized in {{topic}}." }
  ],
  "model_selection": {
    "models": ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"],
    "sort": [{ "metric": "SORT_METRIC_LATENCY" }]
  }
}
In this example, requests routed to this variant will use openai/gpt-5.2 as the primary model with the specified temperature and prompt. If the primary model fails, the fallback models in model_selection.models are tried in order of lowest latency (determined by the sort criteria).

Fallbacks

When a variant has a specific model_id (e.g., openai/gpt-5.2), you can configure fallback models via model_selection.models. If the primary model fails, the router automatically retries with the fallback models. By default, fallbacks are tried in the order listed. Add sort to control the order — for example, trying the cheapest fallback first.
// If gpt-5.2 fails, try Claude Opus 4.6, then if that fails, try Gemini 2.5 Pro
{
  "variant_id": "gpt-5.2-with-fallbacks",
  "model_id": "openai/gpt-5.2",
  "model_selection": {
    "models": ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"]
  }
}
You can inspect which models were attempted in the metadata.attempts array of the response.
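The fallback behavior amounts to a retry loop over an ordered model list. In this sketch, call_model is a hypothetical stand-in for the actual model call; in real responses, the attempted models appear in the metadata.attempts array:

```python
def call_with_fallbacks(primary, fallbacks, call_model):
    # Try the primary model, then each fallback in the listed order.
    attempts = []
    last_error = None
    for model in [primary, *fallbacks]:
        attempts.append(model)
        try:
            return call_model(model), attempts
        except Exception as err:
            last_error = err
    raise RuntimeError(f"all models failed: {attempts}") from last_error
```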

Auto selection

Instead of specifying a fixed model (e.g., openai/gpt-5.2), you can set model_id to auto and use model_selection to let the router dynamically pick the best model based on the sort criteria:
{
  "variant_id": "auto-variant",
  "model_id": "auto",
  "model_selection": {
    "sort": [{ "metric": "SORT_METRIC_LATENCY" }, { "metric": "SORT_METRIC_PRICE" }]
  }
}
In this example, the model with the lowest latency will be selected, using price as a tie-breaker. The available sort metrics are:
  SORT_METRIC_PRICE - Publicly listed token pricing, computed as input plus output token price.
  SORT_METRIC_LATENCY - Median time to first token.
  SORT_METRIC_THROUGHPUT - Median output tokens per second.
  SORT_METRIC_INTELLIGENCE - Overall intelligence based on the Artificial Analysis Intelligence Index.
  SORT_METRIC_MATH - Math capabilities based on the MATH-500 benchmark.
  SORT_METRIC_CODING - Coding capabilities based on the LiveCodeBench benchmark.
You can also limit the set of models to consider by specifying models and ignore.
// Select the lowest latency model between gpt-5.2, Opus 4.6, and Gemini 2.5 Pro
{
  "variant_id": "auto-variant",
  "model_id": "auto",
  "model_selection": {
    "models": ["openai/gpt-5.2", "anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"],
    "sort": [{ "metric": "SORT_METRIC_LATENCY" }]
  }
}
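The sort semantics amount to a lexicographic sort over the chosen metrics: the first metric dominates, and later metrics break ties. A sketch with made-up per-model stats (not real benchmark data):

```python
def rank_models(candidates, metrics):
    # Lexicographic sort: the first metric dominates, later ones break ties.
    # Lower is better for both latency and price here (illustrative only).
    return sorted(candidates, key=lambda m: tuple(m[metric] for metric in metrics))

# Hypothetical stats: seconds to first token, dollars per million tokens.
models = [
    {"id": "openai/gpt-5.2", "latency": 0.40, "price": 12.0},
    {"id": "anthropic/claude-opus-4-6", "latency": 0.40, "price": 10.0},
    {"id": "google-ai-studio/gemini-2.5-pro", "latency": 0.55, "price": 6.0},
]
```

Under these made-up numbers, sorting by latency then price ranks the Claude entry first: it ties with gpt-5.2 on latency, and price breaks the tie.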