How it all fits together
A router is made up of routes and variants. Here’s the full flow when a request hits a router:

- Route evaluation — Conditional routes are checked in order. The first match is selected, and if none match, the default route is used.
- Variant selection — Within the matched route, a variant is chosen based on weights.
- Model call — The request is sent to a model based on the variant configuration. If the variant uses auto, the best model is dynamically selected based on the provided criteria.

If a user is specified in the Chat Completions request, that user will consistently receive the same variant across requests (sticky routing).

Routes
A route is a specific path within a router. There are two types of routes you can configure:

- Default route — Used if no conditional routes exist or match. If no default route is configured and no conditions match, the API returns an error.
- Conditional route — Routes requests based on runtime context (e.g., user tier). This can be useful if you want to segment users.

Each conditional route includes a CEL expression that is evaluated against the request metadata (passed via extra_body.metadata in the Chat Completions request). Routes are evaluated in order, and the first route whose condition evaluates to true is selected. To trigger a conditional route, pass the matching metadata in your Chat Completions request.
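For instance, a request that would match a route whose condition checks a user tier might carry a body like the following. This is a hedged sketch: the router identifier, the metadata key and value, and the CEL condition they would match are all illustrative assumptions.

```python
import json

# Hypothetical Chat Completions request body targeting a router.
# The "metadata" object is evaluated by each conditional route's CEL
# expression (e.g., a route with condition: metadata.tier == "premium").
# The "user" field enables sticky routing: the same user consistently
# receives the same variant across requests.
request_body = {
    "model": "my-router",  # assumed router identifier
    "messages": [{"role": "user", "content": "Hello!"}],
    "metadata": {"tier": "premium"},  # matched against route conditions
    "user": "user-1234",              # sticky variant assignment
}

print(json.dumps(request_body, indent=2))
```

With the OpenAI Python SDK, non-standard fields such as metadata are forwarded through the extra_body parameter of client.chat.completions.create(...).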
Variants
A variant is a specific configuration within a route. Each variant specifies which model to use and can optionally include its own text generation parameters and prompt templates. When a route has multiple variants, traffic is distributed based on weights, which must sum to 100 within each route. This is useful for A/B testing — for example, splitting traffic 50/50 between two models to compare performance.

Variant configuration
Each variant supports the following fields:

| Field | Description |
|---|---|
| variant_id | Unique identifier for this variant within its route. |
| model_id | The model to use (e.g., openai/gpt-5.2). Set to auto for dynamic model selection. |
| model_selection | Configures auto selection criteria (when model_id is auto) or fallback models (when model_id is a specific model). |
| message_templates | Prompt templates for this variant. Useful for variant-specific system prompts. Supports prompt variables. |
| text_generation_config | Generation parameters such as temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, and stop_sequences. If set, this entirely replaces the router-level defaults. |
For example, a variant might use openai/gpt-5.2 as its primary model with its own temperature and prompt. If the primary model fails, the fallback models in model_selection.models are tried in order of lowest latency (as determined by the sort criteria).
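As an illustration, a two-variant A/B split using the fields above might look like this. This is a hedged sketch in Python-dict form: the field names come from the table, but the surrounding schema, the weight field name, and the model IDs are assumptions.

```python
# Hypothetical route with two variants splitting traffic 50/50.
# Field names follow the variant configuration table; the enclosing
# structure and "weight" key are assumed for illustration.
route = {
    "variants": [
        {
            "variant_id": "baseline",
            "model_id": "openai/gpt-5.2",
            "weight": 50,  # weights must sum to 100 within the route
            "text_generation_config": {"temperature": 0.2, "max_tokens": 512},
        },
        {
            "variant_id": "challenger",
            "model_id": "anthropic/claude-sonnet-4-5",  # assumed model ID
            "weight": 50,
            "text_generation_config": {"temperature": 0.2, "max_tokens": 512},
        },
    ]
}

total_weight = sum(v["weight"] for v in route["variants"])
print(total_weight)  # → 100
```

Splitting traffic evenly like this lets you compare the two variants on the same live traffic before promoting a winner.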
Fallbacks
When a variant has a specific model_id (e.g., openai/gpt-5.2), you can configure fallback models via model_selection.models. If the primary model fails, the router automatically retries with the fallback models.
By default, fallbacks are tried in the order listed. Add sort to control the order — for example, trying the cheapest fallback first.
You can see which models were attempted for a request in the metadata.attempts array of the response.
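A hedged sketch of a fallback configuration, again in dict form: the field names and the sort metric come from this page, while the enclosing schema and the specific model IDs are assumptions.

```python
# Hypothetical variant: a fixed primary model with price-sorted fallbacks.
variant = {
    "variant_id": "primary-with-fallbacks",
    "model_id": "openai/gpt-5.2",          # primary model
    "model_selection": {
        "models": [                         # tried only if the primary fails
            "anthropic/claude-sonnet-4-5",  # assumed model IDs
            "google/gemini-2.5-flash",
        ],
        "sort": "SORT_METRIC_PRICE",        # try the cheapest fallback first
    },
}

print(variant["model_selection"]["sort"])  # → SORT_METRIC_PRICE
```

Without the sort field, the fallbacks would be tried in the order listed.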
Auto selection
Instead of specifying a fixed model (e.g., openai/gpt-5.2), you can set model_id to auto and use model_selection to let the router dynamically pick the best model based on the sort criteria. The following sort metrics are available:
| Metric | Description |
|---|---|
| SORT_METRIC_PRICE | Publicly listed token pricing, based on adding input and output token pricing. |
| SORT_METRIC_LATENCY | Median time to first token. |
| SORT_METRIC_THROUGHPUT | Median output tokens per second. |
| SORT_METRIC_INTELLIGENCE | Overall intelligence based on the Artificial Analysis Intelligence Index. |
| SORT_METRIC_MATH | Math capabilities based on the MATH-500 benchmark. |
| SORT_METRIC_CODING | Coding capabilities based on the LiveCodeBench benchmark. |
You can also limit which models are considered using the models and ignore fields.
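Putting it together, an auto-selection variant might look like this. A sketch only: model_id, model_selection, sort, models, and ignore are documented names, but the enclosing schema and the candidate model IDs are assumptions.

```python
# Hypothetical variant using dynamic model selection.
variant = {
    "variant_id": "auto-cheapest",
    "model_id": "auto",               # let the router pick the model
    "model_selection": {
        "sort": "SORT_METRIC_PRICE",  # choose the cheapest candidate
        "models": [                   # assumed allow-list of candidates
            "openai/gpt-5.2",
            "anthropic/claude-sonnet-4-5",
        ],
        "ignore": [                   # assumed deny-list
            "google/gemini-2.5-flash",
        ],
    },
}

print(variant["model_id"])  # → auto
```

Swapping the sort metric (e.g., SORT_METRIC_LATENCY) changes which candidate the router prefers without touching the rest of the variant.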