# Request-Level Routing

> Route requests without a pre-defined router

The [Chat Completions](/api-reference/routerAPI/chat-completions) API supports routing directly at the request level, without creating a router first. You can use it to call a specific model, add fallbacks, or let the router select the best model automatically, all in a single request.

Request-level routing can be a good fit if you:

* Want to call a specific model through a unified API without setting up a router
* Are prototyping or benchmarking before committing to a router configuration

For more advanced use cases like conditional routing, A/B testing with weighted variants, or shared prompt templates, we recommend [setting up a router](/router/quickstart).

## Direct model call

Specify a model directly by its `provider/model` identifier:

```bash theme={"system"}
curl -X POST https://api.inworld.ai/v1/chat/completions \
  -H 'Authorization: Bearer <your-api-key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
```

This sends the request to the specified model with no routing logic. You still get Inworld Router's unified API: the same endpoint, authentication, and request format for every provider.
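For callers not using curl, the same request can be assembled with Python's standard library. This is a minimal sketch rather than an official client: the helper name `build_direct_request` is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.inworld.ai/v1/chat/completions"


def build_direct_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {
        "model": model,  # a provider/model identifier, e.g. "openai/gpt-5.2"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` with a valid API key and decode the response body as JSON.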

## Fallbacks

Add fallback models via `extra_body.models`. If the primary model (the one in the `model` field) fails, the router automatically tries the models in this list, in order:

```bash theme={"system"}
curl -X POST https://api.inworld.ai/v1/chat/completions \
  -H 'Authorization: Bearer <your-api-key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [{ "role": "user", "content": "Hello!" }],
    "extra_body": {
      "models": ["anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"]
    }
  }'
```

In this example, the router tries `openai/gpt-5.2` first, then `anthropic/claude-opus-4-6`, then `google-ai-studio/gemini-2.5-pro`. You can inspect which models were attempted in the `metadata.attempts` array of the response.
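The fallback request body and the `metadata.attempts` lookup can be sketched in Python as follows. Both helper names are hypothetical. Because the shape of each entry in `attempts` is not described on this page, the sketch returns the entries unchanged instead of assuming any fields.

```python
from typing import Any


def with_fallbacks(primary: str, fallbacks: list[str], prompt: str) -> dict[str, Any]:
    """Request body: `model` is tried first, then `extra_body.models` in order."""
    return {
        "model": primary,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"models": list(fallbacks)},
    }


def attempts(response: dict[str, Any]) -> list[Any]:
    """Return `metadata.attempts` from a decoded response, or [] if it is absent.

    Entries are passed through as-is; their structure is not assumed.
    """
    metadata = response.get("metadata") or {}
    return metadata.get("attempts") or []
```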

### Fallback by first token timeout

You can set a time-to-first-token (TTFT) timeout to trigger fallback based on latency. If the current model does not return the first token within the specified threshold, the router cancels the request and tries the next model in the chain.

This is useful when your application has strict latency requirements and you'd rather try an alternative model than wait for a slow response.

Add a `fallback` object with a `ttft_timeout` field under `extra_body` in your request:

```bash theme={"system"}
curl -X POST https://api.inworld.ai/v1/chat/completions \
  -H 'Authorization: Bearer <your-api-key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [{ "role": "user", "content": "Hello" }],
    "extra_body": {
      "models": ["openai/gpt-4o", "google-ai-studio/gemini-2.5-pro"],
      "fallback": {
        "ttft_timeout": "900ms"
      }
    }
  }'
```

The `ttft_timeout` value is a duration string (e.g., `"300ms"`, `"1s"`, `"1.5s"`). The minimum allowed value is `300ms`.
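A client can check a `ttft_timeout` value before sending it. The sketch below is one way to do that, under a stated assumption: it accepts only the `ms` and `s` units that appear in the examples above, so any other duration units the API may support are rejected here. The function name `ttft_timeout_ms` is hypothetical.

```python
import re

MIN_TTFT_MS = 300  # minimum allowed value stated above
_UNIT_TO_MS = {"ms": 1, "s": 1000}  # assumption: only the units shown in the examples
_DURATION = re.compile(r"(\d+(?:\.\d+)?)(ms|s)")


def ttft_timeout_ms(value: str) -> float:
    """Convert a duration string such as "900ms" or "1.5s" to milliseconds.

    Raises ValueError if the string is malformed or below the 300ms minimum.
    """
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    millis = float(match.group(1)) * _UNIT_TO_MS[match.group(2)]
    if millis < MIN_TTFT_MS:
        raise ValueError(f"ttft_timeout must be at least {MIN_TTFT_MS}ms, got {value!r}")
    return millis
```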

## Auto model selection

Set `model` to `auto` and provide sorting criteria via `extra_body.sort` to let the router pick the best model automatically:

```bash theme={"system"}
curl -X POST https://api.inworld.ai/v1/chat/completions \
  -H 'Authorization: Bearer <your-api-key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{ "role": "user", "content": "Hello!" }],
    "extra_body": {
      "sort": ["price"]
    }
  }'
```

This selects the cheapest available model. Available sort criteria: `price`, `latency`, `throughput`, `intelligence`, `math`, `coding`.
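Since the set of sort criteria is fixed, a typo can be caught before the request is sent. The following is a minimal sketch with a hypothetical helper, `auto_request`, which builds the request body and rejects criteria that are not in the list above.

```python
from typing import Any

# Sort criteria listed above.
SORT_CRITERIA = ("price", "latency", "throughput", "intelligence", "math", "coding")


def auto_request(prompt: str, sort: list[str]) -> dict[str, Any]:
    """Build an auto-selection request body, rejecting unknown sort criteria."""
    unknown = [criterion for criterion in sort if criterion not in SORT_CRITERIA]
    if unknown:
        raise ValueError(f"unknown sort criteria: {unknown}")
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"sort": list(sort)},
    }
```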

You can combine multiple criteria. Models are ranked by the first criterion, with subsequent criteria used as tiebreakers:

```bash theme={"system"}
curl -X POST https://api.inworld.ai/v1/chat/completions \
  -H 'Authorization: Bearer <your-api-key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{ "role": "user", "content": "Hello!" }],
    "extra_body": {
      "sort": ["price", "latency"]
    }
  }'
```

This picks the cheapest model, using latency as a tiebreaker.
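To illustrate how tiebreaking orders candidates, the sketch below sorts by a tuple of metrics. It is not the router's implementation: the model names and numbers are made-up placeholders, and it assumes lower is better for every criterion. That holds for `price` and `latency`, but presumably not for criteria such as `throughput`.

```python
def rank(stats: dict[str, dict[str, float]], sort: list[str]) -> list[str]:
    """Order model names by the first criterion, then by later ones on ties.

    Assumes lower values are better for every criterion in `sort`.
    """
    return sorted(stats, key=lambda name: tuple(stats[name][c] for c in sort))


# Placeholder data for illustration only; not real prices or latencies.
EXAMPLE_STATS = {
    "provider-a/model-x": {"price": 1.0, "latency": 0.8},
    "provider-b/model-y": {"price": 1.0, "latency": 0.5},
    "provider-c/model-z": {"price": 3.0, "latency": 0.2},
}
```

With `["price", "latency"]`, the two models tied on price are ordered by latency, so `provider-b/model-y` ranks first even though `provider-c/model-z` has the lowest latency overall.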

### Filtering models

Use `extra_body.models` to restrict the candidate pool, or `extra_body.ignore` to exclude specific models or entire providers:

<CodeGroup>
  ```json Restrict to specific models theme={"system"}
  {
    "model": "auto",
    "messages": [{ "role": "user", "content": "Hello!" }],
    "extra_body": {
      "models": ["openai/gpt-5.2", "anthropic/claude-opus-4-6", "google-ai-studio/gemini-2.5-pro"],
      "sort": ["latency"]
    }
  }
  ```

  ```json Exclude models or providers theme={"system"}
  {
    "model": "auto",
    "messages": [{ "role": "user", "content": "Hello!" }],
    "extra_body": {
      "ignore": ["openai", "anthropic/claude-opus-4-6"],
      "sort": ["latency"]
    }
  }
  ```
</CodeGroup>
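The combined effect of `models` and `ignore` on a candidate pool can be sketched client-side as below. This is an illustration, not the router's own matching logic: it assumes an `ignore` entry without a `/` names a provider and matches every model whose identifier starts with that provider. The function name `filter_candidates` is hypothetical.

```python
from typing import Optional


def filter_candidates(
    pool: list[str],
    models: Optional[list[str]] = None,
    ignore: Optional[list[str]] = None,
) -> list[str]:
    """Apply an allow-list (`models`) and a deny-list (`ignore`) to `pool`."""
    ignored = set(ignore or [])

    def is_ignored(model_id: str) -> bool:
        provider = model_id.split("/", 1)[0]
        # An entry may name a whole provider or one specific model.
        return model_id in ignored or provider in ignored

    return [
        model_id
        for model_id in pool
        if (models is None or model_id in models) and not is_ignored(model_id)
    ]
```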