| Mode | Field | How it works | Supported Models |
|---|---|---|---|
| Tool-based | web_search | LLM calls a search engine in a tool-calling loop, then synthesizes a final answer | Any LLM that supports tool calling |
| Native | web_search_options | Provider’s built-in search grounding (no tool loop) | OpenAI (search models only), Anthropic, Google, Vertex AI, Groq |
Only one mode may be set per variant. Setting both will result in an error.
## Tool-based web search
Add a `web_search` object to your router variant. When a request is routed to that variant, the router injects a search tool, lets the LLM call it in a loop, and returns a grounded answer with `url_citation` annotations.
| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | string | `exa` | Options: `exa`, `google`, `native` (uses the model's built-in grounding) |
| max_results | int | 3 | Number of search results returned per step (1–20) |
| max_steps | int | 1 | Maximum tool-call rounds before final synthesis (1–5) |
### Variant configuration
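As a sketch, a variant with tool-based search enabled might look like the following. The surrounding variant structure and model name are illustrative assumptions; only the `web_search` fields come from the parameter table above:

```json
{
  "model": "gpt-4o",
  "web_search": {
    "engine": "exa",
    "max_results": 5,
    "max_steps": 2
  }
}
```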
1. The router injects a search tool and sends the request to the LLM.
2. The LLM calls the search tool with a query.
3. The search engine returns results, which are injected back into the conversation.
4. Steps 2–3 repeat up to `max_steps` times.
5. The LLM synthesizes a final answer with `url_citation` annotations.
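The loop above can be sketched as follows. Note that `call_llm` and `run_search` are hypothetical stand-ins for the model call and the search-engine call, not part of any real API:

```python
def tool_search_loop(messages, call_llm, run_search, max_steps=1):
    """Sketch of the tool-based search loop described above."""
    for _ in range(max_steps):
        reply = call_llm(messages)          # model may request a search
        if reply.get("tool_call") is None:  # no search requested: done
            return reply
        query = reply["tool_call"]["query"]
        results = run_search(query)         # e.g. top-N result snippets
        # Inject the search results back into the conversation.
        messages = messages + [{"role": "tool", "content": results}]
    # Tool-call budget exhausted: ask the model for a final, grounded answer.
    return call_llm(messages)
```

Once the model answers without requesting another search, the loop ends early even if `max_steps` rounds remain.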
## Native web search
Add `web_search_options` to your router variant to use a provider's built-in search grounding. This skips the tool-calling loop entirely; the provider handles search internally.
| Parameter | Type | Default | Description |
|---|---|---|---|
search_context_size | string | "medium" | Amount of search context: "low", "medium", or "high" |
### Variant configuration
Native search is supported by OpenAI (search models only, e.g. `gpt-4o-search-preview`), Anthropic, Google / Vertex AI, and Groq.
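For illustration, a native-search variant might be configured like this. The variant structure and model name are assumptions; only the `web_search_options` object and `search_context_size` field come from this document:

```json
{
  "model": "gpt-4o-search-preview",
  "web_search_options": {
    "search_context_size": "medium"
  }
}
```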
## Per-request web search
You can also pass `web_search` or `web_search_options` directly on a chat completion request instead of configuring it on a variant. The fields and values are the same as above.
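For example, a per-request payload with tool-based search could be assembled like this. The model name is a hypothetical variant name; the `web_search` fields match the variant-level parameters described above:

```python
import json

# Build a chat completion request that enables tool-based web search
# for this request only (no variant-level configuration needed).
payload = {
    "model": "my-router-variant",  # hypothetical variant name
    "messages": [
        {"role": "user", "content": "What changed in the latest release?"}
    ],
    # Same fields and values as the variant-level web_search object.
    "web_search": {"engine": "google", "max_results": 5, "max_steps": 2},
}
body = json.dumps(payload)
```

Passing `web_search_options` instead would select native grounding for the request, subject to the same one-mode-per-request restriction.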
## Citations & streaming
Assistant messages may include OpenAI-style `annotations` (e.g. `type: "url_citation"` with `url`, `title`, `content`).
With `stream: true`, annotations are delivered on the last SSE chunk, alongside `finish_reason`.
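A sketch of pulling citations out of a streamed chunk: the chunk shape below is assumed to follow the OpenAI SSE chunk format, and field names other than `annotations`, `url_citation`, and `finish_reason` are illustrative:

```python
import json

def citations_from_chunk(sse_data: str):
    """Return url_citation payloads from one parsed SSE data line.

    Annotations arrive on the last chunk, alongside finish_reason,
    so earlier chunks yield an empty list.
    """
    chunk = json.loads(sse_data)
    choice = chunk["choices"][0]
    if choice.get("finish_reason") is None:
        return []  # not the final chunk yet
    annotations = choice.get("delta", {}).get("annotations", [])
    return [a["url_citation"] for a in annotations
            if a.get("type") == "url_citation"]
```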