You can ground LLM responses in real-time web search results by adding a web search configuration to a router variant. Two mutually exclusive modes are available; setting both on the same variant returns an error.
| Mode | Field | How it works | Supported models |
| --- | --- | --- | --- |
| Tool-based | `web_search` | The LLM calls a search engine in a tool-calling loop, then synthesizes a final answer | Any LLM that supports tool calling |
| Native | `web_search_options` | The provider's built-in search grounding (no tool loop) | OpenAI (search models only), Anthropic, Google, Vertex AI, Groq |
Add a web_search object to your router variant. When a request is routed to that variant, the router injects a search tool, lets the LLM call it in a loop, and returns a grounded answer with url_citation annotations.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `engine` | string | `exa` | Options: `exa`, `google`, `native` (uses the model's built-in grounding) |
| `max_results` | int | `3` | Number of search results returned per step (1–20) |
| `max_steps` | int | `1` | Maximum tool-call rounds before final synthesis (1–5) |
Variant configuration:

```json
{
  "variant_id": "search-grounded",
  "model_id": "openai/gpt-4o",
  "web_search": {
    "engine": "exa",
    "max_results": 5,
    "max_steps": 2
  }
}
```
How it works:
  1. The router injects a search tool and sends the request to the LLM.
  2. The LLM calls the search tool with a query.
  3. The search engine returns results, which are injected back into the conversation.
  4. Steps 2–3 repeat up to max_steps times.
  5. The LLM synthesizes a final answer with url_citation annotations.
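The loop above can be sketched in Python. This is an illustrative sketch, not the router's internal code: `call_llm` and `run_search` are hypothetical stand-ins for the model call and the search engine, and the message shapes are simplified.

```python
def run_search_loop(call_llm, run_search, messages, max_steps=1):
    """Sketch of the tool-calling search loop.

    `call_llm` returns either a tool call (a search query) or a final
    answer; `run_search` returns search results for a query. Both are
    hypothetical callables injected for illustration.
    """
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply.get("tool_call") is None:
            return reply  # model produced a final answer; no more searching
        # Run the search and inject the results back into the conversation.
        results = run_search(reply["tool_call"]["query"])
        messages = messages + [
            {"role": "assistant", "tool_call": reply["tool_call"]},
            {"role": "tool", "content": results},
        ]
    # Step budget exhausted: ask the model for a final synthesis.
    return call_llm(messages)
```

With `max_steps=2`, the model can search twice before it must answer; if it answers early, the loop exits immediately.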
Add web_search_options to your router variant to use a provider’s built-in search grounding. This skips the tool-calling loop entirely — the provider handles search internally.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `search_context_size` | string | `"medium"` | Amount of search context: `"low"`, `"medium"`, or `"high"` |
Variant configuration:

```json
{
  "variant_id": "native-search",
  "model_id": "openai/gpt-4o-search-preview",
  "web_search_options": {
    "search_context_size": "high"
  }
}
```
Supported providers: OpenAI (search models only, e.g. gpt-4o-search-preview), Anthropic, Google / Vertex AI, and Groq.

You can also pass web_search or web_search_options directly on a chat completion request instead of configuring it on a variant. The fields and values are the same as above.
Both fields can be passed at the top level of the request body or inside extra_body for OpenAI SDK compatibility.
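For example, a request-level configuration might be assembled like this with the OpenAI Python SDK. The model name and message are placeholders; the point is that `web_search` rides along in `extra_body`, which the SDK merges into the request body.

```python
# Hypothetical request-level configuration: the web_search object is
# passed via extra_body so the OpenAI SDK forwards it to the router.
request_kwargs = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What happened in tech news today?"}],
    "extra_body": {
        "web_search": {"engine": "exa", "max_results": 5, "max_steps": 2}
    },
}
# client.chat.completions.create(**request_kwargs)
# (with client = OpenAI(base_url=<your router URL>, api_key=...))
```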

Citations & streaming

Assistant messages may include OpenAI-style annotations (e.g. type: "url_citation" with url, title, content). With stream: true, annotations are delivered on the last SSE chunk, alongside finish_reason.
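A streaming consumer therefore has to watch for the chunk that carries finish_reason and read annotations off that chunk's delta. A minimal sketch, assuming chunks are already parsed into OpenAI-style dicts with one choice each (the exact chunk shape here is simplified for illustration):

```python
def collect_citations(chunks):
    """Accumulate streamed text and pull url_citation annotations off the
    final chunk, i.e. the one whose finish_reason is set."""
    text_parts, citations = [], []
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if delta.get("content"):
            text_parts.append(delta["content"])
        if choice.get("finish_reason") is not None:
            # Annotations arrive on the last chunk, alongside finish_reason.
            for ann in delta.get("annotations", []):
                if ann.get("type") == "url_citation":
                    citations.append(ann)
    return "".join(text_parts), citations
```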