curl --location 'https://api.inworld.ai/v1/chat/completions' \
--header "Authorization: Basic $INWORLD_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'

{
"id": "chatcmpl-1772347141924",
"object": "chat.completion",
"created": 1772347141,
"model": "openai/gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 9,
"total_tokens": 18
},
"metadata": {
"attempts": [
{
"model": "openai/gpt-4o",
"success": true,
"time_to_first_token_ms": 428
}
],
"generation_id": "9b365b38-f09f-96d1-8c77-99b28c8b74bf",
"reasoning": "Using specified model: 'openai/gpt-4o' - success",
"total_duration_ms": 472
}
}

Generate a response for the given chat conversation
Call hundreds of models from various providers directly through our unified API, or set model to auto for automatic model selection based on criteria like price, latency, or performance.
For more advanced routing — such as conditional routing, A/B testing across variants, and reusable configurations — create a router and reference it via the model field (e.g., inworld/my-router).
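As an illustrative sketch, the three model-selection modes above differ only in the model string of the request body (openai/gpt-4o is the model from the example on this page, and inworld/my-router is the router name from the text above):

```python
import json

def chat_body(model: str, user_text: str) -> str:
    """JSON request body for /v1/chat/completions with a given model selector."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    })

pinned = chat_body("openai/gpt-4o", "Hello!")      # explicit provider/model
automatic = chat_body("auto", "Hello!")            # API picks by price/latency/performance
routed = chat_body("inworld/my-router", "Hello!")  # pre-configured router
```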
For web-grounded answers, use extra_body.web_search.

Authorization
Your authentication credentials. For Basic authentication, please populate Basic $INWORLD_API_KEY. Please make sure your API Key has write permissions for the Router API in order to create, update, and delete routers.
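The same call as the curl example at the top of the page can be made from Python; this is a minimal sketch using only the standard library, with the endpoint and Basic auth header mirroring that example. The request is only sent if INWORLD_API_KEY is set in the environment.

```python
import json
import os
import urllib.request

API_URL = "https://api.inworld.ai/v1/chat/completions"

def build_request(api_key: str) -> urllib.request.Request:
    """Build the same call as the curl example, with Basic auth."""
    body = {
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__" and os.environ.get("INWORLD_API_KEY"):
    req = build_request(os.environ["INWORLD_API_KEY"])
    with urllib.request.urlopen(req) as resp:  # sends the request
        print(json.load(resp)["choices"][0]["message"]["content"])
```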
model
The model to use, which can be:
- A model name (e.g. gpt-oss-120b). The best provider is automatically selected by latency, or you can control provider selection via extra_body.provider. See Models for available models.
- A provider-qualified model (e.g. openai/gpt-5). This specifies the provider and model to use.
- auto for automatic model selection based on criteria like price, latency, or intelligence.
- A router name of the form inworld/<router-name>. The router name must be prefixed by inworld/.

messages
A list of messages comprising the conversation so far. If using a router where a prompt is specified, these messages will be appended to the prompt.
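A multi-turn messages array follows the chat format shown in the example above; the system role here is the usual chat-format assumption and is not confirmed by this extract. A small shape check before sending:

```python
VALID_ROLES = {"system", "user", "assistant"}  # assumed standard chat roles

def validate_messages(messages):
    """Cheap sanity check on the conversation shape before sending."""
    for m in messages:
        assert m["role"] in VALID_ROLES and isinstance(m["content"], str)
    return messages

conversation = validate_messages([
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "Now answer in French."},
])
```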
stream
If true, partial message deltas will be sent as server-sent events.
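With streaming enabled, deltas arrive as server-sent events. This sketch accumulates assistant text assuming OpenAI-style chunks of the form `data: {"choices":[{"delta":{"content":"..."}}]}` terminated by `data: [DONE]`; that chunk shape is an assumption, not confirmed by this page, so verify it against actual output.

```python
import json

def accumulate_stream(lines):
    """Collect assistant text from OpenAI-style SSE lines (assumed format)."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments/keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
# accumulate_stream(sample) -> "Hello!"
```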
temperature
Sampling temperature between 0 and 2. Higher values make output more random.
Range: 0 <= x <= 2

top_p
Nucleus sampling parameter. Must be greater than 0.
Range: 0 < x <= 1

max_tokens
Maximum number of tokens to generate.
Range: x >= 1

max_completion_tokens
Maximum number of completion tokens to generate.
Range: x >= 1

presence_penalty
Penalizes tokens based on presence in the text.
Range: -2 <= x <= 2

frequency_penalty
Penalizes tokens based on frequency in the text.
Range: -2 <= x <= 2

seed
Random seed for generation.

stop
Up to 4 sequences where the API will stop generating.

logit_bias
Modifies the likelihood of specified tokens appearing in the completion.
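An illustrative request body combining the sampling controls above, each within its documented range (the specific values are arbitrary examples, not recommendations):

```python
import json

body = json.dumps({
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "temperature": 0.7,           # 0 <= x <= 2
    "top_p": 0.9,                 # 0 < x <= 1
    "max_completion_tokens": 64,  # x >= 1
    "presence_penalty": 0.5,      # -2 <= x <= 2
    "frequency_penalty": 0.3,     # -2 <= x <= 2
    "seed": 42,                   # fixed seed for reproducibility
    "stop": ["\n\n"],             # up to 4 stop sequences
})
```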
reasoning_effort
Controls the amount of reasoning effort the model uses. Note: this parameter is provider/model-specific and may not be supported by all models (e.g., OpenAI models do not support this parameter). This will be overridden if extra_body.reasoning is specified.
Options: none, low, minimal, medium, high, xhigh

user
A unique identifier for the end user. When used with a router, the same user will consistently receive the same variant across requests (sticky routing).
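A sketch combining the two parameters above; "player-4711" is a hypothetical end-user id (with a router, reusing the same id keeps that user on the same variant), and inworld/my-router is the router name from the text above:

```python
EFFORT_LEVELS = {"none", "low", "minimal", "medium", "high", "xhigh"}

payload = {
    "model": "inworld/my-router",
    "messages": [{"role": "user", "content": "Hello!"}],
    "reasoning_effort": "low",  # must be one of EFFORT_LEVELS
    "user": "player-4711",      # hypothetical id; enables sticky routing
}
assert payload["reasoning_effort"] in EFFORT_LEVELS
```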
extra_body.web_search
Tool-based web search configuration. The LLM calls a search engine in a tool-calling loop, then synthesizes a grounded answer with url_citation annotations. Works with any LLM that supports tool calling. Mutually exclusive with web_search_options. See Web Search for details.

web_search_options
Native web search using the provider's built-in search grounding (no tool loop). Supported by OpenAI (search models only), Anthropic, Google / Vertex AI, and Groq. Mutually exclusive with web_search. See Web Search for details.
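Sketches of the two mutually exclusive search options. The child attributes of each are not listed in this extract, so the empty objects (provider defaults) are an assumption; see the Web Search page for the full schema.

```python
# Tool-based search loop ({} assumes defaults; child attributes not shown here).
tool_search = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What happened in tech news today?"}],
    "extra_body": {"web_search": {}},
}

# Native provider grounding instead; the model must support built-in search.
native_search = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What happened in tech news today?"}],
    "web_search_options": {},
}
```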
modalities
Output modalities to generate. Defaults to ["text"]. Include "image" to request image generation (e.g., ["text", "image"]). Currently supported for OpenAI and Google image models.
Options: text, image

Configuration for image output. Optional when requesting image output via modalities: ["image"].
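A minimal sketch of requesting image output; auto is used here because the page does not name a specific image-capable model, so pick one from the Models page in practice:

```python
payload = {
    "model": "auto",  # choose an OpenAI or Google image-capable model in practice
    "messages": [{"role": "user", "content": "Draw a red circle on white."}],
    "modalities": ["text", "image"],  # defaults to ["text"] when omitted
}
```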
extra_body
Optional parameters for model routing and optimization.
A successful response returns either a complete chat completion or streaming chunks. Response fields:

id
Unique identifier for the chat completion.

object
Object type, always 'chat.completion'.

created
Unix timestamp when the completion was created.

model
The model that was actually used.

choices
List of chat completion choices.

usage
Token usage statistics.

metadata
Routing metadata providing transparency into model selection decisions.
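The fields above can be pulled out of a response like the example at the top of the page; this sketch uses exactly that example's structure (choices, usage, and metadata.attempts with time_to_first_token_ms):

```python
import json

def summarize_completion(raw: str) -> dict:
    """Extract the documented fields from a non-streaming completion response."""
    resp = json.loads(raw)
    choice = resp["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
        "model_used": resp["model"],
        "ttft_ms": resp["metadata"]["attempts"][0]["time_to_first_token_ms"],
    }

# The example response from the top of this page:
example = json.dumps({
    "id": "chatcmpl-1772347141924",
    "object": "chat.completion",
    "created": 1772347141,
    "model": "openai/gpt-4o",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant",
                    "content": "Hello! How can I assist you today?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18},
    "metadata": {
        "attempts": [{"model": "openai/gpt-4o", "success": True,
                      "time_to_first_token_ms": 428}],
        "generation_id": "9b365b38-f09f-96d1-8c77-99b28c8b74bf",
        "reasoning": "Using specified model: 'openai/gpt-4o' - success",
        "total_duration_ms": 472,
    },
})
# summarize_completion(example)["total_tokens"] -> 18
```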