curl --location 'https://api.inworld.ai/v1/chat/completions' \
--header "Authorization: Basic $INWORLD_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "openai/gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'

{
"id": "chatcmpl-1772347141924",
"object": "chat.completion",
"created": 1772347141,
"model": "openai/gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 9,
"total_tokens": 18
},
"metadata": {
"attempts": [
{
"model": "openai/gpt-4o",
"success": true,
"time_to_first_token_ms": 428
}
],
"generation_id": "9b365b38-f09f-96d1-8c77-99b28c8b74bf",
"reasoning": "Using specified model: 'openai/gpt-4o' - success",
"total_duration_ms": 472
}
}

Generate a response for the given chat conversation
Call hundreds of models from various providers directly through our unified API, or set model to auto for automatic model selection based on criteria like price, latency, or performance.
For more advanced routing — such as conditional routing, A/B testing across variants, and reusable configurations — create a router and reference it via the model field (e.g., inworld/my-router).
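As an illustrative sketch, the three model-selection modes above differ only in the model string of the request body (openai/gpt-4o is the model from the example on this page, and inworld/my-router is the router name from the text above):

```python
import json

def chat_body(model: str, user_text: str) -> str:
    """JSON request body for /v1/chat/completions with a given model selector."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    })

pinned = chat_body("openai/gpt-4o", "Hello!")      # explicit provider/model
automatic = chat_body("auto", "Hello!")            # API picks by price/latency/performance
routed = chat_body("inworld/my-router", "Hello!")  # pre-configured router
```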
For web-grounded answers, use extra_body.web_search.

Authorization
Your authentication credentials. For Basic authentication, please populate Basic $INWORLD_API_KEY. Please make sure your API Key has write permissions for the Router API in order to create, update, and delete routers.
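The same call as the curl example at the top of the page can be made from Python; this is a minimal sketch using only the standard library, with the endpoint and Basic auth header mirroring that example. The request is only sent if INWORLD_API_KEY is set in the environment.

```python
import json
import os
import urllib.request

API_URL = "https://api.inworld.ai/v1/chat/completions"

def build_request(api_key: str) -> urllib.request.Request:
    """Build the same call as the curl example, with Basic auth."""
    body = {
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__" and os.environ.get("INWORLD_API_KEY"):
    req = build_request(os.environ["INWORLD_API_KEY"])
    with urllib.request.urlopen(req) as resp:  # sends the request
        print(json.load(resp)["choices"][0]["message"]["content"])
```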
model
The model to use, which can be:
- A model name (e.g. gpt-oss-120b). The best provider is automatically selected by latency, or you can control provider selection via extra_body.provider. See Models for available models.
- A provider-qualified model (e.g. openai/gpt-5). This specifies the provider and model to use.
- auto for automatic model selection based on criteria like price, latency, or intelligence.
- A router name of the form inworld/<router-name>. The router name must be prefixed by inworld/.

messages
A list of messages comprising the conversation so far. If using a router where a prompt is specified, these messages will be appended to the prompt.
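A multi-turn messages array follows the chat format shown in the example above; the system role here is the usual chat-format assumption and is not confirmed by this extract. A small shape check before sending:

```python
VALID_ROLES = {"system", "user", "assistant"}  # assumed standard chat roles

def validate_messages(messages):
    """Cheap sanity check on the conversation shape before sending."""
    for m in messages:
        assert m["role"] in VALID_ROLES and isinstance(m["content"], str)
    return messages

conversation = validate_messages([
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "Now answer in French."},
])
```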
stream
If true, partial message deltas will be sent as server-sent events.
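With streaming enabled, deltas arrive as server-sent events. This sketch accumulates assistant text assuming OpenAI-style chunks of the form `data: {"choices":[{"delta":{"content":"..."}}]}` terminated by `data: [DONE]`; that chunk shape is an assumption, not confirmed by this page, so verify it against actual output.

```python
import json

def accumulate_stream(lines):
    """Collect assistant text from OpenAI-style SSE lines (assumed format)."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments/keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
# accumulate_stream(sample) -> "Hello!"
```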
temperature
Sampling temperature between 0 and 2. Higher values make output more random.
Range: 0 <= x <= 2

top_p
Nucleus sampling parameter. Must be greater than 0.
Range: 0 < x <= 1

max_tokens
Maximum number of tokens to generate.
Range: x >= 1

max_completion_tokens
Maximum number of completion tokens to generate.
Range: x >= 1

presence_penalty
Penalizes tokens based on presence in the text.
Range: -2 <= x <= 2

frequency_penalty
Penalizes tokens based on frequency in the text.
Range: -2 <= x <= 2

seed
Random seed for generation.

stop
Up to 4 sequences where the API will stop generating.

logit_bias
Modifies the likelihood of specified tokens appearing in the completion.
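An illustrative request body combining the sampling controls above, each within its documented range (the specific values are arbitrary examples, not recommendations):

```python
import json

body = json.dumps({
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "temperature": 0.7,           # 0 <= x <= 2
    "top_p": 0.9,                 # 0 < x <= 1
    "max_completion_tokens": 64,  # x >= 1
    "presence_penalty": 0.5,      # -2 <= x <= 2
    "frequency_penalty": 0.3,     # -2 <= x <= 2
    "seed": 42,                   # fixed seed for reproducibility
    "stop": ["\n\n"],             # up to 4 stop sequences
})
```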
reasoning_effort
Controls the amount of reasoning effort the model uses. Note: this parameter is provider/model-specific and may not be supported by all models (e.g., OpenAI models do not support this parameter). This will be overridden if extra_body.reasoning is specified.
Options: none, low, minimal, medium, high, xhigh

user
A unique identifier for the end user. When used with a router, the same user will consistently receive the same variant across requests (sticky routing).
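A sketch combining the two parameters above; "player-4711" is a hypothetical end-user id (with a router, reusing the same id keeps that user on the same variant), and inworld/my-router is the router name from the text above:

```python
EFFORT_LEVELS = {"none", "low", "minimal", "medium", "high", "xhigh"}

payload = {
    "model": "inworld/my-router",
    "messages": [{"role": "user", "content": "Hello!"}],
    "reasoning_effort": "low",  # must be one of EFFORT_LEVELS
    "user": "player-4711",      # hypothetical id; enables sticky routing
}
assert payload["reasoning_effort"] in EFFORT_LEVELS
```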
extra_body.web_search
Tool-based web search configuration. The LLM calls a search engine in a tool-calling loop, then synthesizes a grounded answer with url_citation annotations. Works with any LLM that supports tool calling. Mutually exclusive with web_search_options. See Web Search for details.

web_search_options
Native web search using the provider's built-in search grounding (no tool loop). Supported by OpenAI (search models only), Anthropic, Google / Vertex AI, and Groq. Mutually exclusive with web_search. See Web Search for details.
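Sketches of the two mutually exclusive search options. The child attributes of each are not listed in this extract, so the empty objects (provider defaults) are an assumption; see the Web Search page for the full schema.

```python
# Tool-based search loop ({} assumes defaults; child attributes not shown here).
tool_search = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What happened in tech news today?"}],
    "extra_body": {"web_search": {}},
}

# Native provider grounding instead; the model must support built-in search.
native_search = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What happened in tech news today?"}],
    "web_search_options": {},
}
```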
modalities
Output modalities to generate. Defaults to ["text"]. Include "image" to request image generation (e.g., ["text", "image"]). Currently supported for OpenAI and Google image models.
Options: text, image

Configuration for image output. Optional when requesting image output via modalities: ["image"].
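A minimal sketch of requesting image output; auto is used here because the page does not name a specific image-capable model, so pick one from the Models page in practice:

```python
payload = {
    "model": "auto",  # choose an OpenAI or Google image-capable model in practice
    "messages": [{"role": "user", "content": "Draw a red circle on white."}],
    "modalities": ["text", "image"],  # defaults to ["text"] when omitted
}
```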
extra_body
Optional parameters for model routing and optimization.
A successful response returns either a complete chat completion or streaming chunks. Response fields:

id
Unique identifier for the chat completion.

object
Object type, always 'chat.completion'.

created
Unix timestamp when the completion was created.

model
The model that was actually used.

choices
List of chat completion choices.

usage
Token usage statistics.

metadata
Routing metadata providing transparency into model selection decisions.
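The fields above can be pulled out of a response like the example at the top of the page; this sketch uses exactly that example's structure (choices, usage, and metadata.attempts with time_to_first_token_ms):

```python
import json

def summarize_completion(raw: str) -> dict:
    """Extract the documented fields from a non-streaming completion response."""
    resp = json.loads(raw)
    choice = resp["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
        "model_used": resp["model"],
        "ttft_ms": resp["metadata"]["attempts"][0]["time_to_first_token_ms"],
    }

# The example response from the top of this page:
example = json.dumps({
    "id": "chatcmpl-1772347141924",
    "object": "chat.completion",
    "created": 1772347141,
    "model": "openai/gpt-4o",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant",
                    "content": "Hello! How can I assist you today?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18},
    "metadata": {
        "attempts": [{"model": "openai/gpt-4o", "success": True,
                      "time_to_first_token_ms": 428}],
        "generation_id": "9b365b38-f09f-96d1-8c77-99b28c8b74bf",
        "reasoning": "Using specified model: 'openai/gpt-4o' - success",
        "total_duration_ms": 472,
    },
})
# summarize_completion(example)["total_tokens"] -> 18
```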