POST /v1/chat/completions
curl --location 'https://api.inworld.ai/v1/chat/completions' \
--header 'Authorization: Basic <your-api-key>' \
--header 'Content-Type: application/json' \
--data '{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
{
  "id": "chatcmpl-1772347141924",
  "object": "chat.completion",
  "created": 1772347141,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  },
  "metadata": {
    "attempts": [
      {
        "model": "openai/gpt-4o",
        "success": true,
        "time_to_first_token_ms": 428
      }
    ],
    "generation_id": "9b365b38-f09f-96d1-8c77-99b28c8b74bf",
    "reasoning": "Using specified model: 'openai/gpt-4o' - success",
    "total_duration_ms": 472
  }
}
Call hundreds of models from various providers directly through our unified API, or set model to auto for automatic model selection based on criteria like price, latency, or performance. For more advanced routing — such as conditional routing, A/B testing across variants, and reusable configurations — create a router and reference it via the model field (e.g., inworld/my-router).
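The three forms the model field can take are shown below as request payloads. This is an illustrative sketch; the model id and router id are the examples used in this documentation, and the messages are placeholders.

```python
import json

# Pin a specific model by id (see the Models page for available ids).
payload_pinned = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Let the API choose a model automatically based on price, latency,
# or performance criteria.
payload_auto = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Reference a pre-configured router by id for advanced routing
# (conditional routing, A/B tests, reusable configurations).
payload_router = {
    "model": "inworld/my-router",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Any of these serializes to the JSON body of the POST request.
body = json.dumps(payload_auto)
```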

Authorizations

Authorization
string
header
required

Your authentication credentials. For Basic authentication, set this header to Basic $INWORLD_API_KEY, where $INWORLD_API_KEY is your API key.
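A minimal sketch of building the headers in Python, assuming the API key is read from an INWORLD_API_KEY environment variable (the variable name matches the placeholder above; adjust to your setup):

```python
import os

# Note: per the docs above, the key is placed directly after "Basic " --
# no user:password base64 encoding is involved.
api_key = os.environ.get("INWORLD_API_KEY", "<your-api-key>")
headers = {
    "Authorization": f"Basic {api_key}",
    "Content-Type": "application/json",
}
```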

Body

application/json
model
string
required

The model to use, which can be:

  • A model id (e.g., openai/gpt-5). See Models for available models.
  • auto for automatic selection based on criteria like price, latency, or intelligence
  • A router id (e.g., inworld/my-router)
messages
object[]
required

A list of messages comprising the conversation so far.

If using a router where a prompt is specified, these messages will be appended to the prompt.
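A sketch of a multi-turn messages array. The conversation content is illustrative; when a router with a prompt is used, this list is appended after that prompt:

```python
# Messages are ordered oldest-first; each has a "role" and "content".
messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And roughly how large is it?"},
]
```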

stream
boolean
default:false

If true, partial message deltas will be sent as server-sent events.
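A sketch of consuming those deltas, assuming the OpenAI-compatible SSE convention (each event is a line of the form data: <json chunk>, with the stream terminated by data: [DONE]); the sample chunks below are synthetic:

```python
import json

def iter_deltas(sse_lines):
    """Yield content fragments from server-sent-event lines.

    Assumes OpenAI-style streaming: 'data: <json>' lines where each
    chunk carries choices[0].delta, ending with 'data: [DONE]'.
    """
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Synthetic example stream:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_deltas(sample))  # -> "Hello"
```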

temperature
number
default:1

Sampling temperature between 0 and 2. Higher values make output more random.

Required range: 0 <= x <= 2

top_p
number

Nucleus sampling parameter. Must be greater than 0.

Required range: 0 < x <= 1

max_tokens
integer

Maximum number of tokens to generate.

Required range: x >= 1

max_completion_tokens
integer

Maximum number of completion tokens to generate.

Required range: x >= 1

presence_penalty
number
default:0

Penalizes new tokens based on whether they have already appeared in the text so far, encouraging the model to move on to new topics.

Required range: -2 <= x <= 2

frequency_penalty
number
default:0

Penalizes new tokens based on how often they have already appeared in the text so far, reducing verbatim repetition.

Required range: -2 <= x <= 2

seed
integer<int32>

Random seed for generation.

stop
string[]

Up to 4 sequences where the API will stop generating.
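The sampling parameters above can be combined in one request. A sketch with illustrative values, each inside its documented range:

```python
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "temperature": 0.7,       # 0 <= x <= 2; lower is more deterministic
    "top_p": 0.9,             # 0 < x <= 1; nucleus sampling cutoff
    "max_tokens": 64,         # x >= 1
    "presence_penalty": 0.5,  # -2 <= x <= 2
    "frequency_penalty": 0.5, # -2 <= x <= 2
    "seed": 42,               # int32 seed for reproducibility
    "stop": ["\n\n"],         # up to 4 stop sequences
}
```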

logit_bias
object[]

Modifies the likelihood of specified tokens appearing in the completion.

reasoning_effort
enum<string>

Controls the amount of reasoning effort the model uses. Note: This parameter is provider/model-specific and may not be supported by all models (e.g., OpenAI models do not support this parameter). This will be overridden if extra_body.reasoning is specified.

Available options:
none,
low,
minimal,
medium,
high,
xhigh
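A sketch of a request using reasoning_effort, with the option validated against the enum above. The model id is hypothetical; as noted, support is provider/model-specific:

```python
ALLOWED_EFFORTS = {"none", "low", "minimal", "medium", "high", "xhigh"}

payload = {
    "model": "deepseek/deepseek-r1",  # hypothetical reasoning-capable model id
    "messages": [{"role": "user", "content": "Prove that 2 is prime."}],
    "reasoning_effort": "low",
}

# Guard against typos before sending; the API only accepts the values above.
assert payload["reasoning_effort"] in ALLOWED_EFFORTS
```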
user
string

A unique identifier for the end user. When used with a router, the same user will consistently receive the same variant across requests (sticky routing).
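One way to produce a stable, non-identifying value for the user field is to hash an internal user id, so the same end user always maps to the same router variant. This helper is hypothetical, not part of the API:

```python
import hashlib

def stable_user_id(internal_id: str) -> str:
    """Derive a stable, opaque identifier from an internal user id."""
    return hashlib.sha256(internal_id.encode()).hexdigest()[:16]

payload = {
    "model": "inworld/my-router",
    "messages": [{"role": "user", "content": "Hi!"}],
    # Same input -> same id -> same variant across requests (sticky routing).
    "user": stable_user_id("customer-1234"),
}
```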

extra_body
object

Optional parameters for model routing and optimization.

Response

A successful response. Returns either a complete chat completion or streaming chunks.

id
string

Unique identifier for the chat completion.

object
string

Object type, always 'chat.completion'.

created
integer

Unix timestamp when the completion was created.

model
string

The model that was actually used.

choices
object[]

List of chat completion choices.

usage
object

Token usage statistics.

metadata
object

Routing metadata providing transparency into model selection decisions.
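A sketch of pulling the commonly needed fields out of a response, using an abbreviated copy of the example response shown at the top of this page. The summarize helper is hypothetical:

```python
# Abbreviated from the example response above.
resp = {
    "id": "chatcmpl-1772347141924",
    "object": "chat.completion",
    "created": 1772347141,
    "model": "openai/gpt-4o",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant",
                    "content": "Hello! How can I assist you today?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18},
    "metadata": {"generation_id": "9b365b38-f09f-96d1-8c77-99b28c8b74bf"},
}

def summarize(resp: dict) -> dict:
    """Extract the fields most applications need from a completion."""
    choice = resp["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
        "model_used": resp["model"],  # the model actually selected
        "generation_id": resp["metadata"]["generation_id"],
    }
```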