Prompt compression automatically shortens long messages before they reach the LLM. This reduces the number of input tokens without meaningfully changing the response quality — saving cost on every request. Compression works best on long system prompts, detailed instructions, and context-heavy messages. Messages shorter than 250 tokens are skipped automatically.Documentation Index
Fetch the complete documentation index at: https://docs.inworld.ai/llms.txt
Use this file to discover all available pages before exploring further.
Router-level compression
Setcompression on a router variant to compress all message templates in that variant. This is the most common setup - configure once, and every request through this variant benefits from compression.
Router with compression
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
aggressiveness | float | 0.0–1.0 | 0.5 | How aggressively to compress. Higher values save more tokens but may reduce fidelity. |
Variant-level compression only applies to the router’s message templates. It does not affect messages sent by the user in the request. Individual templates can override the variant-level aggressiveness by setting their own
compression field.Message-level compression
You can compress specific messages in a chat completion request by adding acompression field to any message. Consecutive messages with the same aggressiveness are batched and compressed together. Non-consecutive compressed messages are processed separately.
compression field are compressed. Other messages are sent as-is.
Response
The response includes compression stats inmetadata.compression:
| Field | Description |
|---|---|
original_tokens | Total tokens before compression |
compressed_tokens | Total tokens after compression |
saved_tokens | Tokens saved (original - compressed) |
compression field is omitted from the response.
Warnings
When compression is enabled but skipped or fails for specific messages, the response includes warnings inmetadata.attempts[].warnings:
- A message has compression enabled but is too short (below 250 tokens)
- The compression service fails for a message (the message is sent uncompressed)