
# Cost Optimizer

> Reduce LLM API costs by routing based on query complexity

## Overview

The cost optimizer routes simple queries to cheap, fast models and reserves expensive flagship models for complex tasks that actually need them. Your application classifies each query's complexity and passes it as metadata — Inworld Router handles the rest.

## The Problem

Without intelligent routing, every API call, from "Hello" to "Analyze this legal contract", goes to the same expensive model. You pay GPT-5 prices ($5.00 per 1M tokens) for queries that a model priced at $0.05 per 1M tokens can handle just as well.

## The Solution

Your application classifies each query as simple or complex and sends the result as a `complexity` metadata field. Inworld Router uses CEL conditions to route each query to the right model tier:

* **Simple queries** (greetings, basic Q&A, summaries) → Cost-effective models
* **Complex queries** (analysis, reasoning, code generation) → Premium models

## Quick Start (No Router Needed)

The fastest way to optimize costs — no router creation required:

```bash theme={"system"}
# Simple query → cheapest model
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "extra_body": {
      "sort": ["price", "latency"]
    }
  }'
```

```bash theme={"system"}
# Complex query → smartest model
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Analyze this legal contract and identify liability clauses."}],
    "extra_body": {
      "sort": ["intelligence", "price"]
    }
  }'
```

Your application decides the `sort` priority per request. No router configuration needed.
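
The per-request choice can be wrapped in a small helper. This is a minimal sketch that only mirrors the request bodies from the two curl examples; `build_auto_request` and its `is_complex` flag are hypothetical names, not part of any Inworld SDK.

```python
import json

# Sort orders copied from the two curl examples above.
CHEAP_FIRST = ["price", "latency"]
SMART_FIRST = ["intelligence", "price"]


def build_auto_request(user_message: str, is_complex: bool) -> dict:
    """Build a chat-completions body for model "auto" with a per-request sort.

    Hypothetical helper: the body mirrors the curl examples and nothing more.
    """
    sort = SMART_FIRST if is_complex else CHEAP_FIRST
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": user_message}],
        # Copy the list so callers cannot mutate the shared constants.
        "extra_body": {"sort": list(sort)},
    }


if __name__ == "__main__":
    body = build_auto_request("Hello, how are you?", is_complex=False)
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of the `POST` request shown above with any HTTP client.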

## Advanced: Router with Complexity-Based Routing

For production systems, create a router with conditional routes so Inworld Router handles the routing logic server-side. Your app passes `complexity` and the router does the rest.

### Step 1: Create the Router

```bash theme={"system"}
curl --request POST \
  --url https://api.inworld.ai/router/v1/routers \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "routers/cost-optimizer",
    "defaults": {
      "text_generation_config": {
        "max_new_tokens": 1024,
        "temperature": 0.7
      }
    },
    "routes": [
      {
        "route": {
          "route_id": "complex-queries",
          "variants": [
            {
              "variant": {
                "variant_id": "premium",
                "model_selection": {
                  "models": [
                    "openai/gpt-5",
                    "anthropic/claude-opus-4-6"
                  ],
                  "sort": [
                    {"metric": "SORT_METRIC_INTELLIGENCE"},
                    {"metric": "SORT_METRIC_PRICE"}
                  ]
                }
              },
              "weight": 100
            }
          ]
        },
        "condition": {
          "cel_expression": "complexity == \"complex\""
        }
      }
    ],
    "defaultRoute": {
      "route_id": "simple-queries",
      "variants": [
        {
          "variant": {
            "variant_id": "budget",
            "model_selection": {
              "models": [
                "groq/llama-3.1-8b-instant",
                "google-ai-studio/gemini-2.5-flash"
              ],
              "sort": [
                {"metric": "SORT_METRIC_PRICE"},
                {"metric": "SORT_METRIC_LATENCY"}
              ]
            }
          },
          "weight": 100
        }
      ]
    }
  }'
```

<Note>
  Routes are evaluated in order. Complex queries match the first route. Everything else (simple queries, or requests without complexity metadata) falls through to `defaultRoute` — the cheap models. This means if your app forgets to set `complexity`, the request defaults to the budget tier, which is the safe choice for cost optimization.
</Note>
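
The fall-through behavior described in the note can be modeled in a few lines. This sketch is an illustration of first-match evaluation only, not Inworld's implementation: plain Python predicates stand in for the CEL conditions, and `pick_route` is a hypothetical name.

```python
from typing import Callable, List, Optional, Tuple

# Each conditional route is (route_id, predicate over request metadata).
# The predicate is a plain-Python stand-in for a CEL condition.
Route = Tuple[str, Callable[[dict], bool]]

ROUTES: List[Route] = [
    ("complex-queries", lambda md: md.get("complexity") == "complex"),
]
DEFAULT_ROUTE_ID = "simple-queries"


def pick_route(metadata: Optional[dict]) -> str:
    """Return the first route whose condition matches, else the default route."""
    md = metadata or {}
    for route_id, condition in ROUTES:
        if condition(md):
            return route_id
    # No condition matched (or no metadata was sent): use the budget tier.
    return DEFAULT_ROUTE_ID


if __name__ == "__main__":
    for md in ({"complexity": "complex"}, {"complexity": "simple"}, None):
        print(md, "->", pick_route(md))
```

Because the default route is the budget tier, a missing or misspelled `complexity` value costs less, not more, in this model.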

### Step 2: Classify and Route

Your application determines complexity and passes it in `metadata`:

```bash theme={"system"}
# Simple query → routes to defaultRoute (budget models)
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "inworld/cost-optimizer",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "extra_body": {
      "metadata": {
        "complexity": "simple"
      }
    }
  }'
```

```bash theme={"system"}
# Complex query → routes to premium models
curl --request POST \
  --url https://api.inworld.ai/v1/chat/completions \
  --header 'Authorization: Bearer <your-api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "inworld/cost-optimizer",
    "messages": [
      {"role": "user", "content": "Analyze this legal contract and identify all potential liability clauses, then provide a risk assessment with recommendations."}
    ],
    "extra_body": {
      "metadata": {
        "complexity": "complex"
      }
    }
  }'
```

### Step 3: Classify in Your App

Here's a simple classification approach for your backend:

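```python theme={"system"}
import os
import re

from openai import OpenAI

# Assumption: the OpenAI Python SDK pointed at the endpoint from the curl examples,
# with the API key read from a hypothetical INWORLD_API_KEY environment variable.
client = OpenAI(
    base_url="https://api.inworld.ai/v1",
    api_key=os.environ["INWORLD_API_KEY"],
)

# Openers that usually signal a simple request. The trailing \b keeps "hi"
# from matching longer words such as "history".
SIMPLE_OPENERS = re.compile(r"(hello|hi|hey|thanks|summarize|what is|define)\b")


def classify_complexity(user_message: str) -> str:
    """Simple heuristic: short messages and common openers are 'simple'."""
    message = user_message.strip().lower()
    if len(message) < 50:
        return "simple"
    if SIMPLE_OPENERS.match(message):
        return "simple"
    return "complex"


# Use it when making requests
user_message = "Hello, how are you?"
complexity = classify_complexity(user_message)
response = client.chat.completions.create(
    model="inworld/cost-optimizer",
    messages=[{"role": "user", "content": user_message}],
    extra_body={
        "metadata": {"complexity": complexity}
    },
)
```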
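
Before relying on a heuristic like this, it helps to check how often it agrees with labels you assign by hand. The sketch below is one way to do that; `agreement_rate`, the `length_only` stand-in classifier, and the sample messages are all hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, Tuple


def agreement_rate(
    classify: Callable[[str], str],
    labeled: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of hand-labeled messages where the classifier matches the label."""
    pairs = list(labeled)
    if not pairs:
        raise ValueError("need at least one labeled example")
    hits = sum(1 for message, label in pairs if classify(message) == label)
    return hits / len(pairs)


def length_only(message: str) -> str:
    """Stand-in classifier for the demo: under 50 characters is 'simple'."""
    return "simple" if len(message.strip()) < 50 else "complex"


# Hypothetical hand-labeled samples; replace with real traffic.
SAMPLES = [
    ("Hello, how are you?", "simple"),
    ("Thanks!", "simple"),
    ("Analyze this legal contract and identify all potential liability clauses.", "complex"),
    ("Prove that sqrt(2) is irrational.", "complex"),  # short but hard
]

if __name__ == "__main__":
    print(f"agreement: {agreement_rate(length_only, SAMPLES):.0%}")
```

The last sample shows the main weakness of a length threshold: a short message can still need a premium model, so misclassified examples like it are the ones worth collecting.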

## Cost Savings Example

Consider a typical workload where 70% of queries are simple:

* **Simple queries** (70%): 1B tokens/month
* **Complex queries** (30%): 500M tokens/month

**Without routing:**

* All queries use GPT-5: 1.5B tokens × \$5.00 per 1M tokens = **\$7,500/month**

**With cost-optimized routing:**

* Simple queries use Llama 3.1 8B: 1B tokens × \$0.05 per 1M tokens = \$50
* Complex queries use GPT-5: 500M tokens × \$5.00 per 1M tokens = \$2,500
* **Total: \$2,550/month**

**Savings: 66% reduction** (\$4,950/month saved)
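
The savings percentage depends only on the share of tokens sent to the budget tier and on the two prices, not on total volume. A minimal sketch of that arithmetic, using the per-1M-token prices from this guide and the 2:1 simple-to-complex token split of the example; the function names are hypothetical.

```python
def blended_price(simple_share: float, budget_price: float, premium_price: float) -> float:
    """Average price per 1M tokens when `simple_share` of tokens use the budget tier."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    return simple_share * budget_price + (1.0 - simple_share) * premium_price


def savings_fraction(simple_share: float, budget_price: float, premium_price: float) -> float:
    """Fractional saving versus sending every token to the premium tier."""
    blended = blended_price(simple_share, budget_price, premium_price)
    return 1.0 - blended / premium_price


if __name__ == "__main__":
    # Two thirds of all tokens belong to simple queries (the 2:1 split above).
    share = 2 / 3
    print(f"blended: ${blended_price(share, 0.05, 5.00):.2f} per 1M tokens")
    print(f"savings: {savings_fraction(share, 0.05, 5.00):.0%}")
```

Plugging in your own token split and model prices gives a quick estimate before you build a router.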

## Best Practices

1. **Start simple**: Use the Quick Start approach (per-request `sort`) before building a router
2. **Default to cheap**: Make `defaultRoute` the budget tier so unclassified queries don't waste money
3. **Monitor quality**: Track response quality per tier to ensure budget models meet your minimum bar
4. **Refine classification**: Start with simple heuristics, then improve with actual usage data
5. **Add more tiers**: For production, consider three tiers — budget, standard, premium — using multiple CEL conditions
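
A three-tier setup can reuse the route shape from Step 1 with one extra condition. The sketch below only builds the `routes` array; `build_route` is a hypothetical helper, the `"standard"` label is an assumed third `complexity` value, and the model assigned to it is a placeholder rather than a recommendation. The `sort` settings are left out for brevity, and the budget tier stays in `defaultRoute`.

```python
import json


def build_route(route_id: str, variant_id: str, complexity: str, models: list) -> dict:
    """Build one conditional route in the shape of the Step 1 payload."""
    return {
        "route": {
            "route_id": route_id,
            "variants": [
                {
                    "variant": {
                        "variant_id": variant_id,
                        "model_selection": {"models": list(models)},
                    },
                    "weight": 100,
                }
            ],
        },
        # Same CEL pattern as Step 1: compare the metadata value to a label.
        "condition": {"cel_expression": f'complexity == "{complexity}"'},
    }


# Routes are evaluated in order; these two conditions are mutually exclusive.
# "standard" is a hypothetical label and its model list is a placeholder.
ROUTES = [
    build_route("complex-queries", "premium", "complex",
                ["openai/gpt-5", "anthropic/claude-opus-4-6"]),
    build_route("standard-queries", "standard", "standard",
                ["google-ai-studio/gemini-2.5-flash"]),
]

if __name__ == "__main__":
    print(json.dumps(ROUTES, indent=2))
```

Your classifier would then need to return three labels instead of two, with anything unlabeled still falling through to the budget tier.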

## Next Steps

* [Failover System](/router/guides/failover-system) to add reliability on top of cost optimization
* [Extra Body Parameters](/router/usage/extra-body-parameters) for all available `sort` criteria
* [Conditional Routing](/router/capabilities/conditional-routing) for advanced CEL expressions
