# Moderations

> Classify text for harmful content with OpenAI-compatible and AILuminate safety categories

Inworld Router provides moderation endpoints that classify text against safety categories. Use them to screen user input before sending it to an LLM, filter model output before displaying it to users, or moderate content in batch pipelines.

| Endpoint                                                                  | Input                      | OAI SDK compatible | Use case                                        |
| :------------------------------------------------------------------------ | :------------------------- | :----------------- | :---------------------------------------------- |
| [`/v1/moderations`](/api-reference/routerAPI/create-moderation)           | String or array of strings | Schema-compatible  | Moderate standalone text                        |
| [`/v1/chat/moderations`](/api-reference/routerAPI/create-chat-moderation) | Chat messages              | No                 | Moderate a conversation with configurable scope |

Both endpoints return the same classification structure: OpenAI-compatible categories plus [AILuminate](https://ailuminate.mlcommons.org/) safety signals.

<Note>
  `category_scores` values are returned as integers (e.g., `0`) rather than floats (e.g., `0.0`). If your code expects floats, cast accordingly.
</Note>

## Quickstart

The `/v1/moderations` endpoint is schema-compatible with the OpenAI moderation API, so the OpenAI SDK works once you change the base URL and API key:

<CodeGroup>
  ```python Python theme={"system"}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.inworld.ai/v1",
      api_key="YOUR_INWORLD_API_KEY"
  )

  response = client.moderations.create(input="Hello world!")
  print(response.results[0].flagged)
  ```

  ```javascript Node theme={"system"}
  import OpenAI from 'openai';

  const client = new OpenAI({
    baseURL: 'https://api.inworld.ai/v1',
    apiKey: 'YOUR_INWORLD_API_KEY',
  });

  const response = await client.moderations.create({ input: 'Hello world!' });
  console.log(response.results[0].flagged);
  ```

  ```bash curl theme={"system"}
  curl -X POST https://api.inworld.ai/v1/moderations \
    -H "Authorization: Bearer $INWORLD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": "Hello world!"}'
  ```
</CodeGroup>

## Conversation moderation

To moderate messages in a chat conversation, use `/v1/chat/moderations`. The `scope` parameter controls which messages are evaluated:

| `scope` value          | Behavior                                   |
| :--------------------- | :----------------------------------------- |
| `"last"` (default)     | Classify only the last message             |
| `"all"`                | Classify every message in the conversation |
| `N` (positive integer) | Classify the last `N` messages             |

```bash theme={"system"}
curl -X POST https://api.inworld.ai/v1/chat/moderations \
  -H "Authorization: Bearer $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi there!"},
      {"role": "user", "content": "Tell me something"}
    ],
    "scope": 2
  }'
```

<Warning>
  Setting `scope` to `"all"` or a large number increases response latency because more content needs to be processed. For real-time applications, prefer the default (`"last"`) and only broaden scope when you need full-conversation safety checks.
</Warning>

<Note>
  `/v1/chat/moderations` is not compatible with the OpenAI SDK: it accepts messages instead of strings and returns a single `result` object instead of a `results` array. Use `/v1/moderations` for SDK compatibility.
</Note>
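
Because the OpenAI SDK cannot call this endpoint, a plain HTTP client is needed. The following is a minimal sketch using only the Python standard library. The helper names `build_chat_moderation_request` and `moderate_chat` are hypothetical, and reading the classification from a top-level `result` key is an assumption based on the note above.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
CHAT_MODERATIONS_URL = "https://api.inworld.ai/v1/chat/moderations"


def build_chat_moderation_request(api_key, messages, scope="last"):
    """Build (but do not send) a POST request for /v1/chat/moderations."""
    # Mirror the scope table: "last", "all", or a positive integer N.
    is_count = isinstance(scope, int) and not isinstance(scope, bool) and scope > 0
    if scope not in ("last", "all") and not is_count:
        raise ValueError('scope must be "last", "all", or a positive integer')

    body = json.dumps({"messages": messages, "scope": scope}).encode("utf-8")
    return urllib.request.Request(
        CHAT_MODERATIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def moderate_chat(api_key, messages, scope="last"):
    """Send the request and return the classification object."""
    request = build_chat_moderation_request(api_key, messages, scope)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # Assumption: the classification sits under a top-level "result" key.
    return payload["result"]
```

Calling `moderate_chat(api_key, messages, scope=2)` sends the same request as the curl example. Validating `scope` locally is a design choice that surfaces typos before a network round trip.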

## Content categories

Each result includes boolean flags and numeric confidence scores for 13 categories:

| Category                 | Description                                        |
| :----------------------- | :------------------------------------------------- |
| `sexual`                 | Sexual content                                     |
| `sexual/minors`          | Sexual content involving minors                    |
| `harassment`             | Harassing language toward any target               |
| `harassment/threatening` | Harassment that includes violence or serious harm  |
| `hate`                   | Hate speech based on protected characteristics     |
| `hate/threatening`       | Hate speech that includes violence or serious harm |
| `illicit`                | Content advising or describing illicit acts        |
| `illicit/violent`        | Illicit content involving violence or weapons      |
| `self-harm`              | Content promoting or depicting self-harm           |
| `self-harm/intent`       | Expressed intent to engage in self-harm            |
| `self-harm/instructions` | Instructions for committing self-harm              |
| `violence`               | Content depicting violence toward a person         |
| `violence/graphic`       | Graphic depictions of death, violence, or injury   |

The `flagged` field is `true` when **any** category exceeds the default threshold. To apply custom thresholds, compare the values in `category_scores` (confidence values from 0 to 1) against cutoffs you choose for your application.
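
As a rough sketch of custom thresholding, the helper below compares each score with a per-category cutoff. The function name, the fallback cutoff of `0.5`, and the sample scores are all illustrative assumptions, not values documented by the API.

```python
def categories_over_threshold(category_scores, thresholds, default_threshold=0.5):
    """Return the sorted category names whose score meets or exceeds its cutoff."""
    over = []
    for category, score in category_scores.items():
        # Fall back to a hypothetical default when no cutoff is configured.
        cutoff = thresholds.get(category, default_threshold)
        # Scores may arrive as integers (e.g. 0), so normalize to float first.
        if float(score) >= cutoff:
            over.append(category)
    return sorted(over)


# Illustrative values only, not real API output.
example_scores = {"sexual/minors": 0.02, "violence": 0.4, "hate": 0}
example_thresholds = {"sexual/minors": 0.01, "violence": 0.8}

print(categories_over_threshold(example_scores, example_thresholds))
# prints ['sexual/minors']
```

Here the strict cutoff on `sexual/minors` triggers at a low score, while the permissive cutoff on `violence` lets a mid-range score pass.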

## AILuminate

Both endpoints include an `ailuminate` object with safety classifications based on the [AILuminate benchmark](https://ailuminate.mlcommons.org/) by MLCommons, providing more granular signals than the standard OpenAI categories.

| Field        | Type      | Description                                                                |
| :----------- | :-------- | :------------------------------------------------------------------------- |
| `safety`     | `string`  | Overall assessment: `"safe"`, `"unsafe"`, or `"controversial"`             |
| `categories` | `object`  | 12 fine-grained safety categories                                          |
| `extensions` | `object`  | Additional signals: `politically_sensitive`, `unethical_acts`, `jailbreak` |
| `refusal`    | `boolean` | Whether the content represents a refusal to comply                         |

The `safety` field classifies content into three levels. `"safe"` content is benign. `"unsafe"` content is clearly harmful and always sets `flagged: true`. `"controversial"` content falls in between: it may touch sensitive topics without being explicitly harmful. By default, `"controversial"` content is treated as safe and does not set `flagged: true`. For stricter moderation, treat `"controversial"` the same as `"unsafe"`.
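
One way to implement the stricter policy is a small decision helper. This is a sketch only: `should_block` and its `strict` switch are hypothetical names, and the caller is assumed to have already extracted `flagged` and `ailuminate.safety` from the response.

```python
def should_block(flagged, safety, strict=False):
    """Decide whether to block content from `flagged` and `ailuminate.safety`."""
    # "unsafe" content already sets flagged, but check both defensively.
    if flagged or safety == "unsafe":
        return True
    # Strict mode: treat "controversial" the same as "unsafe".
    return strict and safety == "controversial"


print(should_block(False, "controversial"))               # prints False
print(should_block(False, "controversial", strict=True))  # prints True
```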

**AILuminate categories:** `violent_crimes`, `sex_related_crimes`, `child_sexual_exploitation`, `suicide_self_harm`, `indiscriminate_weapons`, `intellectual_property`, `defamation`, `non_violent_crimes`, `hate`, `specialized_advice`, `privacy`, `sexual_content`

The `jailbreak` extension is particularly useful for detecting prompt injection attempts before they reach your LLM.
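
A pre-LLM gate on this signal might look like the sketch below. This page does not state whether `extensions.jailbreak` is a boolean or a score, so the hypothetical helper `jailbreak_detected` accepts either; the `0.5` score cutoff is an assumption as well.

```python
def jailbreak_detected(ailuminate, score_threshold=0.5):
    """Return True when the `jailbreak` extension signals a likely attack."""
    signal = ailuminate.get("extensions", {}).get("jailbreak")
    # Assumption: the value is either a boolean flag or a numeric score.
    if isinstance(signal, bool):
        return signal
    if isinstance(signal, (int, float)):
        return signal >= score_threshold
    # Missing or unrecognized values are treated as "not detected".
    return False
```

Checking `jailbreak_detected(ailuminate)` before forwarding a prompt lets you reject suspicious input without spending an LLM call on it.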

## Best practices

* **Screen both inputs and outputs.** Run moderation on user prompts before sending them to the model and on model responses before displaying them to users.
* **Use `category_scores` for custom thresholds.** The `flagged` boolean uses default thresholds. For your application, you may want stricter thresholds for certain categories (e.g., `sexual/minors`) and more permissive ones for others.
* **Use `scope: "last"` for real-time chat.** Only broaden to `"all"` or `N` when you need full-conversation safety audits and can tolerate higher latency.
* **Batch text inputs.** When moderating multiple pieces of content, pass an array to `/v1/moderations` instead of making separate requests.
* **Combine with other safety layers.** Moderation should be one part of your safety strategy alongside system prompts, output filtering, and human review.
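
The batching and input/output screening practices above can be sketched as two small helpers. Both take an OpenAI SDK client configured as in the Quickstart. The names `moderate_batch` and `guarded_reply`, and the `generate` callable standing in for your LLM call, are hypothetical.

```python
def moderate_batch(client, texts):
    """Moderate several texts in one request; return (text, flagged) pairs."""
    texts = list(texts)
    response = client.moderations.create(input=texts)
    # Assumption: results come back in the same order as the inputs.
    return [(text, result.flagged) for text, result in zip(texts, response.results)]


def guarded_reply(client, prompt, generate):
    """Screen the prompt, call `generate`, then screen the model output."""
    if client.moderations.create(input=prompt).results[0].flagged:
        return None  # Block the prompt before it reaches the model.
    reply = generate(prompt)
    if client.moderations.create(input=reply).results[0].flagged:
        return None  # Block the reply before it reaches the user.
    return reply
```

Returning `None` for blocked content keeps the sketch short; a real application would more likely return a refusal message or raise an error that the caller handles.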
