# Prompting for TTS-2

> Prompt engineering techniques that use inworld-tts-2 steering to produce expressive, directed speech from LLM output.

When an LLM generates text that gets fed into TTS, the default output often sounds flat and unnatural. With `inworld-tts-2`, you can go further: instruct the LLM to embed [steering](/tts/capabilities/steering) tags directly in its output. The result is speech that isn't just well-formatted, but actively directed, with emotion, pacing, volume, and vocal style shaped by the LLM itself.

This page covers what is new for `inworld-tts-2`. The guidance in [Prompting for TTS](/tts/best-practices/prompting-for-tts) still applies as a best practice, especially when the LLM output contains no steering tags.

<Note>
  Steering is fully supported only on `inworld-tts-2`. On prior models, descriptive steering instructions (e.g. `[say with a hint of amusement]`) may be spoken aloud verbatim instead of interpreted. For consistent results, use `inworld-tts-2` for steering.
</Note>

## Instructing the LLM to use steering

The [Steering](/tts/capabilities/steering) page documents all supported instruction tags across emotion, speed, volume, vocal style, tone, non-verbals, and free-form directions. To make your LLM use them, include a section in your system prompt that explains the tag format and lists the tags relevant to your use case.

**Prompt snippet:**

```
Your responses will be spoken aloud using inworld-tts-2, which supports
instruction tags — natural language directions in square brackets placed before
the text they apply to.

Use instruction tags to match your delivery to the content. The tags below are
examples, not a complete list; you can also describe the delivery you want in
your own natural language instruction:
- Emotion: [say excitedly], [sound sad], [sound concerned], [sound terrified]
- Articulation: [say with force], [articulate clearly], [say with deliberate pauses]
- Intonation: [say with a falling pitch], [say with a rising pitch]
- Volume: [very quiet], [very loud]
- Pitch: [say in a low tone], [say in a high pitch]
- Range: [say playfully], [say with no pitch variation]
- Speed: [very fast], [very slow]
- Vocal style: [whisper in a hushed style], [give a nasal quality]
- Non-verbals: [laugh], [sigh], [clear throat], [breathe], [cough], [yawn]

For maximum control, combine qualities from multiple categories in a single
natural language instruction. A bare tag like [sound sad] gives the model one
dimension to work with. A fuller instruction like [say sadly with deliberate
pauses in a low voice and hushed style] layers mood, rhythm, pitch, and mode —
producing a more nuanced and convincing performance.

Place the tag at the start of the text it applies to. A single tag can apply
across multiple sentences; repeat or change tags only when the delivery should
change. Non-verbal tags can also be used inline where they occur. Do not
apply a tag that contradicts the content of the text. Avoid combining opposing
directions in the same tag — for example, [whisper in a hushed style] and
[very loud] together produce unpredictable results.
```
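If different products need different subsets of tags, generating this section can be easier than maintaining copies by hand. The Python sketch below assembles a trimmed version of the snippet from the categories you pick. The `STEERING_TAGS` table, the `build_steering_section` name, and the shortened wording are hypothetical; the result is plain text that you place in your system prompt however your LLM client expects it.

```python
# Hypothetical helper: build the steering part of a system prompt from only the
# tag categories a use case needs. Categories and tags mirror the snippet above.
STEERING_TAGS: dict[str, list[str]] = {
    "Emotion": ["[say excitedly]", "[sound sad]", "[sound concerned]", "[sound terrified]"],
    "Articulation": ["[say with force]", "[articulate clearly]", "[say with deliberate pauses]"],
    "Intonation": ["[say with a falling pitch]", "[say with a rising pitch]"],
    "Volume": ["[very quiet]", "[very loud]"],
    "Pitch": ["[say in a low tone]", "[say in a high pitch]"],
    "Range": ["[say playfully]", "[say with no pitch variation]"],
    "Speed": ["[very fast]", "[very slow]"],
    "Vocal style": ["[whisper in a hushed style]", "[give a nasal quality]"],
    "Non-verbals": ["[laugh]", "[sigh]", "[clear throat]", "[breathe]", "[cough]", "[yawn]"],
}

INTRO = (
    "Your responses will be spoken aloud using inworld-tts-2, which supports\n"
    "instruction tags: natural language directions in square brackets placed before\n"
    "the text they apply to."
)

RULES = (
    "Place the tag at the start of the text it applies to. Do not apply a tag that\n"
    "contradicts the content of the text, and never combine opposing directions\n"
    "in the same tag."
)


def build_steering_section(categories: list[str]) -> str:
    """Return the steering section of a system prompt for the chosen categories."""
    unknown = [c for c in categories if c not in STEERING_TAGS]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    # One "- Category: [tag], [tag]" line per requested category, in the given order.
    lines = [f"- {c}: {', '.join(STEERING_TAGS[c])}" for c in categories]
    return "\n\n".join([INTRO, "Suggested tags:\n" + "\n".join(lines), RULES])


if __name__ == "__main__":
    # For example, a support agent might want emotion and speed but no non-verbals.
    print(build_steering_section(["Emotion", "Speed"]))
```

Leaving a category out (here, non-verbals) also keeps the LLM from reaching for tags that do not suit the product.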

**Before (no instruction tags):**

> I have great news. Your package has arrived.

**After (with instruction tags):**

> \[say excitedly with a high pitch and fast pace] I have great news. Your package has arrived!

For the full list of supported tags and examples, see the [Steering](/tts/capabilities/steering) page.
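During testing, it can help to see which delivery the LLM chose for each stretch of text. The sketch below splits tagged output into segments, assuming the placement rule from the snippet above: a delivery tag stays active until the next delivery tag, and non-verbal tags stay inline. Treating every bracketed tag outside the non-verbal list as a delivery instruction is an assumption of this sketch, and `split_by_instruction` is a hypothetical name; it only inspects the text and does not change what you send for synthesis.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Non-verbal tags from the prompt snippet; they are kept inline in the text.
# Every other bracketed tag is treated as a delivery instruction (assumption).
NON_VERBALS = {"laugh", "sigh", "clear throat", "breathe", "cough", "yawn"}
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


@dataclass
class Segment:
    instruction: Optional[str]  # active delivery tag; None before any tag
    text: str                   # text it governs, inline non-verbals included


def split_by_instruction(output: str) -> list[Segment]:
    """Split tagged LLM output into the stretches each delivery tag governs."""
    segments: list[Segment] = []
    current: Optional[str] = None
    pos = 0
    for match in TAG_RE.finditer(output):
        tag = match.group(1).strip()
        if tag.lower() in NON_VERBALS:
            continue  # inline non-verbal: leave it in the surrounding text
        text = output[pos:match.start()].strip()
        if text:
            segments.append(Segment(current, text))
        # A new delivery tag stays active until the next one appears.
        current = tag
        pos = match.end()
    tail = output[pos:].strip()
    if tail:
        segments.append(Segment(current, tail))
    return segments


if __name__ == "__main__":
    sample = (
        "[say excitedly with a high pitch and fast pace] I have great news. "
        "Your package has arrived! [sound concerned] It was left outside, "
        "though. [sigh] It might rain."
    )
    for seg in split_by_instruction(sample):
        print(f"{seg.instruction!r}: {seg.text}")
```

For the sample, the first tag covers both sentences of the good news, and `[sigh]` remains inside the text governed by `[sound concerned]`.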

## Example Prompt Templates

Below are complete, copyable system prompt blocks for common use cases. Each template combines steering with the text formatting guidance from [Prompting for TTS](/tts/best-practices/prompting-for-tts).

<Tabs>
  <Tab title="Companion / Conversational">
    Use this template for chatbots, AI companions, virtual friends, and other informal conversational applications.

    ```
    ## Speech Output Rules

    Your responses will be converted to speech using inworld-tts-2. Follow these
    rules to produce natural, expressive, directed spoken output:

    ### Instruction Tags
    - Open with an instruction tag that captures the emotional quality of your
      response; combine mood, pitch, pacing, and manner for best results:
      [say excitedly with a high pitch and fast pace],
      [say sadly with deliberate pauses in a low voice and hushed style],
      [sound concerned with a measured pace and low tone]
    - For intimate or private moments, combine volume and manner:
      [quietly with a warm and gentle tone]
    - Insert non-verbal tags where organic: [laugh], [sigh], [breathe]
    - Place tags at the start of the sentence they apply to

    ### Emphasis
    - Capitalize full words for stress: "I told you NOT to do that"
    - Capitalize syllables for nuance: "AbsoLUTEly"
    - Use sparingly for maximum effect

    ### Naturalness
    - Include filler words (uh, um, well, like, you know) where a human would naturally pause
    - Vary sentence length for natural rhythm
    - Use contractions (don't, can't, I'm, we're) instead of formal forms

    ### Text Formatting
    - Write numbers in spoken form: "twenty-three" not "23"
    - Write dates in spoken form: "march fifteenth" not "3/15"
    - Never use markdown formatting, bullet points, or structured text
    - Never use emojis or special characters
    - Write everything as natural spoken sentences
    ```
  </Tab>

  <Tab title="Support / Sales">
    Use this template for customer support agents, sales assistants, and other professional conversational applications.

    ```
    ## Speech Output Rules

    Your responses will be converted to speech using inworld-tts-2. Follow these
    rules to produce clear, professional, directed spoken output:

    ### Instruction Tags
    - When acknowledging a customer's problem, combine concern with pacing:
      [sound concerned with a measured pace and low tone]
    - When delivering sensitive information, combine volume and manner:
      [quietly with a calm and steady tone]
    - For time-sensitive alerts, combine speed and manner:
      [speak quickly with a clear and direct manner]
    - When combining qualities, keep the tone professional and measured
    - Do NOT use non-verbal tags (laugh, sigh, etc.) — maintain professionalism
    - Place tags at the start of the sentence they apply to

    ### Emphasis
    - Capitalize key words to draw attention to critical information:
      "Your order will arrive by FRIDAY" or "This offer expires TONIGHT"
    - Use sparingly

    ### Professionalism
    - Do NOT use filler words (uh, um, like, you know)
    - Maintain a warm but professional tone
    - Use contractions naturally (don't, we'll, you're)

    ### Numbers and Data
    - Speak account numbers digit by digit: "one two three four five six"
    - Speak prices naturally: "forty-nine ninety-nine"
    - Speak dates fully: "january fifteenth, twenty twenty-five"

    ### Text Formatting
    - Never use markdown formatting, bullet points, or structured text
    - Never use emojis or special characters
    - Write everything as natural spoken sentences
    ```
  </Tab>

  <Tab title="Dev Tools / Technical">
    Use this template for coding assistants, documentation readers, technical narrators, and developer-facing tools.

    ```
    ## Speech Output Rules

    Your responses will be converted to speech using inworld-tts-2. Follow these
    rules to produce accurate, well-paced technical speech:

    ### Instruction Tags
    - For urgent alerts, combine speed and manner:
      [very fast with a sharp and urgent tone]
    - For critical steps, combine pace and articulation:
      [very slow with deliberate pauses and clear articulation]
    - When flagging errors or risks, combine concern with pacing:
      [sound concerned with a measured pace and low tone]
    - Do NOT use non-verbal tags — maintain a focused, technical delivery
    - Place tags at the start of the sentence they apply to

    ### Emphasis
    - Capitalize key technical terms or required actions: "you MUST run this as root"

    ### Technical Accuracy
    - Speak URLs by component: "github dot com slash inworld dash AI"
    - Speak code identifiers as plain words: "the get user name function" for getUserName
    - Speak version numbers naturally: "version three point two"

    ### Pacing
    - Use measured, even pacing. Avoid rushing through technical content.
    - Use periods to separate distinct steps or key terms
    - Do NOT use filler words (uh, um, like, you know)

    ### Text Formatting
    - Write all numbers in spoken form: "forty-two" not "42"
    - Never use markdown formatting, bullet points, or code blocks
    - Write everything as natural spoken sentences
    ```
  </Tab>
</Tabs>
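A template is an instruction, not a guarantee, so a lightweight check on each LLM response can catch rule violations before synthesis. The sketch below is a heuristic, with hypothetical names (`lint_speech_output`, the `COMPANION`, `SUPPORT`, and `DEV_TOOLS` profiles) and deliberately conservative patterns: it only flags "uh" and "um" as fillers, because words like "well" and "like" have legitimate uses, and it treats Unicode symbol and currency characters as "special characters".

```python
import re
import unicodedata

# Heuristic checks for the formatting, non-verbal, and filler rules in the templates.
NON_VERBAL_RE = re.compile(r"\[(laugh|sigh|clear throat|breathe|cough|yawn)\]", re.IGNORECASE)
# Only "uh" and "um": words like "well" or "like" have legitimate uses.
FILLER_RE = re.compile(r"\b(uh|um)\b", re.IGNORECASE)
DIGIT_RE = re.compile(r"\d")
# Bullets, headings, quotes, or numbered items at line start; bold or code spans anywhere.
MARKDOWN_RE = re.compile(r"^\s*(?:[-*>]|#+|\d+\.)\s|\*\*|__|`", re.MULTILINE)

# Hypothetical per-template profiles matching the rules in each tab.
COMPANION = {"allow_non_verbals": True, "allow_fillers": True}
SUPPORT = {"allow_non_verbals": False, "allow_fillers": False}
DEV_TOOLS = {"allow_non_verbals": False, "allow_fillers": False}


def lint_speech_output(text: str, allow_non_verbals: bool, allow_fillers: bool) -> list[str]:
    """Return the rule violations found in one LLM response."""
    problems: list[str] = []
    if DIGIT_RE.search(text):
        problems.append("contains digits; write numbers in spoken form")
    if MARKDOWN_RE.search(text):
        problems.append("contains markdown formatting")
    # Unicode categories So (other symbols, including most emoji) and Sc (currency).
    if any(unicodedata.category(ch) in ("So", "Sc") for ch in text):
        problems.append("contains emojis or symbol characters")
    if not allow_non_verbals and NON_VERBAL_RE.search(text):
        problems.append("contains non-verbal tags")
    if not allow_fillers and FILLER_RE.search(text):
        problems.append("contains filler words")
    return problems


if __name__ == "__main__":
    print(lint_speech_output("[sigh] Um, your total is $49.99.", **SUPPORT))
```

Emphasis in capitals and the bracketed instruction tags themselves pass these checks, since the templates allow both.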

## Tips for Iterating

* **Test with the TTS Playground**: Use the [TTS Playground](/tts/tts-playground) to hear how your LLM output sounds when synthesized. Paste in sample outputs with instruction tags and iterate until the speech quality meets your needs.
* **Check for tag/content mismatches**: The LLM should not apply an instruction tag that contradicts the content. A `[sound sad]` tag on celebratory text produces degraded output. Review LLM outputs for mismatches during testing.
* **Avoid conflicting instructions**: Instruct the LLM not to combine opposing directions in the same tag. A tag that asks for both a hushed whisper and `[very loud]` delivery produces unpredictable results. Combining compatible qualities in one tag is encouraged; every quality in a tag should point the same way.
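Tag/content mismatches still need a human listener or a reviewer, but opposing cues inside a single tag are mechanical enough to flag automatically. The sketch below is a keyword heuristic: the `OPPOSING_CUES` groups and the `find_conflicting_tags` name are hypothetical, and the lists should be extended to match the tags your LLM actually produces.

```python
import re

# Hypothetical groups of opposing delivery cues; extend them for your own tags.
OPPOSING_CUES = [
    ({"loud", "loudly", "shout", "shouting"}, {"quiet", "quietly", "whisper", "hushed"}),
    ({"fast", "quickly", "rapidly"}, {"slow", "slowly"}),
    ({"high"}, {"low"}),
]
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def find_conflicting_tags(output: str) -> list[str]:
    """Return every tag in `output` that mixes cues from opposing groups."""
    conflicts: list[str] = []
    for match in TAG_RE.finditer(output):
        # Whole lowercase words only, so "follow" does not count as "low".
        words = set(re.findall(r"[a-z]+", match.group(1).lower()))
        for side_a, side_b in OPPOSING_CUES:
            if words & side_a and words & side_b:
                conflicts.append(match.group(0))
                break  # one conflict is enough to flag this tag
    return conflicts


if __name__ == "__main__":
    sample = (
        "[whisper in a hushed style, very loud] Don't tell anyone. "
        "[say excitedly with a high pitch and fast pace] We won!"
    )
    print(find_conflicting_tags(sample))
```

Because it matches words rather than meaning, it can misfire on tags such as `[say in a low tone with high energy]`; treat its output as a prompt for review, not a verdict.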

## Next Steps

<CardGroup cols={3}>
  <Card title="Steering" icon="wand-magic-sparkles" href="/tts/capabilities/steering">
    Full reference for all instruction tags, free-form instructions, non-verbals, and best practices.
  </Card>

  <Card title="Pause Controls" icon="waveform-lines" href="/tts/capabilities/pause-controls">
    Add precise pauses to your speech with SSML break tags.
  </Card>

  <Card title="Prompting for TTS" icon="message-lines" href="/tts/best-practices/prompting-for-tts">
    Prompt engineering techniques that apply to all Inworld TTS models.
  </Card>
</CardGroup>