The TTS API accepts up to 2,000 characters per request. For longer content — articles, book chapters, scripts — you need to split the text into chunks, synthesize each one, and stitch the resulting audio back together. We provide ready-to-run scripts in Python and JavaScript that handle this entire pipeline for you.

How It Works

Step 1: Chunk the text

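The input is split into segments that stay under the 2,000-character API limit (at most MAX_CHUNK_SIZE characters, 1,900 by default). Once a segment has reached MIN_CHUNK_SIZE characters (500 by default), the chunking algorithm looks for a natural break point in the following priority order: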
  1. Paragraph breaks (\n\n)
  2. Line breaks (\n)
  3. Sentence endings (. ! ?)
  4. Last space (fallback)
This keeps chunk boundaries at natural pauses wherever possible, so the stitched audio sounds smooth.
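
The priority order above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the provided scripts: the function names `find_break` and `chunk_text` are hypothetical, and the choice to cut at the last matching break inside the window between the minimum and maximum size is an assumption.

```python
MIN_CHUNK_SIZE = 500
MAX_CHUNK_SIZE = 1900

# Break markers grouped by priority: paragraph break, line break,
# sentence ending (followed by a space), then any space as a fallback.
BREAK_GROUPS = (("\n\n",), ("\n",), (". ", "! ", "? "), (" ",))


def find_break(text, lo, hi):
    """Return the cut index for text, searching only the window text[lo:hi]."""
    window = text[lo:hi]
    for group in BREAK_GROUPS:
        best = -1
        for marker in group:
            pos = window.rfind(marker)
            if pos != -1:
                # Cut just after the marker so it stays with the earlier chunk.
                best = max(best, pos + len(marker))
        if best != -1:
            return lo + best
    return hi  # no break point at all: hard cut at the maximum size


def chunk_text(text, min_size=MIN_CHUNK_SIZE, max_size=MAX_CHUNK_SIZE):
    """Split text into chunks of at most max_size characters."""
    chunks = []
    rest = text.strip()
    while len(rest) > max_size:
        cut = find_break(rest, min_size, max_size)
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Because the search window ends at `max_size`, no chunk can exceed the limit, and text without any whitespace is still split by the hard-cut fallback.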

Step 2: Synthesize each chunk

Chunks are sent to the TTS API in parallel (default: 2 concurrent requests) to speed up synthesis while staying within API rate limits. Requests that hit a rate limit are retried automatically with exponential backoff.
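
One way to cap concurrency while keeping the audio in its original order is a bounded thread pool. The sketch below is an assumption about how this could be done, not the scripts' actual implementation: `synthesize_all` is a hypothetical name, and `synthesize_chunk` stands in for whatever function performs the real API request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_CONCURRENT_REQUESTS = 2


def synthesize_all(
    chunks: List[str],
    synthesize_chunk: Callable[[str], bytes],
    max_concurrent: int = MAX_CONCURRENT_REQUESTS,
) -> List[bytes]:
    """Run synthesize_chunk over all chunks, at most max_concurrent at a time.

    Executor.map returns results in input order, so the audio segments
    line up with the text chunks even if requests finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(synthesize_chunk, chunks))


if __name__ == "__main__":
    # Placeholder synthesizer for demonstration only; it makes no API call.
    def fake_synthesize(chunk: str) -> bytes:
        return chunk.encode("utf-8")

    print(synthesize_all(["first chunk", "second chunk"], fake_synthesize))
```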

Step 3: Stitch the audio

The individual audio responses are combined into a single output file. The Python script produces a WAV file with configurable silence between segments, while the JavaScript script produces an MP3 file and uses ffmpeg to merge segments with correct duration metadata.
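
For the WAV case, stitching with a silence gap can be sketched with Python's standard `wave` module. This is an illustrative sketch under stated assumptions: each segment is taken to be a complete WAV file held in memory, all segments share the same 16-bit PCM format, and the names `stitch_wav` and `silence_ms` are hypothetical rather than taken from the provided script.

```python
import io
import wave
from typing import List


def stitch_wav(segments: List[bytes], out, silence_ms: int = 200) -> None:
    """Join WAV segments into one file, inserting silence between them.

    `out` may be a file path or a writable, seekable binary file object.
    """
    if not segments:
        raise ValueError("no audio segments to stitch")

    params = None
    frames = []
    for data in segments:
        with wave.open(io.BytesIO(data), "rb") as reader:
            fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("segments must share channels, sample width, and rate")
            frames.append(reader.readframes(reader.getnframes()))

    channels, width, rate = params
    # Zero bytes are silence for signed PCM (assumed 16-bit here).
    silence_frames = int(rate * silence_ms / 1000)
    silence = b"\x00" * (silence_frames * channels * width)

    with wave.open(out, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(width)
        writer.setframerate(rate)
        writer.writeframes(silence.join(frames))
```

Writing through the `wave` module lets it compute the header's length fields from the combined data, which is simpler than concatenating the raw files and patching headers by hand.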

Configuration

Both scripts share the same tunable parameters:
| Parameter | Default | Description |
| MIN_CHUNK_SIZE | 500 | Minimum characters before looking for a break point |
| MAX_CHUNK_SIZE | 1,900 | Maximum chunk size (stays under the 2,000-char API limit) |
| MAX_CONCURRENT_REQUESTS | 2 | Parallel API requests (increase with caution to avoid rate limits) |
| MAX_RETRIES | 3 | Retry attempts for rate-limited requests with exponential backoff |
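
To show how MAX_RETRIES and exponential backoff fit together, here is a minimal sketch. The names `RateLimitError`, `with_retries`, and `base_delay` are hypothetical, the one-second base delay is an assumption, and MAX_RETRIES is read here as the number of retries after the first attempt; the provided scripts may differ on all of these points.

```python
import time

MAX_RETRIES = 3


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit."""


def with_retries(call, max_retries=MAX_RETRIES, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimitError with exponential backoff.

    The wait doubles after each failed attempt: base_delay, then
    2 * base_delay, then 4 * base_delay, and so on.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter is only a convenience for testing the schedule without actually waiting.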

Running the Scripts

Prerequisites

  • An Inworld API key set as the INWORLD_API_KEY environment variable
  • A text file with your long-form content
  • Python 3 (for the Python script) or Node.js (for the JavaScript script)
  • ffmpeg (optional, for the JavaScript script; needed for correct MP3 duration metadata)

Python

    export INWORLD_API_KEY=your_api_key_here
    pip install requests python-dotenv
    python example_tts_long_input.py

The script reads the input text file, chunks it, synthesizes all chunks with the Inworld TTS API, and saves the combined audio as a WAV file. It also prints a splice report showing the exact timestamps where chunks were joined, which is useful for quality checking.
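
The timestamps in a splice report follow from the segment durations and the silence inserted between them. The sketch below shows that arithmetic only; `splice_points` and `format_timestamp` are hypothetical helpers, and the actual report printed by the script may use a different layout.

```python
from typing import List


def splice_points(durations: List[float], silence_s: float = 0.2) -> List[float]:
    """Return the times (in seconds) at which each segment ends and a join begins."""
    points = []
    t = 0.0
    for duration in durations[:-1]:  # no join after the final segment
        t += duration
        points.append(t)
        t += silence_s  # the next segment starts after the silence gap
    return points


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


if __name__ == "__main__":
    for i, t in enumerate(splice_points([10.0, 5.0, 2.0], silence_s=0.5), start=1):
        print(f"splice {i}: {format_timestamp(t)}")
```

Listening to the output around each of these timestamps is a quick way to check that the joins sound natural.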

JavaScript

    export INWORLD_API_KEY=your_api_key_here
    node example_tts_long_input_compressed.js

The script follows the same chunking and synthesis pipeline and outputs a compressed MP3 file. When ffmpeg is available, it merges segments with correct duration metadata. Otherwise, it falls back to raw concatenation.
