Long Text Input - Inworld AI Documentation

The TTS API accepts up to 2,000 characters per request. For longer content — articles, book chapters, scripts — you need to split the text into chunks, synthesize each one, and stitch the resulting audio back together. We provide ready-to-run scripts in Python and JavaScript that handle this entire pipeline for you.

How It Works

Chunk the text

The input is split into segments under the 2,000-character API limit. The chunking algorithm looks for natural break points in the following priority order:

Paragraph breaks (\n\n)
Line breaks (\n)
Sentence endings (. ! ?)
Last space (fallback)

This keeps segment boundaries at natural pauses wherever possible, so the stitched audio sounds smooth. Only when no better break point exists does a chunk end at the last space.
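The priority order above can be sketched roughly as follows. This is a minimal illustration, not the code of the provided scripts: the function names `find_break` and `chunk_text` are hypothetical, and the interpretation of `MIN_CHUNK_SIZE` (break points before that offset are ignored) is an assumption based on the parameter description.

```python
import re

MIN_CHUNK_SIZE = 500   # defaults taken from the Configuration table
MAX_CHUNK_SIZE = 1900


def find_break(window: str, min_size: int) -> int:
    """Return the cut index for `window`, trying break points in priority order."""
    # 1. Paragraph break
    idx = window.rfind("\n\n", min_size)
    if idx != -1:
        return idx + 2
    # 2. Line break
    idx = window.rfind("\n", min_size)
    if idx != -1:
        return idx + 1
    # 3. Sentence ending (. ! ?) followed by whitespace
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s)", window) if m.end() >= min_size]
    if ends:
        return ends[-1]
    # 4. Last space (fallback)
    idx = window.rfind(" ", min_size)
    if idx != -1:
        return idx + 1
    # No break point at all: hard cut at the window edge
    return len(window)


def chunk_text(text: str, min_size: int = MIN_CHUNK_SIZE,
               max_size: int = MAX_CHUNK_SIZE) -> list[str]:
    """Split `text` into chunks of at most `max_size` characters."""
    chunks = []
    rest = text.strip()
    while len(rest) > max_size:
        cut = find_break(rest[:max_size], min_size)
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Calling `chunk_text(open("input.txt").read())` would return a list of strings, each short enough to send in one request. Searching backwards from the end of the window favors the largest chunk that still ends on a natural pause.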

Synthesize each chunk

Each chunk is sent to the TTS API, with automatic retries when a request is rate-limited. Chunks are processed in parallel (default: 2 concurrent requests) to speed up synthesis without exceeding API rate limits.
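One way to combine bounded concurrency with retries is a thread pool wrapped around a retrying call. The sketch below is an assumption about the general shape, not the provided script: `synthesize` stands for whatever function you write to call the TTS API, and `RateLimitError` is a hypothetical exception that function would raise on a rate-limit response. The 1-second base delay is also an illustrative choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_REQUESTS = 2  # defaults taken from the Configuration table
MAX_RETRIES = 3


class RateLimitError(Exception):
    """Hypothetical: raised by your `synthesize` function when rate-limited."""


def with_retries(synthesize, chunk, max_retries=MAX_RETRIES,
                 base_delay=1.0, sleep=time.sleep):
    """Call synthesize(chunk), retrying rate-limited calls with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return synthesize(chunk)
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


def synthesize_all(chunks, synthesize, max_workers=MAX_CONCURRENT_REQUESTS,
                   **retry_kwargs):
    """Synthesize all chunks with at most `max_workers` requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so audio stays in sequence
        return list(pool.map(
            lambda c: with_retries(synthesize, c, **retry_kwargs), chunks))
```

Because `pool.map` preserves input order, the returned audio segments line up with the original chunks even when requests finish out of order.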

Stitch the audio

The individual audio responses are combined into a single output file. The Python script produces a WAV file with configurable silence between segments, while the JavaScript script produces an MP3 file and uses ffmpeg to merge segments with correct duration metadata.
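For the WAV case, stitching with a silence gap can be done with Python's standard `wave` module. The sketch below is illustrative only: `stitch_wav` and its `silence_ms` parameter are hypothetical names, it assumes every segment is a complete WAV file with the same sample rate, sample width, and channel count, and it assumes 16-bit PCM so that zero bytes represent silence.

```python
import io
import wave


def stitch_wav(segments, silence_ms=200):
    """Join WAV byte strings into one WAV, inserting silence between segments.

    Returns (wav_bytes, splice_times), where splice_times lists the offset in
    seconds at which each segment after the first begins.
    """
    if not segments:
        raise ValueError("no segments to stitch")
    out = io.BytesIO()
    splices = []
    params = None
    frames_written = 0
    with wave.open(out, "wb") as writer:
        for i, seg in enumerate(segments):
            with wave.open(io.BytesIO(seg), "rb") as reader:
                if params is None:
                    # Take the output format from the first segment
                    params = reader.getparams()
                    writer.setnchannels(params.nchannels)
                    writer.setsampwidth(params.sampwidth)
                    writer.setframerate(params.framerate)
                frames = reader.readframes(reader.getnframes())
            frame_size = params.sampwidth * params.nchannels
            if i > 0:
                gap = int(params.framerate * silence_ms / 1000)
                # Zero bytes are silence for 16-bit PCM (assumed format)
                writer.writeframes(b"\x00" * gap * frame_size)
                frames_written += gap
                splices.append(frames_written / params.framerate)
            writer.writeframes(frames)
            frames_written += len(frames) // frame_size
    return out.getvalue(), splices
```

The returned splice offsets give a simple basis for a splice report: listening around each timestamp is a quick way to check that a join sounds natural.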

Configuration

Both scripts share the same tunable parameters:

Parameter	Default	Description
`MIN_CHUNK_SIZE`	500	Minimum characters before looking for a break point
`MAX_CHUNK_SIZE`	1,900	Maximum chunk size (stays under the 2,000-char API limit)
`MAX_CONCURRENT_REQUESTS`	2	Parallel API requests (increase with caution to avoid rate limits)
`MAX_RETRIES`	3	Retry attempts for rate-limited requests with exponential backoff

Running the Scripts

Prerequisites

An Inworld API key set as the INWORLD_API_KEY environment variable
A text file with your long-form content
Python 3 (for the Python script) or Node.js (for the JavaScript script)
ffmpeg (optional, for the JS script — produces correct MP3 duration metadata)

Python

export INWORLD_API_KEY=your_api_key_here
pip install requests python-dotenv
python example_tts_long_input.py

The script reads the input text file, chunks it, synthesizes all chunks with the Inworld TTS API, and saves the combined audio as a WAV file. It also prints a splice report showing the exact timestamps where chunks were joined, useful for quality checking.

JavaScript

export INWORLD_API_KEY=your_api_key_here
node example_tts_long_input_compressed.js

The script follows the same chunking and synthesis pipeline but outputs a compressed MP3 file. When ffmpeg is available, it merges segments with correct duration metadata. Otherwise, it falls back to raw concatenation, in which case the duration reported by players may be inaccurate.
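The "use ffmpeg if present, otherwise concatenate" decision can be sketched as below. For consistency with the other examples this is written in Python rather than JavaScript, and it is an assumption about the general approach, not the script's actual code: `merge_mp3` and `use_ffmpeg` are hypothetical names, and the ffmpeg concat demuxer with stream copy is one common way to join MP3 files, not necessarily the one the script uses.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def merge_mp3(segment_paths, output_path, use_ffmpeg=None):
    """Merge MP3 files into `output_path`; return "ffmpeg" or "raw"."""
    ffmpeg = shutil.which("ffmpeg")
    if use_ffmpeg is None:
        use_ffmpeg = ffmpeg is not None  # auto-detect
    if use_ffmpeg:
        if ffmpeg is None:
            raise FileNotFoundError("ffmpeg not found on PATH")
        # The concat demuxer reads a text file listing the inputs.
        # Paths containing single quotes would need escaping (not handled here).
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as listing:
            for p in segment_paths:
                listing.write(f"file '{Path(p).resolve().as_posix()}'\n")
        try:
            subprocess.run(
                [ffmpeg, "-y", "-f", "concat", "-safe", "0",
                 "-i", listing.name, "-c", "copy", str(output_path)],
                check=True, capture_output=True)
        finally:
            Path(listing.name).unlink()
        return "ffmpeg"
    # Fallback: append the raw bytes of each segment in order
    with open(output_path, "wb") as out:
        for p in segment_paths:
            out.write(Path(p).read_bytes())
    return "raw"
```

Passing `use_ffmpeg=False` forces the raw path, which is handy for comparing the two outputs on the same segments.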

Code Examples

Python

WAV output with splice report and configurable silence between segments

JavaScript

Compressed MP3 output with ffmpeg-based segment merging

Next Steps

Synthesize Speech

Learn about the standard (non-streaming) synthesis API.

Streaming API

Use streaming for real-time playback of shorter content.

Latency Best Practices

Optimize time-to-first-audio for real-time use cases.
