Use the @inworld/tts package to add speech synthesis, voice cloning, and voice design to your Node.js app.
The @inworld/tts Node.js SDK wraps the Inworld TTS REST API with a clean, typed interface. It automatically handles chunking for long text, retries with exponential backoff, and connection management, which reduces a typical integration from 30+ lines of raw HTTP to a few lines of code.
    import { InworldTTS } from '@inworld/tts';
    import fs from 'fs';

    const tts = InworldTTS(); // reads INWORLD_API_KEY from env

    const audio = await tts.generate({
      text: 'What a wonderful day to be a text-to-speech model!',
      voice: 'Ashley',
    });

    fs.writeFileSync('output.mp3', audio);
Stream audio chunks over HTTP as they are generated, with lower time-to-first-audio than generate(). Text must be 2,000 characters or fewer.
    const chunks = [];
    for await (const chunk of tts.stream({
      text: 'Streaming is great for real-time playback!',
      voice: 'Ashley',
    })) {
      chunks.push(chunk);
    }
    const audio = Buffer.concat(chunks);
Parameters are the same as generate(), except text must be ≤2,000 characters and the default model is "inworld-tts-1.5-mini".
Yields: Uint8Array (audio chunks as they arrive).
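The streaming example above joins chunks with Buffer.concat(), which is specific to Node.js. Where Buffer is not available, such as in a browser, the same result can be had with plain Uint8Array operations. A minimal sketch follows; concatChunks is a hypothetical helper name and is not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): join Uint8Array chunks
// into one contiguous Uint8Array without relying on Node's Buffer.
function concatChunks(chunks) {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const result = new Uint8Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return result;
}
```

In the streaming loop, `const audio = concatChunks(chunks);` would then stand in for the Buffer.concat() call.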
Stream audio chunks, each paired with optional timestamp data. Text must be ≤2,000 characters.
    for await (const chunk of tts.streamWithTimestamps({
      text: 'Streaming with timestamps!',
      voice: 'Ashley',
      timestampType: 'WORD',
    })) {
      // chunk.audio: Uint8Array
      // chunk.timestamps: TimestampInfo | undefined
    }
Takes all the same parameters as stream(), plus timestampType (required). Default model is "inworld-tts-1.5-mini".
Yields: { audio: Uint8Array, timestamps?: TimestampInfo }
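Because timestamps is optional on each chunk, a consumer has to handle chunks that carry audio only. The sketch below shows one way to drain such a stream. collectTimestampedAudio is a hypothetical helper, not an SDK function, and it treats TimestampInfo as opaque since its fields are not described here.

```javascript
// Hypothetical helper (not part of the SDK): drain any async iterable that
// yields { audio, timestamps? } and keep the two parts in separate arrays.
async function collectTimestampedAudio(stream) {
  const audioChunks = [];
  const timestamps = [];
  for await (const chunk of stream) {
    audioChunks.push(chunk.audio);
    // timestamps is optional, so skip chunks that carry none
    if (chunk.timestamps !== undefined) timestamps.push(chunk.timestamps);
  }
  return { audioChunks, timestamps };
}
```

It could be called as `await collectTimestampedAudio(tts.streamWithTimestamps({ ... }))` with the same arguments as the loop above.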
Play audio from a Uint8Array or file path. Encoding is auto-detected from magic bytes unless overridden.
    const audio = await tts.generate({ text: 'Listen to this!', voice: 'Ashley' });
    await tts.play(audio);

    // Or play from a file
    await tts.play('output.mp3');
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| audio | Uint8Array \| string | Yes | - | Raw audio bytes or a file path (Node.js only). |
| encoding | string | No | auto-detected | Format hint ("MP3", "WAV", etc.). Inferred from extension for file paths. |
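For readers curious what detection from magic bytes involves, here is a minimal sketch of the general technique. It is not the SDK's detection code: sniffAudioFormat is a hypothetical name, and only the two formats named above are covered. The check for untagged MP3 is a heuristic, since some other formats begin with the same sync bits.

```javascript
// Illustrative sketch of magic-byte sniffing; NOT the SDK's actual detection code.
// Returns 'WAV', 'MP3', or undefined when the header is not recognized.
function sniffAudioFormat(bytes) {
  const ascii = (start, length) =>
    String.fromCharCode(...bytes.subarray(start, start + length));

  // WAV: "RIFF" at offset 0 and "WAVE" at offset 8
  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 4) === 'WAVE') {
    return 'WAV';
  }
  // MP3 carrying an ID3v2 tag starts with "ID3"
  if (bytes.length >= 3 && ascii(0, 3) === 'ID3') {
    return 'MP3';
  }
  // Untagged MP3 starts with a frame sync: 0xFF, then a byte whose top 3 bits are set
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) {
    return 'MP3';
  }
  return undefined;
}
```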
Clone a voice from one or more audio recordings. Only 5–15 seconds of audio is needed.
    const result = await tts.cloneVoice({
      audioSamples: ['./recording.wav'],
      displayName: 'My Cloned Voice',
      lang: 'EN_US',
    });
    console.log(result.voice.voiceId); // use this ID in generate()
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| audioSamples | Array<Uint8Array \| string> | Yes | - | Audio files as Uint8Array/Buffer, or file paths (Node.js only). WAV or MP3. |
| displayName | string | No | "Cloned Voice" | Display name for the cloned voice. |
| lang | string | No | "EN_US" | Language code of the recordings. |
| transcriptions | string[] | No | - | Transcriptions aligned with each audio sample. Improves clone quality. |
| description | string | No | - | Voice description. |
| tags | string[] | No | - | Tags for filtering. |
| removeBackgroundNoise | boolean | No | false | Apply noise reduction before cloning. |
Returns: CloneVoiceResult. The cloned voice ID is at result.voice.voiceId.
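When you pass transcriptions, each one needs to line up with its audio sample. The sketch below is a pre-flight check you might run before calling cloneVoice(). It assumes that "aligned" means matched by array index, which the text above does not spell out, and pairSamplesWithTranscriptions is a hypothetical helper rather than an SDK function.

```javascript
// Hypothetical pre-flight check (not part of the SDK). Assumes transcriptions
// are matched to audio samples by array index.
function pairSamplesWithTranscriptions(audioSamples, transcriptions) {
  if (audioSamples.length === 0) {
    throw new Error('At least one audio sample is required.');
  }
  if (transcriptions.length !== audioSamples.length) {
    throw new Error(
      `Expected ${audioSamples.length} transcriptions, got ${transcriptions.length}.`
    );
  }
  // Pair by index so a mismatch is easy to spot when logging
  return audioSamples.map((sample, i) => ({
    sample,
    transcription: transcriptions[i],
  }));
}
```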
Design a new voice from a text description — no audio recording needed.
    const result = await tts.designVoice({
      designPrompt: 'A warm, friendly female voice with a slight British accent',
      previewText: 'Hello! Welcome to our application.',
      numberOfSamples: 3,
    });

    // Listen to previews, then publish the one you like
    const chosenVoice = result.previewVoices[0];
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| designPrompt | string | Yes | - | Natural-language description of the voice (30–250 characters). |
| previewText | string | Yes | - | Text the generated voice will speak in the preview. |
| lang | string | No | "EN_US" | Language code. |
| numberOfSamples | number | No | 1 | Number of preview candidates (1–3). |
Returns: DesignVoiceResult, with the preview voices at result.previewVoices.
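The limits listed for designVoice() (a prompt of 30 to 250 characters, 1 to 3 preview candidates) can be checked on the client before a request goes out. The sketch below is one way to do that. validateDesignRequest is a hypothetical helper, and whether the SDK performs similar checks itself is not stated here.

```javascript
// Hypothetical client-side validation (not part of the SDK), based on the
// documented limits: prompt of 30 to 250 characters, 1 to 3 samples.
function validateDesignRequest({ designPrompt, previewText, numberOfSamples = 1 }) {
  const errors = [];
  if (
    typeof designPrompt !== 'string' ||
    designPrompt.length < 30 ||
    designPrompt.length > 250
  ) {
    errors.push('designPrompt must be 30-250 characters');
  }
  if (typeof previewText !== 'string' || previewText.length === 0) {
    errors.push('previewText is required');
  }
  if (!Number.isInteger(numberOfSamples) || numberOfSamples < 1 || numberOfSamples > 3) {
    errors.push('numberOfSamples must be an integer from 1 to 3');
  }
  return errors; // an empty array means the request looks valid
}
```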
Migrate a voice from ElevenLabs to your Inworld workspace. Fetches the voice’s audio samples directly from ElevenLabs and clones them into Inworld. No ElevenLabs SDK required.
The SDK works in browsers (Vite, webpack 5, Rollup, esbuild) with no extra configuration. Use JWT tokens instead of API keys to keep your credentials safe.
In production, your backend generates short-lived JWT tokens and your frontend uses them to authenticate. See the JWT authentication guide and the sample Node.js JWT app for how to set up the server-side token endpoint.
    import { InworldTTS } from '@inworld/tts';

    async function fetchToken() {
      const res = await fetch('/api/tts-token');
      const { token } = await res.json();
      return token;
    }

    const tts = InworldTTS({
      token: await fetchToken(),
      onTokenExpiring: fetchToken, // called automatically ~5 min before expiry
    });
The onTokenExpiring callback fires automatically when the current token is about to expire. It must return a fresh JWT string. The SDK uses a stale-while-revalidate strategy — requests continue with the current token while the refresh happens in the background.
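To make the stale-while-revalidate idea concrete, here is a small sketch of the pattern in isolation. It is not the SDK's implementation: createTokenCache, refreshWindowMs, whenRefreshed, and the { token, expiresAt } shape are all hypothetical, chosen only to show how the current token keeps being served while a single background refresh is in flight.

```javascript
// Illustrative sketch of stale-while-revalidate for tokens; NOT the SDK's code.
// `initial` and the value resolved by `refresh` are { token, expiresAt },
// where expiresAt is a timestamp in epoch milliseconds (an assumed shape).
function createTokenCache(initial, refresh, options = {}) {
  const { refreshWindowMs = 5 * 60 * 1000, now = Date.now } = options;
  let current = initial;
  let refreshing = null; // in-flight refresh promise, if any

  return {
    // Returns the current token immediately; may start a background refresh.
    getToken() {
      const expiringSoon = current.expiresAt - now() <= refreshWindowMs;
      if (expiringSoon && !refreshing) {
        refreshing = refresh()
          .then((next) => { current = next; })
          .catch(() => {}) // on failure, keep serving the stale token
          .finally(() => { refreshing = null; });
      }
      return current.token;
    },
    // Lets callers wait for a pending refresh, mainly useful in tests
    whenRefreshed() {
      return refreshing ?? Promise.resolve();
    },
  };
}
```

The design choice worth noting is that getToken() never waits: a request made during the refresh window goes out with the old token, and only one refresh runs at a time.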
For quick prototyping, you can use an API key directly in the browser by setting dangerouslyAllowBrowser:
    // ⚠️ Development only: your API key will be visible in DevTools
    const tts = InworldTTS({
      apiKey: 'your_key',
      dangerouslyAllowBrowser: true,
    });
Never use dangerouslyAllowBrowser in production. Your API key will be visible in browser DevTools, and anyone who copies it can make requests that are billed to your account. Use JWT authentication instead.
generate() and generateWithTimestamps() automatically chunk text longer than 2,000 characters and send chunks in parallel (controlled by maxConcurrentRequests). The resulting audio is seamlessly concatenated, and timestamp offsets are merged correctly.
stream() and streamWithTimestamps() require text of 2,000 characters or fewer. For longer text with streaming, split the text yourself and call stream() for each segment.
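Splitting long text for streaming might look like the sketch below. Both helpers are hypothetical and not part of the SDK. splitText prefers to break at the end of a sentence, then at a space, and only cuts mid-word as a last resort. streamLongText takes the client as a parameter and assumes only that tts.stream({ text, voice }) returns an async iterable, as in the streaming example earlier.

```javascript
// Hypothetical helpers (not part of the SDK) for streaming text over the limit.
const MAX_STREAM_CHARS = 2000;

// Split text into segments of at most maxChars characters.
function splitText(text, maxChars = MAX_STREAM_CHARS) {
  const segments = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    // Prefer the last sentence end in range, then the last space, then a hard cut
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? ')
    );
    let cut = sentenceEnd >= 0 ? sentenceEnd + 1 : head.lastIndexOf(' ');
    if (cut <= 0) cut = maxChars;
    segments.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) segments.push(rest);
  return segments;
}

// Stream each segment in order, yielding audio chunks as they arrive.
async function* streamLongText(tts, text, voice) {
  for (const segment of splitText(text)) {
    yield* tts.stream({ text: segment, voice });
  }
}
```

Segments are streamed one after another here, so playback order is preserved at the cost of a short gap between segments while the next request starts.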