The @inworld/tts Node.js SDK wraps the Inworld TTS REST API with a clean, typed interface. It automatically handles chunking for long text, retries with exponential backoff, and connection management, so a typical integration shrinks from 30+ lines of raw HTTP to a few lines of code.
npm install @inworld/tts

Quick Start

import { InworldTTS } from '@inworld/tts';
import fs from 'fs';

const tts = InworldTTS(); // reads INWORLD_API_KEY from env

const audio = await tts.generate({
  text: 'What a wonderful day to be a text-to-speech model!',
  voice: 'Ashley',
});

fs.writeFileSync('output.mp3', audio);

Speech Synthesis

generate(options)

Synthesize speech and return the complete audio as a Uint8Array. Text longer than 2,000 characters is automatically chunked and sent in parallel.
const audio = await tts.generate({
  text: 'Hello, world!',
  voice: 'Ashley',
  model: 'inworld-tts-1.5-max',
  encoding: 'MP3',
  outputFile: 'output.mp3', // optional — also writes to disk
});
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | | Text to synthesize. Any length. Supports <break time="Xs"/> SSML. |
| voice | string | Yes | | Voice ID (e.g. "Ashley", "Dennis", or a custom voice ID). |
| model | string | No | "inworld-tts-1.5-max" | Model ID. |
| encoding | string | No | "MP3" | Audio format: MP3, OGG_OPUS, FLAC, LINEAR16, WAV, PCM, ALAW, MULAW. |
| sampleRate | number | No | 48000 | Sample rate in Hz. |
| bitRate | number | No | 128000 | Bit rate in bps (MP3 / OGG_OPUS only). |
| speakingRate | number | No | 1.0 | Speed multiplier (0.5–1.5). |
| temperature | number | No | 1.0 | Expressiveness (0.0–2.0). Higher = more expressive. |
| outputFile | string | No | | Write audio to this file path (Node.js only). |
| play | boolean | No | false | Play audio immediately after synthesis (Node.js only). |
Returns: Uint8Array — raw audio bytes in the requested encoding.
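For text over 2,000 characters, generate() sends the chunks in parallel, bounded by the maxConcurrentRequests client option. The sketch below shows one common way to run async work under a concurrency cap while keeping results in input order, which is what concatenating audio chunks requires. It illustrates the idea only; it is not the SDK's internal code, and mapWithConcurrency is a hypothetical helper name.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight, returning results in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane keeps claiming the next index until none remain.
  // `next++` runs synchronously, so two lanes never claim the same index.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results; // results[i] always corresponds to items[i]
}
```

Writing results by index rather than pushing them as they finish is what keeps the output ordered even when a later chunk completes first.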

stream(options)

Stream audio chunks over HTTP as they are generated. Lower time-to-first-audio than generate(). Text must be 2,000 characters or fewer.
const chunks = [];

for await (const chunk of tts.stream({
  text: 'Streaming is great for real-time playback!',
  voice: 'Ashley',
})) {
  chunks.push(chunk);
}

const audio = Buffer.concat(chunks);
Parameters are the same as generate(), except text must be ≤2,000 characters and the default model is "inworld-tts-1.5-mini". Yields: Uint8Array — audio chunks as they arrive.
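Buffer.concat() in the example above is Node.js-only. In a browser, or any runtime without Buffer, a small helper can join the yielded Uint8Array chunks instead. concatChunks is a hypothetical name, not an SDK export.

```typescript
// Join audio chunks (e.g. those yielded by tts.stream()) into one Uint8Array
// without relying on Node's Buffer, so the same code runs in browsers.
function concatChunks(chunks: readonly Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, c) => sum + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset); // copy this chunk right after the previous one
    offset += c.byteLength;
  }
  return out;
}
```

Collect chunks in the for await loop exactly as before, then call concatChunks(chunks) in place of Buffer.concat(chunks).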

generateWithTimestamps(options)

Same as generate() but also returns word- or character-level timing data. Useful for lip-sync, karaoke, and subtitle alignment.
const { audio, timestamps } = await tts.generateWithTimestamps({
  text: 'Timestamps are useful for lip sync.',
  voice: 'Ashley',
  timestampType: 'WORD',
});

// timestamps.wordAlignment.words → ['Timestamps', 'are', 'useful', ...]
// timestamps.wordAlignment.wordStartTimeSeconds → [0.0, 0.42, 0.61, ...]
Takes all the same parameters as generate(), plus:
| Parameter | Type | Required | Description |
|---|---|---|---|
| timestampType | "WORD" \| "CHARACTER" | Yes | "WORD" returns word timing, phonemes, and visemes. "CHARACTER" returns per-character timing. |
Returns: { audio: Uint8Array, timestamps: TimestampInfo }
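For karaoke-style highlighting you usually need the word being spoken at the current playback time. The sketch below binary-searches wordStartTimeSeconds. It assumes only the two fields shown in the comments above (words and wordStartTimeSeconds, equal length, sorted ascending); the WordAlignment interface and activeWordIndex function are illustrative, not the SDK's TimestampInfo type.

```typescript
// Assumed shape, mirroring timestamps.wordAlignment in the example above.
interface WordAlignment {
  words: string[];
  wordStartTimeSeconds: number[]; // ascending, one entry per word
}

// Index of the last word whose start time is <= t (seconds),
// or -1 if playback has not reached the first word yet.
function activeWordIndex(a: WordAlignment, t: number): number {
  let lo = 0;
  let hi = a.wordStartTimeSeconds.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (a.wordStartTimeSeconds[mid] <= t) {
      found = mid; // candidate; keep looking for a later qualifying word
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

In a browser, you might call activeWordIndex(timestamps.wordAlignment, t) from a timeupdate listener, passing an audio element's currentTime as t. Because only start times are used, the last word stays highlighted until playback ends.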

streamWithTimestamps(options)

Stream audio chunks, each paired with optional timestamp data. Text must be ≤2,000 characters.
for await (const chunk of tts.streamWithTimestamps({
  text: 'Streaming with timestamps!',
  voice: 'Ashley',
  timestampType: 'WORD',
})) {
  // chunk.audio: Uint8Array
  // chunk.timestamps: TimestampInfo | undefined
}
Takes all the same parameters as stream(), plus timestampType (required). Default model is "inworld-tts-1.5-mini". Yields: { audio: Uint8Array, timestamps?: TimestampInfo }

play(audio, options)

Play audio from a Uint8Array or file path. Encoding is auto-detected from magic bytes unless overridden.
const audio = await tts.generate({ text: 'Listen to this!', voice: 'Ashley' });
await tts.play(audio);

// Or play from a file
await tts.play('output.mp3');
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| audio | Uint8Array \| string | Yes | | Raw audio bytes or a file path (Node.js only). |
| encoding | string | No | auto-detected | Format hint ("MP3", "WAV", etc.). Inferred from extension for file paths. |
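To illustrate what magic-byte detection involves (this is not the SDK's actual logic), the sketch below recognizes the well-known leading signatures of WAV (RIFF/WAVE), OGG (OggS), FLAC (fLaC) and MP3 (an ID3v2 tag or an MPEG Layer III frame sync). Headerless formats such as PCM, ALAW and MULAW have no signature, so they cannot be sniffed this way. sniffAudioFormat and hasAscii are hypothetical helpers.

```typescript
type DetectedFormat = 'MP3' | 'WAV' | 'OGG' | 'FLAC' | 'UNKNOWN';

// True if the bytes at `offset` equal the ASCII signature `sig`.
function hasAscii(bytes: Uint8Array, offset: number, sig: string): boolean {
  if (bytes.length < offset + sig.length) return false;
  for (let i = 0; i < sig.length; i++) {
    if (bytes[offset + i] !== sig.charCodeAt(i)) return false;
  }
  return true;
}

// Illustrative magic-byte sniffing; not the SDK's implementation.
function sniffAudioFormat(bytes: Uint8Array): DetectedFormat {
  if (hasAscii(bytes, 0, 'RIFF') && hasAscii(bytes, 8, 'WAVE')) return 'WAV';
  if (hasAscii(bytes, 0, 'OggS')) return 'OGG';
  if (hasAscii(bytes, 0, 'fLaC')) return 'FLAC';
  if (hasAscii(bytes, 0, 'ID3')) return 'MP3'; // ID3v2 tag precedes the MPEG frames
  // MPEG frame sync (11 set bits) with layer bits = 01 (Layer III).
  // The layer check rules out ADTS AAC, whose layer bits are 00.
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0 && (bytes[1] & 0x06) === 0x02) {
    return 'MP3';
  }
  return 'UNKNOWN';
}
```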

Voice Management

listVoices(options)

List available voices, optionally filtered by language.
const voices = await tts.listVoices();

// Filter by language
const enVoices = await tts.listVoices({ lang: 'EN_US' });
const multiLang = await tts.listVoices({ lang: ['EN_US', 'ES_ES'] });
| Parameter | Type | Required | Description |
|---|---|---|---|
| lang | string \| string[] | No | Filter by language code(s). Returns all voices when omitted. |
Returns: VoiceInfo[]
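When you fetch the full list once, for example to populate a voice picker organized by language, you can group it client-side. This minimal sketch assumes each VoiceInfo carries voiceId, displayName and langCode, the fields named in the getVoice() example below; the VoiceSummary interface and groupByLang function are hypothetical.

```typescript
// Assumed subset of VoiceInfo, based on the fields named under getVoice().
interface VoiceSummary {
  voiceId: string;
  displayName: string;
  langCode: string;
}

// Group a voice list (e.g. the result of tts.listVoices()) by language code,
// preserving the original order within each group.
function groupByLang(voices: readonly VoiceSummary[]): Map<string, VoiceSummary[]> {
  const groups = new Map<string, VoiceSummary[]>();
  for (const v of voices) {
    const list = groups.get(v.langCode) ?? [];
    list.push(v);
    groups.set(v.langCode, list);
  }
  return groups;
}
```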

getVoice(voice)

Get details for a single voice. Works with custom voices in your workspace (cloned or designed voices).
const voice = await tts.getVoice('my-custom-voice-id');
// voice.voiceId, voice.displayName, voice.langCode, ...
Returns: VoiceInfo

cloneVoice(options)

Clone a voice from one or more audio recordings. Only 5–15 seconds of audio is needed.
const result = await tts.cloneVoice({
  audioSamples: ['./recording.wav'],
  displayName: 'My Cloned Voice',
  lang: 'EN_US',
});

console.log(result.voice.voiceId); // use this ID in generate()
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| audioSamples | Array<Uint8Array \| string> | Yes | | Audio files as Uint8Array/Buffer, or file paths (Node.js only). WAV or MP3. |
| displayName | string | No | "Cloned Voice" | Display name for the cloned voice. |
| lang | string | No | "EN_US" | Language code of the recordings. |
| transcriptions | string[] | No | | Transcriptions aligned with each audio sample. Improves clone quality. |
| description | string | No | | Voice description. |
| tags | string[] | No | | Tags for filtering. |
| removeBackgroundNoise | boolean | No | false | Apply noise reduction before cloning. |
Returns: CloneVoiceResult — the cloned voice ID is at result.voice.voiceId.
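Since the guidance above is 5–15 seconds of audio, it can help to check a recording's length before uploading it. This sketch computes the duration of an uncompressed WAV buffer by walking its RIFF chunks and dividing the data chunk's size by the byte rate from the fmt chunk. It does not handle MP3, and wavDurationSeconds is a hypothetical helper, not part of the SDK.

```typescript
// Duration in seconds of a RIFF/WAVE buffer (data bytes / byteRate).
// Returns undefined if the buffer is not WAV or a required chunk is missing.
function wavDurationSeconds(bytes: Uint8Array): number | undefined {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (off: number) =>
    String.fromCharCode(bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]);
  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') return undefined;

  let byteRate: number | undefined;
  let dataSize: number | undefined;
  let off = 12; // first sub-chunk follows "RIFF", the RIFF size, and "WAVE"
  while (off + 8 <= bytes.length) {
    const id = tag(off);
    const size = view.getUint32(off + 4, true); // chunk sizes are little-endian
    if (id === 'fmt ' && off + 20 <= bytes.length) {
      // fmt body: format(2) channels(2) sampleRate(4) byteRate(4) ...
      byteRate = view.getUint32(off + 16, true);
    } else if (id === 'data') {
      // Clamp in case the header overstates the size (e.g. streamed WAVs).
      dataSize = Math.min(size, bytes.length - off - 8);
    }
    off += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (!byteRate || dataSize === undefined) return undefined;
  return dataSize / byteRate;
}
```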

designVoice(options)

Design a new voice from a text description — no audio recording needed.
const result = await tts.designVoice({
  designPrompt: 'A warm, friendly female voice with a slight British accent',
  previewText: 'Hello! Welcome to our application.',
  numberOfSamples: 3,
});

// Listen to previews, then publish the one you like
const chosenVoice = result.previewVoices[0];
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| designPrompt | string | Yes | | Natural-language description of the voice (30–250 characters). |
| previewText | string | Yes | | Text the generated voice will speak in the preview. |
| lang | string | No | "EN_US" | Language code. |
| numberOfSamples | number | No | 1 | Number of preview candidates (1–3). |
Returns: DesignVoiceResult — preview voices at result.previewVoices.
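A pre-flight check against the documented limits can catch bad input before a request is made. This is a sketch under stated assumptions: the DesignVoiceInput interface and validateDesignInput function are hypothetical, and counting characters as Unicode code points is a guess at how the API measures prompt length.

```typescript
// Hypothetical subset of the designVoice() options documented above.
interface DesignVoiceInput {
  designPrompt: string;
  previewText: string;
  numberOfSamples?: number;
}

// Return a list of problems; an empty list means the input looks valid.
function validateDesignInput(input: DesignVoiceInput): string[] {
  const problems: string[] = [];
  // Count code points rather than UTF-16 units (assumption about the API).
  const promptLength = Array.from(input.designPrompt).length;
  if (promptLength < 30 || promptLength > 250) {
    problems.push(`designPrompt must be 30–250 characters (got ${promptLength})`);
  }
  if (input.previewText.length === 0) {
    problems.push('previewText is required');
  }
  const n = input.numberOfSamples ?? 1; // documented default
  if (!Number.isInteger(n) || n < 1 || n > 3) {
    problems.push(`numberOfSamples must be an integer from 1 to 3 (got ${n})`);
  }
  return problems;
}
```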

publishVoice(options)

Publish a designed or cloned voice preview to your library so it can be used in generate() and stream().
const voice = await tts.publishVoice({
  voice: chosenVoice.voiceId,
  displayName: 'My Designed Voice',
});
| Parameter | Type | Required | Description |
|---|---|---|---|
| voice | string | Yes | Voice ID from designVoice() or cloneVoice(). |
| displayName | string | No | Display name for the published voice. |
| description | string | No | Description. |
| tags | string[] | No | Tags for filtering. |
Returns: VoiceInfo

migrateFromElevenLabs(options)

Migrate a voice from ElevenLabs to your Inworld workspace. Fetches the voice’s audio samples directly from ElevenLabs and clones them into Inworld. No ElevenLabs SDK required.
const result = await tts.migrateFromElevenLabs({
  elevenLabsApiKey: process.env.ELEVEN_LABS_API_KEY,
  elevenLabsVoiceId: 'abc123',
});

console.log(`Migrated "${result.elevenLabsName}" → ${result.inworldVoiceId}`);
| Parameter | Type | Required | Description |
|---|---|---|---|
| elevenLabsApiKey | string | Yes | Your ElevenLabs API key. |
| elevenLabsVoiceId | string | Yes | ElevenLabs voice ID to migrate. |
Returns: { elevenLabsVoiceId, elevenLabsName, inworldVoiceId }
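migrateFromElevenLabs() moves one voice per call. To migrate a whole library, a loop like the sketch below can run the migrations one at a time (keeping load on both services modest) and report failures at the end instead of stopping at the first one. The Migrator interface and migrateAll function are hypothetical; an InworldTTS client should satisfy Migrator structurally, assuming the parameters and return fields documented above.

```typescript
// Documented result fields of migrateFromElevenLabs().
interface MigrationResult {
  elevenLabsVoiceId: string;
  elevenLabsName: string;
  inworldVoiceId: string;
}

// Narrowed to the one method used, so a fake client can stand in for tests.
interface Migrator {
  migrateFromElevenLabs(opts: {
    elevenLabsApiKey: string;
    elevenLabsVoiceId: string;
  }): Promise<MigrationResult>;
}

interface MigrationFailure {
  voiceId: string;
  error: unknown;
}

// Migrate voices sequentially, collecting failures instead of throwing.
async function migrateAll(
  client: Migrator,
  elevenLabsApiKey: string,
  voiceIds: readonly string[],
): Promise<{ migrated: MigrationResult[]; failed: MigrationFailure[] }> {
  const migrated: MigrationResult[] = [];
  const failed: MigrationFailure[] = [];
  for (const voiceId of voiceIds) {
    try {
      migrated.push(await client.migrateFromElevenLabs({ elevenLabsApiKey, elevenLabsVoiceId: voiceId }));
    } catch (error) {
      failed.push({ voiceId, error });
    }
  }
  return { migrated, failed };
}
```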

Configuration

Create a client with InworldTTS() or the equivalent createClient():
import { InworldTTS, createClient } from '@inworld/tts';

// Pick one of the following (redeclaring `const tts` in one scope is a SyntaxError):
const tts = InworldTTS();                           // reads INWORLD_API_KEY from env
// const tts = InworldTTS({ apiKey: 'your_key' }); // or pass the key explicitly
// const tts = createClient();                     // alias for InworldTTS()
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| apiKey | string | Either this or token | INWORLD_API_KEY env var | Inworld API key. Mutually exclusive with token. |
| token | string | Either this or apiKey | | JWT token for browser use. Mutually exclusive with apiKey. |
| onTokenExpiring | () => Promise<string> | No | | Called when the token is about to expire. Return a fresh JWT. |
| dangerouslyAllowBrowser | boolean | No | false | Allow apiKey in browser environments (the key will be visible in DevTools). |
| baseUrl | string | No | https://api.inworld.ai | Override the API base URL. |
| timeout | number | No | per-method | Global HTTP timeout in milliseconds. |
| maxRetries | number | No | 2 | Retry attempts on NetworkError or 5xx responses. Uses exponential backoff (1s, 2s, 4s, … capped at 16s). 0 disables retries. |
| maxConcurrentRequests | number | No | 2 | Max parallel chunk requests for long-text generate(). |
| debug | boolean | No | false | Enable debug logging. Also activated by the DEBUG=inworld-tts env var. |
Either apiKey (passed explicitly or read from INWORLD_API_KEY) or token must be provided. If neither is available, a MissingApiKeyError is thrown.
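Worked out in code, the maxRetries schedule doubles the wait from 1 second until it reaches the 16-second cap. This mirrors only the documented numbers; whether the SDK adds jitter is not stated, and backoffDelayMs is a hypothetical illustration rather than an SDK function.

```typescript
// Delay before retry number `attempt` (0-based) under the documented
// schedule: 1s, 2s, 4s, ... capped at 16s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 16000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```

Under this reading, with the default maxRetries of 2, a request that keeps failing spends about 1 s + 2 s = 3 s in backoff before the final error is thrown, plus the time of the attempts themselves.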

Browser

The SDK works in browsers (Vite, webpack 5, Rollup, esbuild) with no extra configuration. Use JWT tokens instead of API keys to keep your credentials safe.

Authentication with JWT

In production, your backend generates short-lived JWT tokens and your frontend uses them to authenticate. See the JWT authentication guide and the sample Node.js JWT app for how to set up the server-side token endpoint.
import { InworldTTS } from '@inworld/tts';

async function fetchToken() {
  const res = await fetch('/api/tts-token');
  const { token } = await res.json();
  return token;
}

const tts = InworldTTS({
  token: await fetchToken(),
  onTokenExpiring: fetchToken, // called automatically ~5 min before expiry
});
The onTokenExpiring callback fires automatically when the current token is about to expire and must return a fresh JWT string. The SDK uses a stale-while-revalidate strategy: requests continue with the current token while the refresh happens in the background.
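The SDK calls onTokenExpiring for you, so TTS requests need no expiry tracking. If other parts of your app need to know when a token from your backend expires (for example, to decide whether a cached token is still worth passing as token), the standard JWT exp claim can be read client-side. A minimal sketch, assuming your backend's tokens carry an exp claim: it does not verify the signature and is not how the SDK is implemented. jwtExpiryMs and needsRefresh are hypothetical names, and the 5-minute lead time simply mirrors the comment in the example above.

```typescript
// Read the `exp` claim (seconds since epoch) from a JWT payload, in ms.
// Does NOT verify the signature. Returns undefined if the token is malformed.
function jwtExpiryMs(token: string): number | undefined {
  const parts = token.split('.');
  if (parts.length !== 3) return undefined;
  try {
    // base64url -> base64, then restore padding so atob() accepts it.
    let b64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
    while (b64.length % 4 !== 0) b64 += '=';
    const payload = JSON.parse(atob(b64)) as { exp?: unknown };
    return typeof payload.exp === 'number' ? payload.exp * 1000 : undefined;
  } catch {
    return undefined; // bad base64, bad JSON, or a null payload
  }
}

// True when the token is unreadable or expires within `leadMs` (default 5 min).
function needsRefresh(token: string, nowMs = Date.now(), leadMs = 5 * 60 * 1000): boolean {
  const exp = jwtExpiryMs(token);
  return exp === undefined || exp - nowMs <= leadMs;
}
```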

Example: text-to-speech button

A minimal browser example — user clicks a button, the SDK generates audio and plays it:
<button id="speak">Speak</button>
<script type="module">
  import { InworldTTS } from '@inworld/tts';

  async function fetchToken() {
    const res = await fetch('/api/tts-token');
    const { token } = await res.json();
    return token;
  }

  const tts = InworldTTS({
    token: await fetchToken(),
    onTokenExpiring: fetchToken,
  });

  document.getElementById('speak').addEventListener('click', async () => {
    const audio = await tts.generate({
      text: 'Hello from the browser!',
      voice: 'Ashley',
      encoding: 'MP3', // recommended for cross-browser support
    });
    await tts.play(audio);
  });
</script>
play() must be called inside a user event handler (click, keypress, etc.) due to browser autoplay policies.

Browser encoding compatibility

Not all audio encodings are playable in all browsers. Use MP3 for the widest compatibility.
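| Encoding | Chrome | Firefox | Safari |
|---|---|---|---|
| MP3 | Yes | Yes | Yes |
| WAV | Yes | Yes | Yes |
| OGG_OPUS | Yes | Yes | No |
| FLAC | Yes | Yes | No |
| LINEAR16, PCM, ALAW, MULAW | See note below | See note below | See note below |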
For LINEAR16/PCM formats, use the Web Audio API directly with the Uint8Array returned by generate() instead of play().
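One way to do this for 16-bit PCM is to convert the signed little-endian samples to floats in [-1, 1) and load them into an AudioBuffer. The sketch assumes mono, headerless 16-bit little-endian samples at the sampleRate you requested (48000 by default); adjust if your output differs. It does not decode ALAW or MULAW. pcm16ToFloat32 and playPcm16 are hypothetical helpers.

```typescript
// Convert 16-bit little-endian signed PCM bytes to Float32 samples in [-1, 1).
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.byteLength / 2)); // ignores a trailing odd byte
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Browser only: play mono PCM through the Web Audio API. Create or resume
// the AudioContext inside a user event handler to satisfy autoplay policies.
async function playPcm16(ctx: AudioContext, bytes: Uint8Array, sampleRate = 48000): Promise<void> {
  const samples = pcm16ToFloat32(bytes);
  if (samples.length === 0) return; // createBuffer rejects zero-length buffers
  const buffer = ctx.createBuffer(1, samples.length, sampleRate);
  buffer.getChannelData(0).set(samples);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  await new Promise<void>((resolve) => {
    source.onended = () => resolve(); // resolves when playback finishes
    source.start();
  });
}
```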

Browser limitations

  • outputFile — not supported, throws an error. Use the returned Uint8Array directly.
  • play() with file paths — not supported, pass a Uint8Array instead.
  • cloneVoice() with file paths — not supported, pass Uint8Array buffers for audio samples.

Development shortcut

For quick prototyping, you can use an API key directly in the browser by setting dangerouslyAllowBrowser:
// ⚠️ Development only — your API key will be visible in DevTools
const tts = InworldTTS({
  apiKey: 'your_key',
  dangerouslyAllowBrowser: true,
});
Never use dangerouslyAllowBrowser in production. Your API key will be visible in browser DevTools and billed to your account. Use JWT authentication instead.

Long Text

generate() and generateWithTimestamps() automatically chunk text longer than 2,000 characters and send chunks in parallel (controlled by maxConcurrentRequests). The resulting audio is seamlessly concatenated, and timestamp offsets are merged correctly. stream() and streamWithTimestamps() require text of 2,000 characters or fewer. For longer text with streaming, split the text yourself and call stream() for each segment.
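For streaming long text, a splitter like the sketch below keeps each segment within the 2,000-character limit, preferring to break after sentence-ending punctuation, then at a space, and only as a last resort mid-word. This is one reasonable approach, not an SDK utility; splitForStreaming is a hypothetical name. JavaScript string length counts UTF-16 units, which is never less than the number of code points, so the limit check is conservative if the API counts code points.

```typescript
// Split text into segments of at most `maxLen` UTF-16 units.
function splitForStreaming(text: string, maxLen = 2000): string[] {
  const segments: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    // Cut just after the last ". ", "! " or "? " in the window (keeps the punctuation).
    let cut = Math.max(head.lastIndexOf('. '), head.lastIndexOf('! '), head.lastIndexOf('? ')) + 1;
    if (cut <= 0) cut = head.lastIndexOf(' '); // fall back to a word boundary
    if (cut <= 0) cut = maxLen; // no spaces at all: hard split
    segments.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) segments.push(rest);
  return segments;
}
```

Each segment can then be passed to stream() in turn, playing or collecting its chunks before moving on to the next. The splitter does not special-case SSML, so a <break time="Xs"/> tag could in principle be cut at its internal space.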

Error Handling

The SDK exports three error classes, all extending InworldTTSError:
import { InworldTTS, ApiError, NetworkError, MissingApiKeyError } from '@inworld/tts';

const tts = InworldTTS(); // reads INWORLD_API_KEY from env

try {
  const audio = await tts.generate({ text: 'Hello!', voice: 'Ashley' });
} catch (err) {
  if (err instanceof MissingApiKeyError) {
    // No API key or token provided
  } else if (err instanceof ApiError) {
    console.error(`HTTP ${err.code}: ${err.message}`, err.details);
  } else if (err instanceof NetworkError) {
    console.error(`Network error: ${err.message}`);
  } else {
    throw err;
  }
}
| Error | When |
|---|---|
| MissingApiKeyError | No apiKey or token was provided and INWORLD_API_KEY is not set. |
| ApiError | The API returned a 4xx or 5xx response. Includes .code (HTTP status) and .details. |
| NetworkError | Connection failure or timeout. Automatically retried up to maxRetries times before throwing. |

Next Steps

Voice Cloning

Create a personalized voice clone with just 5 seconds of audio.

Best Practices

Learn tips and tricks for synthesizing high-quality speech.

API Reference

View the complete TTS API specification.