Node.js SDK - Inworld AI Documentation

The @inworld/tts Node.js SDK wraps the Inworld TTS REST API with a clean, typed interface. It handles chunking for long text, retries with exponential backoff, and connection management automatically — reducing typical integrations from 30+ lines of raw HTTP to just a few lines of code.

npm install @inworld/tts

Quick Start

import { InworldTTS } from '@inworld/tts';
import fs from 'fs';

const tts = InworldTTS(); // reads INWORLD_API_KEY from env

const audio = await tts.generate({
  text: 'What a wonderful day to be a text-to-speech model!',
  voice: 'Ashley',
});

fs.writeFileSync('output.mp3', audio);

Speech Synthesis

`generate(options)`

Synthesize speech and return the complete audio as a Uint8Array. Text longer than 2,000 characters is automatically chunked and sent in parallel.

const audio = await tts.generate({
  text: 'Hello, world!',
  voice: 'Ashley',
  model: 'inworld-tts-2',
  encoding: 'MP3',
  outputFile: 'output.mp3', // optional — also writes to disk
});

Parameter	Type	Required	Default	Description
`text`	`string`	Yes	—	Text to synthesize. Any length. Supports `<break time="Xs"/>` SSML.
`voice`	`string`	Yes	—	Voice ID (e.g. `"Ashley"`, `"Dennis"`, or a custom voice ID).
`model`	`string`	No	`"inworld-tts-2"`	Model ID.
`encoding`	`string`	No	`"MP3"`	Audio format: `MP3`, `OGG_OPUS`, `FLAC`, `LINEAR16`, `WAV`, `PCM`, `ALAW`, `MULAW`.
`sampleRate`	`number`	No	`48000`	Sample rate in Hz.
`bitRate`	`number`	No	`128000`	Bit rate in bps (MP3 / OGG_OPUS only).
`speakingRate`	`number`	No	`1.0`	Speed multiplier (0.5–1.5).
`language`	`string`	No	—	BCP-47 language tag (e.g. `"en-US"`, `"fr-FR"`) telling the model which language the voice should speak. Auto-detected from the input text when omitted.
`deliveryMode`	`string`	No	`"BALANCED"`	Trade-off between stability and expressiveness on `inworld-tts-2`: `"STABLE"`, `"BALANCED"`, or `"CREATIVE"`. Ignored on other models.
`temperature`	`number`	No	`1.0`	Expressiveness (0.0–2.0). Higher = more expressive. Ignored on `inworld-tts-2` — use `deliveryMode` instead.
`outputFile`	`string`	No	—	Write audio to this file path (Node.js only).
`play`	`boolean`	No	`false`	Play audio immediately after synthesis (Node.js only).

Returns: Uint8Array — raw audio bytes in the requested encoding.
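
The parallel handling of long text can be pictured as a small worker pool that keeps at most a fixed number of chunk requests in flight and returns results in input order. The sketch below is only an illustration of that idea: `mapWithLimit` is a hypothetical helper, not an SDK export, and the SDK's own scheduling (governed by the `maxConcurrentRequests` client option) may work differently.

```javascript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight,
// returning results in the same order as the inputs.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo with a stand-in for a per-chunk request; logs the results in input order.
mapWithLimit(['a', 'b', 'c'], 2, async (chunk) => chunk.toUpperCase()).then((out) =>
  console.log(out),
);
```

Because results are stored by index, the order of the output does not depend on which request finishes first, which is what makes in-order audio concatenation possible.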

`stream(options)`

Stream audio chunks over HTTP as they are generated. Lower time-to-first-audio than generate(). Text must be 2,000 characters or fewer.

const chunks = [];

for await (const chunk of tts.stream({
  text: 'Streaming is great for real-time playback!',
  voice: 'Ashley',
})) {
  chunks.push(chunk);
}

const audio = Buffer.concat(chunks);

Parameters are the same as generate(), except text must be ≤2,000 characters and the default model is "inworld-tts-1.5-mini".

Yields: Uint8Array (audio chunks as they arrive).

`generateWithTimestamps(options)`

Same as generate() but also returns word- or character-level timing data. Useful for lip-sync, karaoke, and subtitle alignment.

const { audio, timestamps } = await tts.generateWithTimestamps({
  text: 'Timestamps are useful for lip sync.',
  voice: 'Ashley',
  timestampType: 'WORD',
});

// timestamps.wordAlignment.words → ['Timestamps', 'are', 'useful', ...]
// timestamps.wordAlignment.wordStartTimeSeconds → [0.0, 0.42, 0.61, ...]

Takes all the same parameters as generate(), plus:

Parameter	Type	Required	Description
`timestampType`	`"WORD"` \| `"CHARACTER"`	Yes	`"WORD"` returns word timing, phonemes, and visemes. `"CHARACTER"` returns per-character timing.

Returns: { audio: Uint8Array, timestamps: TimestampInfo }
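
As an illustration of how word timing can drive subtitles, the sketch below groups aligned words into short cues. It relies only on the `words` and `wordStartTimeSeconds` arrays shown in the example above. The `wordsToCues` helper, the cue shape, and the 0.5-second tail given to the final cue are assumptions made for this example and are not part of the SDK.

```javascript
// Hypothetical helper (not part of @inworld/tts): groups aligned words into
// subtitle cues of at most `maxWords` words each.
function wordsToCues(wordAlignment, maxWords = 4, tailSeconds = 0.5) {
  const { words, wordStartTimeSeconds: starts } = wordAlignment;
  const cues = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const next = i + maxWords;
    cues.push({
      start: starts[i],
      // A cue ends where the next cue begins; the last cue ends a fixed tail
      // after the start of the final word.
      end: next < words.length ? starts[next] : starts[words.length - 1] + tailSeconds,
      text: words.slice(i, next).join(' '),
    });
  }
  return cues;
}

// Example with hand-written alignment data in the shape shown above.
const cues = wordsToCues(
  { words: ['Timestamps', 'are', 'useful'], wordStartTimeSeconds: [0.0, 0.42, 0.61] },
  2,
);
console.log(cues);
```

Using the next cue's start as the end time avoids needing per-word end times, at the cost of keeping a cue on screen through any pause that follows it.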

`streamWithTimestamps(options)`

Stream audio chunks, each paired with optional timestamp data. Text must be ≤2,000 characters.

for await (const chunk of tts.streamWithTimestamps({
  text: 'Streaming with timestamps!',
  voice: 'Ashley',
  timestampType: 'WORD',
})) {
  // chunk.audio: Uint8Array
  // chunk.timestamps: TimestampInfo | undefined
}

Takes all the same parameters as stream(), plus timestampType (required). Default model is "inworld-tts-1.5-mini".

Yields: { audio: Uint8Array, timestamps?: TimestampInfo }

`play(audio, options)`

Play audio from a Uint8Array or file path. Encoding is auto-detected from magic bytes unless overridden.

const audio = await tts.generate({ text: 'Listen to this!', voice: 'Ashley' });
await tts.play(audio);

// Or play from a file
await tts.play('output.mp3');

Parameter	Type	Required	Default	Description
`audio`	`Uint8Array` \| `string`	Yes	—	Raw audio bytes or a file path (Node.js only).
`encoding`	`string`	No	auto-detected	Format hint (`"MP3"`, `"WAV"`, etc.). Inferred from extension for file paths.
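
To make "auto-detected from magic bytes" concrete, here is a rough sketch of how a format can be recognised from the first bytes of a buffer. The signatures are the standard container markers (`ID3` or an MPEG frame sync for MP3, `RIFF`/`WAVE` for WAV, `OggS` for Ogg, `fLaC` for FLAC). The `sniffEncoding` function is a hypothetical helper and is not necessarily how the SDK implements detection.

```javascript
// Hypothetical helper: guess the audio container from its leading "magic bytes".
function sniffEncoding(bytes) {
  const ascii = (start, end) => String.fromCharCode(...bytes.subarray(start, end));

  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'WAV';
  // "OggS" identifies the Ogg container; this sketch assumes it carries Opus.
  if (bytes.length >= 4 && ascii(0, 4) === 'OggS') return 'OGG_OPUS';
  if (bytes.length >= 4 && ascii(0, 4) === 'fLaC') return 'FLAC';
  if (bytes.length >= 3 && ascii(0, 3) === 'ID3') return 'MP3';
  // MPEG audio frames begin with 11 set bits (the frame sync).
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) return 'MP3';

  return undefined; // headerless data (for example raw PCM) has no signature
}

console.log(sniffEncoding(new TextEncoder().encode('fLaC')));
```

Headerless formats cannot be identified this way, which is one reason an explicit `encoding` hint is useful.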

Voice Management

`listVoices(options)`

List available voices, optionally filtered by language.

const voices = await tts.listVoices();

// Filter by language
const enVoices = await tts.listVoices({ lang: 'EN_US' });
const multiLang = await tts.listVoices({ lang: ['EN_US', 'ES_ES'] });

Parameter	Type	Required	Description
`lang`	`string` \| `string[]`	No	Filter by language code(s). Returns all voices when omitted.

Returns: VoiceInfo[]

`getVoice(voice)`

Get details for a single voice. Works with custom voices in your workspace (cloned or designed voices).

const voice = await tts.getVoice('my-custom-voice-id');
// voice.voiceId, voice.displayName, voice.langCode, ...

Returns: VoiceInfo

`cloneVoice(options)`

Clone a voice from one or more audio recordings. Only 5–15 seconds of audio is needed.

const result = await tts.cloneVoice({
  audioSamples: ['./recording.wav'],
  displayName: 'My Cloned Voice',
  lang: 'EN_US',
});

console.log(result.voice.voiceId); // use this ID in generate()

Parameter	Type	Required	Default	Description
`audioSamples`	`Array<Uint8Array \| string>`	Yes	—	Audio files as `Uint8Array`/Buffer, or file paths (Node.js only). WAV or MP3.
`displayName`	`string`	No	`"Cloned Voice"`	Display name for the cloned voice.
`lang`	`string`	No	`"EN_US"`	Language code of the recordings.
`transcriptions`	`string[]`	No	—	Transcriptions aligned with each audio sample. Improves clone quality.
`description`	`string`	No	—	Voice description.
`tags`	`string[]`	No	—	Tags for filtering.
`removeBackgroundNoise`	`boolean`	No	`false`	Apply noise reduction before cloning.

Returns: CloneVoiceResult — the cloned voice ID is at result.voice.voiceId.
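
Because cloning works from roughly 5–15 seconds of audio, it can be handy to check the length of a WAV sample before uploading it. The sketch below reads the duration from a standard RIFF/WAVE header. `wavDurationSeconds` is a hypothetical helper, not an SDK function, and it assumes a file whose header sizes are accurate.

```javascript
// Hypothetical helper: compute a WAV file's duration from its header.
// Returns undefined if the bytes do not look like a RIFF/WAVE file.
function wavDurationSeconds(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o) => String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);
  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') return undefined;

  let byteRate;
  let offset = 12; // first chunk follows the 12-byte RIFF header
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    if (id === 'fmt ' && offset + 20 <= bytes.length) {
      byteRate = view.getUint32(offset + 16, true); // bytes of audio per second
    }
    if (id === 'data') return byteRate ? size / byteRate : undefined;
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  return undefined;
}

// Usage in Node.js (with fs imported as in the Quick Start):
// const seconds = wavDurationSeconds(fs.readFileSync('./recording.wav'));
// if (seconds < 5 || seconds > 15) console.warn('Sample is outside the 5-15 second range');
```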

`designVoice(options)`

Design a new voice from a text description — no audio recording needed.

const result = await tts.designVoice({
  designPrompt: 'A warm, friendly female voice with a slight British accent',
  previewText: 'Hello! Welcome to our application.',
  numberOfSamples: 3,
});

// Listen to previews, then publish the one you like
const chosenVoice = result.previewVoices[0];

Parameter	Type	Required	Default	Description
`designPrompt`	`string`	Yes	—	Natural-language description of the voice (30–250 characters).
`previewText`	`string`	Yes	—	Text the generated voice will speak in the preview.
`lang`	`string`	No	`"EN_US"`	Language code.
`numberOfSamples`	`number`	No	`1`	Number of preview candidates (1–3).

Returns: DesignVoiceResult — preview voices at result.previewVoices.
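
The limits in the table above (a 30–250 character prompt and 1–3 samples) can be checked on the client before a request is sent. The sketch below is one way to do that; `checkDesignOptions` is a hypothetical helper that is not part of the SDK, and the SDK or API may apply its own validation.

```javascript
// Hypothetical helper: returns a list of problems with designVoice() options,
// based on the documented limits. An empty list means the options look valid.
function checkDesignOptions({ designPrompt, previewText, numberOfSamples = 1 }) {
  const problems = [];
  if (typeof designPrompt !== 'string' || designPrompt.length < 30 || designPrompt.length > 250) {
    problems.push('designPrompt must be 30-250 characters');
  }
  if (typeof previewText !== 'string' || previewText.length === 0) {
    problems.push('previewText is required');
  }
  if (!Number.isInteger(numberOfSamples) || numberOfSamples < 1 || numberOfSamples > 3) {
    problems.push('numberOfSamples must be an integer from 1 to 3');
  }
  return problems;
}

console.log(checkDesignOptions({ designPrompt: 'Deep voice', previewText: 'Hi', numberOfSamples: 5 }));
```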

`publishVoice(options)`

Publish a designed or cloned voice preview to your library so it can be used in generate() and stream().

const voice = await tts.publishVoice({
  voice: chosenVoice.voiceId,
  displayName: 'My Designed Voice',
});

Parameter	Type	Required	Description
`voice`	`string`	Yes	Voice ID from `designVoice()` or `cloneVoice()`.
`displayName`	`string`	No	Display name for the published voice.
`description`	`string`	No	Description.
`tags`	`string[]`	No	Tags for filtering.

Returns: VoiceInfo

`migrateFromElevenLabs(options)`

Migrate a voice from ElevenLabs to your Inworld workspace. Fetches the voice’s audio samples directly from ElevenLabs and clones them into Inworld. No ElevenLabs SDK required.

const result = await tts.migrateFromElevenLabs({
  elevenLabsApiKey: process.env.ELEVEN_LABS_API_KEY,
  elevenLabsVoiceId: 'abc123',
});

console.log(`Migrated "${result.elevenLabsName}" → ${result.inworldVoiceId}`);

Parameter	Type	Required	Description
`elevenLabsApiKey`	`string`	Yes	Your ElevenLabs API key.
`elevenLabsVoiceId`	`string`	Yes	ElevenLabs voice ID to migrate.

Returns: { elevenLabsVoiceId, elevenLabsName, inworldVoiceId }
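
When several voices need to move, the call can be wrapped in a loop that records failures instead of stopping at the first one. The sketch below uses only the documented `migrateFromElevenLabs()` options; the `migrateVoices` wrapper and its `{ migrated, failed }` result shape are assumptions made for this example.

```javascript
// Hypothetical wrapper: migrate several ElevenLabs voices one at a time.
// `tts` is a client created with InworldTTS().
async function migrateVoices(tts, elevenLabsApiKey, voiceIds) {
  const migrated = [];
  const failed = [];
  for (const elevenLabsVoiceId of voiceIds) {
    try {
      migrated.push(await tts.migrateFromElevenLabs({ elevenLabsApiKey, elevenLabsVoiceId }));
    } catch (err) {
      // Record the failure and continue with the remaining voices.
      failed.push({ elevenLabsVoiceId, message: err.message });
    }
  }
  return { migrated, failed };
}

// Usage:
// const { migrated, failed } = await migrateVoices(tts, process.env.ELEVEN_LABS_API_KEY, ['abc123']);
```

Running the migrations sequentially keeps the example simple; whether parallel migration is appropriate depends on rate limits not covered here.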

Configuration

Create a client with InworldTTS() or the equivalent createClient():

import { InworldTTS, createClient } from '@inworld/tts';

const tts = InworldTTS();                               // reads INWORLD_API_KEY from env
const ttsWithKey = InworldTTS({ apiKey: 'your_key' });  // or pass the key explicitly
const ttsFromAlias = createClient();                    // createClient() is an alias for InworldTTS()

Option	Type	Required	Default	Description
`apiKey`	`string`	—	`INWORLD_API_KEY` env var	Inworld API key. Mutually exclusive with `token`.
`token`	`string`	—	—	JWT token for browser use. Mutually exclusive with `apiKey`.
`onTokenExpiring`	`() => Promise<string>`	No	—	Called when `token` is about to expire. Return a fresh JWT.
`dangerouslyAllowBrowser`	`boolean`	No	`false`	Allow `apiKey` in browser environments (key will be visible in DevTools).
`baseUrl`	`string`	No	`https://api.inworld.ai`	Override the API base URL.
`timeout`	`number`	No	per-method	Global HTTP timeout in milliseconds.
`maxRetries`	`number`	No	`2`	Retry attempts on `NetworkError` or 5xx. Uses exponential backoff (1s, 2s, 4s… capped at 16s). `0` disables retries.
`maxConcurrentRequests`	`number`	No	`2`	Max parallel chunk requests for long-text `generate()`.
`debug`	`boolean`	No	`false`	Enable debug logging. Also activated by `DEBUG=inworld-tts` env var.

Either apiKey or token must be provided. If neither is set, a MissingApiKeyError is thrown.
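
The retry schedule described for `maxRetries` can be reproduced with a few lines of arithmetic. This only illustrates the documented delays (1s, 2s, 4s, capped at 16s); `backoffDelayMs` is a hypothetical name, and the SDK's internal implementation may differ, for example by adding jitter.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based), in ms.
// The delay doubles on each attempt and is capped at `capMs`.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 16000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

const schedule = Array.from({ length: 6 }, (_, attempt) => backoffDelayMs(attempt));
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 16000]
```

With the default `maxRetries` of 2, only the first two delays in this schedule would apply.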

Browser

The SDK works in browsers (Vite, webpack 5, Rollup, esbuild) with no extra configuration. Use JWT tokens instead of API keys to keep your credentials safe.

Authentication with JWT

In production, your backend generates short-lived JWT tokens and your frontend uses them to authenticate. See the JWT authentication guide and the sample Node.js JWT app for how to set up the server-side token endpoint.

import { InworldTTS } from '@inworld/tts';

async function fetchToken() {
  const res = await fetch('/api/tts-token');
  const { token } = await res.json();
  return token;
}

const tts = InworldTTS({
  token: await fetchToken(),
  onTokenExpiring: fetchToken, // called automatically ~5 min before expiry
});

The onTokenExpiring callback fires automatically when the current token is about to expire. It must return a fresh JWT string. The SDK uses a stale-while-revalidate strategy — requests continue with the current token while the refresh happens in the background.
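
The stale-while-revalidate idea can be sketched independently of the SDK: callers always get the current token immediately, and a refresh is started in the background once the token is close to expiry. The code below is a simplified illustration, not the SDK's implementation. `createTokenHolder` is a hypothetical name, and for brevity the `refresh` function here returns `{ token, expiresAtMs }`, whereas `onTokenExpiring` returns only the JWT string.

```javascript
// Hypothetical token holder illustrating stale-while-revalidate.
function createTokenHolder(initialToken, expiresAtMs, refresh, leadMs = 5 * 60 * 1000, now = Date.now) {
  let token = initialToken;
  let expiresAt = expiresAtMs;
  let pending = null; // in-flight refresh, if any

  return {
    getToken() {
      // Start at most one background refresh once inside the lead window.
      if (!pending && now() >= expiresAt - leadMs) {
        pending = refresh()
          .then((next) => {
            token = next.token;
            expiresAt = next.expiresAtMs;
          })
          .catch(() => {}) // on failure, keep serving the current token
          .finally(() => {
            pending = null;
          });
      }
      return token; // never blocks: returns the current token right away
    },
    // Resolves when any in-flight refresh has settled (useful in tests).
    whenIdle: () => pending ?? Promise.resolve(),
  };
}
```

The trade-off is that a request issued just before expiry may still carry the old token, which is why the refresh starts some minutes ahead of the expiry time.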

Example: text-to-speech button

A minimal browser example — user clicks a button, the SDK generates audio and plays it:

<button id="speak">Speak</button>
<script type="module">
  import { InworldTTS } from '@inworld/tts';

  async function fetchToken() {
    const res = await fetch('/api/tts-token');
    const { token } = await res.json();
    return token;
  }

  const tts = InworldTTS({
    token: await fetchToken(),
    onTokenExpiring: fetchToken,
  });

  document.getElementById('speak').addEventListener('click', async () => {
    const audio = await tts.generate({
      text: 'Hello from the browser!',
      voice: 'Ashley',
      encoding: 'MP3', // recommended for cross-browser support
    });
    await tts.play(audio);
  });
</script>

play() must be called inside a user event handler (click, keypress, etc.) due to browser autoplay policies.

Browser encoding compatibility

Not all audio encodings are playable in all browsers. Use MP3 for the widest compatibility.

Encoding	Chrome	Firefox	Safari
`MP3`	Yes	Yes	Yes
`WAV`	Yes	Yes	Yes
`OGG_OPUS`	Yes	Yes	No
`FLAC`	Yes	Yes	No
`LINEAR16`, `PCM`, `ALAW`, `MULAW`	—	—	—

For LINEAR16/PCM formats, use the Web Audio API directly with the Uint8Array returned by generate() instead of play().
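
As a rough sketch of that approach, the helper below converts 16-bit PCM bytes into the floating-point samples that Web Audio buffers use. It assumes the data is signed 16-bit little-endian mono with no header, which is an assumption about the payload rather than something stated here; if the bytes carry a header, it would need to be skipped first. `pcm16ToFloat32` is a hypothetical name.

```javascript
// Hypothetical helper: convert signed 16-bit little-endian PCM bytes to
// Float32 samples in the range [-1, 1).
function pcm16ToFloat32(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Browser usage inside a click handler (assumes mono audio at 48000 Hz):
// const ctx = new AudioContext();
// const samples = pcm16ToFloat32(audio);
// const buffer = ctx.createBuffer(1, samples.length, 48000);
// buffer.copyToChannel(samples, 0);
// const source = ctx.createBufferSource();
// source.buffer = buffer;
// source.connect(ctx.destination);
// source.start();
```

The sample rate passed to `createBuffer` should match the `sampleRate` requested from generate(); 48000 is the documented default.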

Browser limitations

outputFile — not supported, throws an error. Use the returned Uint8Array directly.
play() with file paths — not supported, pass a Uint8Array instead.
cloneVoice() with file paths — not supported, pass Uint8Array buffers for audio samples.

Development shortcut

For quick prototyping, you can use an API key directly in the browser by setting dangerouslyAllowBrowser:

// ⚠️ Development only — your API key will be visible in DevTools
const tts = InworldTTS({
  apiKey: 'your_key',
  dangerouslyAllowBrowser: true,
});

Never use dangerouslyAllowBrowser in production. Your API key will be visible in browser DevTools, and anyone who copies it can make requests that are billed to your account. Use JWT authentication instead.

Long Text

generate() and generateWithTimestamps() automatically chunk text longer than 2,000 characters and send chunks in parallel (controlled by maxConcurrentRequests). The resulting audio is seamlessly concatenated, and timestamp offsets are merged correctly.

stream() and streamWithTimestamps() require text of 2,000 characters or fewer. For longer text with streaming, split the text yourself and call stream() for each segment.
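
One way to follow the advice about splitting text yourself is sketched below. The splitter prefers to break after sentence-ending punctuation, then at a space, and only cuts mid-word as a last resort. Both `splitForStreaming` and `streamLongText` are hypothetical helpers written for this example, and the splitting rules are an assumption rather than the SDK's own chunking logic.

```javascript
// Hypothetical helper: split text into segments of at most `maxLen` characters.
function splitForStreaming(text, maxLen = 2000) {
  const segments = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    // Prefer the last sentence end in range, then the last space, then a hard cut.
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
    );
    let cut = sentenceEnd >= 0 ? sentenceEnd + 1 : head.lastIndexOf(' ');
    if (cut <= 0) cut = maxLen;
    segments.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) segments.push(rest);
  return segments;
}

// Stream each segment in order. `tts` is a client created with InworldTTS().
async function* streamLongText(tts, text, voice) {
  for (const segment of splitForStreaming(text)) {
    yield* tts.stream({ text: segment, voice });
  }
}

// Usage:
// for await (const chunk of streamLongText(tts, longText, 'Ashley')) { /* play or buffer chunk */ }
```

Segments are streamed one after another, so playback can begin as soon as the first segment's audio arrives.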

Error Handling

The SDK exports three error classes, all extending InworldTTSError:

import { InworldTTS, ApiError, NetworkError, MissingApiKeyError } from '@inworld/tts';

const tts = InworldTTS(); // reads INWORLD_API_KEY from env

try {
  const audio = await tts.generate({ text: 'Hello!', voice: 'Ashley' });
} catch (err) {
  if (err instanceof MissingApiKeyError) {
    // No API key or token provided
  } else if (err instanceof ApiError) {
    console.error(`HTTP ${err.code}: ${err.message}`, err.details);
  } else if (err instanceof NetworkError) {
    console.error(`Network error: ${err.message}`);
  } else {
    throw err;
  }
}

Error	When
`MissingApiKeyError`	No `apiKey` or `token` was provided and `INWORLD_API_KEY` is not set.
`ApiError`	The API returned a 4xx or 5xx response. 5xx responses are retried up to `maxRetries` times before throwing. Includes `.code` (HTTP status) and `.details`.
`NetworkError`	Connection failure or timeout. Automatically retried up to `maxRetries` times before throwing.

Next Steps

Voice Cloning

Create a personalized voice clone with just 5 seconds of audio.

Best Practices

Learn tips and tricks for synthesizing high-quality speech.

API Reference

View the complete TTS API specification.
