# Convert Text-to-Speech (TTS)

The `node-tts` template shows how to convert text to speech using the TTS node.

<Note>
  **Architecture**

  * **Backend:** Inworld Agent Runtime
  * **Frontend:** N/A (CLI example)
</Note>

## Prerequisites

* Node.js v20 or higher: [Download here](https://nodejs.org/en/download)
* Inworld API key (required): [Sign up here](https://platform.inworld.ai/signup) or see the [quickstart guide](/node/authentication#getting-an-api-key)

## Run the Template

1. Clone the [templates repository](https://github.com/inworld-ai/inworld-runtime-templates-node):
   ```bash theme={"system"}
   git clone https://github.com/inworld-ai/inworld-runtime-templates-node
   cd inworld-runtime-templates-node
   ```
2. Install the Runtime SDK inside the `cli` directory.

   <CodeGroup>
     ```shell Yarn theme={"system"}
     yarn add @inworld/runtime
     ```

     ```shell npm theme={"system"}
     npm install @inworld/runtime
     ```
   </CodeGroup>
3. Set up your Base64 [Runtime API key](/node/authentication) by copying the `.env-sample` file into a `.env` file in the `cli` folder and adding your API key.

   ```env .env theme={"system"}
   # Inworld Agent Runtime Base64 API key
   INWORLD_API_KEY=<your_api_key_here>
   ```
4. Run the template. To try a different [model](/models#tts) or [voice](/api-reference/ttsAPI/texttospeech/list-voices), specify the model with the `--modelId` parameter and the voice with the `--voiceName` parameter:

   ```bash theme={"system"}
   yarn node-tts "Hello, how are you?" --modelId=inworld-tts-2 --voiceName=Ronald
   ```
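
To make the command above concrete, here is a minimal sketch of how a quoted text argument and `--key=value` flags could be split apart. `parseCliArgs` is a hypothetical helper written for illustration; the template's own argument handling may differ.

```javascript
// Hypothetical helper (not part of the template): separates positional
// text from --key=value flags such as --modelId and --voiceName.
function parseCliArgs(argv) {
  const flags = {};
  const positional = [];
  for (const arg of argv) {
    const match = /^--([^=]+)=(.*)$/.exec(arg);
    if (match) {
      flags[match[1]] = match[2];
    } else {
      positional.push(arg);
    }
  }
  return { text: positional.join(' '), flags };
}

// The same arguments as in the command above, as the shell would pass them.
const parsed = parseCliArgs([
  'Hello, how are you?',
  '--modelId=inworld-tts-2',
  '--voiceName=Ronald',
]);
console.log(parsed.text, parsed.flags.modelId, parsed.flags.voiceName);
```

In a real script the array would come from `process.argv.slice(2)`.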

## Understanding the Template

The main functionality of the template is contained in the `run` function, which demonstrates how to use the Inworld Agent Runtime to convert text to speech with the TTS node.

Let's break the template down in more detail:

### 1) Node initialization

We start by creating the TTS node.

```javascript theme={"system"}
const ttsNode = new RemoteTTSNode({
  id: 'tts_node',
  speakerId: voiceName,
  modelId,
  sampleRate: SAMPLE_RATE,
  temperature: 1.0,
  speakingRate: 1,
});
```

When creating the TTS node, you can specify:

* **id**: A unique identifier for the node
* **speakerId**: The voice to use for synthesis (see [available voices](/api-reference/ttsAPI/texttospeech/list-voices))
* **modelId**: The [TTS model](/models#tts) to use for synthesis
* **sampleRate**: Audio output sample rate
* **temperature**: Controls randomness in synthesis
* **speakingRate**: Controls the speed of speech (1.0 is the voice's natural speed)

### 2) Graph initialization

Next, we create the graph using the `GraphBuilder`, adding the TTS node and setting it as both the start and end node:

```javascript theme={"system"}
const graph = new GraphBuilder({
  id: 'node_tts_graph',
  apiKey,
  enableRemoteConfig: false,
})
  .addNode(ttsNode)
  .setStartNode(ttsNode)
  .setEndNode(ttsNode)
  .build();
```

The [GraphBuilder](/node/runtime-reference/classes/graph_dsl_graph_builder.GraphBuilder) configuration includes:

* **id**: A unique identifier for the graph
* **apiKey**: Your Inworld API key for authentication
* **enableRemoteConfig**: Whether to enable remote configuration (set to false for local execution)

This example has a single TTS node, which serves as both the start and end node. In more complex applications, you could connect other nodes, such as an LLM node, to the TTS node to create a processing pipeline.

### 3) Graph execution

Now we execute the graph with the text input directly:

```javascript theme={"system"}
const { outputStream } = await graph.start(text);
```

The text input is passed directly to the graph, where the TTS node processes it.

### 4) Response handling

The audio generation results are handled using the `processResponse` method, which supports streaming audio responses:

```typescript theme={"system"}
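let initialText = ''; // text being synthesized, accumulated across chunks
let resultCount = 0; // number of chunks received
let allAudioData: number[] = []; // audio samples accumulated across chunks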

for await (const result of outputStream) {
  await result.processResponse({
    TTSOutputStream: async (ttsStream: GraphTypes.TTSOutputStream) => {
      for await (const chunk of ttsStream) {
        if (chunk.text) initialText += chunk.text;
        if (chunk.audio?.data) {
          allAudioData = allAudioData.concat(Array.from(chunk.audio.data));
        }
        resultCount++;
      }
    },
  });
}

console.log(`Result count: ${resultCount}`);
console.log(`Initial text: ${initialText}`);
```

The response handler processes:

* **TTSOutputStream**: Streaming audio responses containing both text and audio data
* **chunk.text**: The text being synthesized
* **chunk.audio.data**: The audio data as Float32Array samples
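
For long utterances, repeatedly copying samples into a growing `number[]` can become costly. One alternative, sketched below under the assumption that each chunk's audio data is a `Float32Array` as described above, is to keep the chunks and join them once at the end. `concatFloat32` is a hypothetical helper, and the chunk values are stand-ins rather than real TTS output.

```javascript
// Hypothetical helper (not part of the template): joins Float32Array
// chunks into one contiguous Float32Array with a single allocation.
function concatFloat32(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return out;
}

// Stand-in for the chunks a TTS stream would deliver.
const chunks = [new Float32Array([0.1, 0.2]), new Float32Array([0.3])];
const samples = concatFloat32(chunks);
console.log(samples.length); // 3
```

In the stream handler, this would mean pushing each `chunk.audio.data` into an array and calling the helper after the loop finishes.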

### 5) Audio file creation

Then, we encode the audio data and save it as a WAV file:

```javascript theme={"system"}
const audio = {
  sampleRate: SAMPLE_RATE,
  channelData: [new Float32Array(allAudioData)],
};

const buffer = await wavEncoder.encode(audio);
if (!fs.existsSync(OUTPUT_DIRECTORY)) {
  fs.mkdirSync(OUTPUT_DIRECTORY, { recursive: true });
}

fs.writeFileSync(OUTPUT_PATH, Buffer.from(buffer));

console.log(`Audio saved to ${OUTPUT_PATH}`);
```
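
To illustrate what encoding samples into a WAV file involves, here is a minimal sketch that writes a 44-byte RIFF/WAVE header followed by mono 16-bit PCM data. `encodeWavPcm16` is a hypothetical helper and the 16000 Hz rate is an arbitrary example value. It is not the template's encoder and is not guaranteed to match the output of `wavEncoder` byte for byte.

```javascript
// Hypothetical helper (not part of the template): encodes mono float
// samples in the range [-1, 1] as a 16-bit PCM WAV byte array.
function encodeWavPcm16(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // remaining file size after this field
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // number of channels (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return new Uint8Array(buffer);
}

// Example rate chosen for illustration only.
const wavBytes = encodeWavPcm16(new Float32Array([0, 1, -1]), 16000);
console.log(wavBytes.length); // 50 (44-byte header + 3 samples * 2 bytes)
```

The resulting bytes could be written to disk with `fs.writeFileSync(path, wavBytes)`, in the same way the template writes its encoded buffer.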
