# Convert Speech-to-Text (STT)

The `node-stt` template shows how to transcribe an audio file to text with the Speech-to-Text (STT) node.

<Note>
  **Architecture**

  * **Backend:** Inworld Agent Runtime
  * **Frontend:** N/A (CLI example)
</Note>

## Prerequisites

* Node.js v20 or higher: [Download here](https://nodejs.org/en/download)
* Inworld API key (required): [Sign up here](https://platform.inworld.ai/signup) or see [quickstart guide](/node/authentication#getting-an-api-key)

## Run the Template

1. Clone the [templates repository](https://github.com/inworld-ai/inworld-runtime-templates-node):
   ```bash theme={"system"}
   git clone https://github.com/inworld-ai/inworld-runtime-templates-node
   cd inworld-runtime-templates-node
   ```
2. Change into the `cli` directory and install the Runtime SDK there:

   <CodeGroup>
     ```shell Yarn theme={"system"}
     yarn add @inworld/runtime
     ```

     ```shell npm theme={"system"}
     npm install @inworld/runtime
     ```
   </CodeGroup>
3. Set up your Base64 [Runtime API key](/node/authentication) by copying the `.env-sample` file into a `.env` file in the `cli` folder and adding your API key.
   ```env .env theme={"system"}
   # Inworld Agent Runtime Base64 API key
   INWORLD_API_KEY=<your_api_key_here>
   ```
4. From the `cli` directory, run the template and pass the path to a WAV audio file:
   ```bash theme={"system"}
   yarn node-stt --audioFilePath=path/to/your/audio.wav
   ```
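The template reads the `--audioFilePath` flag and the API key through its own `parseArgs` helper, and its source isn't shown on this page. The following is a hypothetical sketch of what such a helper could look like. It assumes Node's built-in `util.parseArgs` (available in Node.js v20) for the flag and assumes the key comes from the `INWORLD_API_KEY` environment variable set in `.env`. The `CliArgs` type and the error messages are illustrative, not taken from the template.

```typescript
import { parseArgs as parseCliArgs } from 'node:util';

// Hypothetical shape of the parsed arguments (an assumption, not the template's type).
interface CliArgs {
  audioFilePath: string;
  apiKey: string;
}

// Sketch: read --audioFilePath=<path> from the command line and the API key
// from INWORLD_API_KEY. Both sources are injectable so the function is testable.
function parseArgs(
  argv: string[] = process.argv.slice(2),
  env: NodeJS.ProcessEnv = process.env,
): CliArgs {
  const { values } = parseCliArgs({
    args: argv,
    options: { audioFilePath: { type: 'string' } },
    strict: true, // unknown flags throw instead of being silently ignored
  });

  const audioFilePath = values.audioFilePath;
  if (typeof audioFilePath !== 'string' || audioFilePath === '') {
    throw new Error('Missing required argument: --audioFilePath=<path to .wav>');
  }

  const apiKey = env.INWORLD_API_KEY;
  if (!apiKey) {
    throw new Error('INWORLD_API_KEY is not set; add it to your .env file');
  }

  return { audioFilePath, apiKey };
}
```

Taking `argv` and `env` as parameters, with defaults that point at the real process, keeps the helper easy to exercise without touching the actual command line or environment.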

## Understanding the Template

The template's main logic lives in its `run` function, which uses the Inworld Agent Runtime to transcribe the audio with the STT node.

Let's walk through it step by step:

### 1) Audio input preparation

First, we read and decode the WAV audio file to prepare it for processing:

```javascript theme={"system"}
const { audioFilePath, apiKey } = parseArgs();

const audioData = await WavDecoder.decode(fs.readFileSync(audioFilePath));
```
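`WavDecoder.decode` (presumably from the `wav-decoder` npm package) returns an object with a `sampleRate` and a `channelData` array holding one `Float32Array` per channel. The template relies on exactly those two fields later. To show what that decoding involves, here is a minimal sketch that handles only uncompressed 16-bit PCM WAV files. The function name `decodePcm16Wav` is hypothetical. It is not the decoder the template uses and rejects other encodings.

```typescript
// Same shape the template reads from the decoder: a sample rate plus one
// Float32Array of samples per channel.
interface DecodedAudio {
  sampleRate: number;
  channelData: Float32Array[];
}

// Minimal sketch: decode an uncompressed 16-bit PCM WAV buffer.
// Assumes format tag 1 (PCM) and 16 bits per sample; anything else throws.
function decodePcm16Wav(buf: Buffer): DecodedAudio {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  let offset = 12; // first chunk follows the 12-byte RIFF header
  let sampleRate = 0;
  let numChannels = 0;

  // Walk the chunk list: "fmt " describes the format, "data" holds the samples.
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;

    if (id === 'fmt ') {
      const formatTag = buf.readUInt16LE(body);
      numChannels = buf.readUInt16LE(body + 2);
      sampleRate = buf.readUInt32LE(body + 4);
      const bitsPerSample = buf.readUInt16LE(body + 14);
      if (formatTag !== 1 || bitsPerSample !== 16) {
        throw new Error('Only 16-bit PCM is handled in this sketch');
      }
    } else if (id === 'data') {
      if (numChannels === 0) throw new Error('"data" chunk appears before "fmt " chunk');
      const frames = Math.floor(size / (2 * numChannels));
      const channelData = Array.from({ length: numChannels }, () => new Float32Array(frames));
      // Samples are interleaved: frame 0 ch0, frame 0 ch1, frame 1 ch0, ...
      for (let i = 0; i < frames; i++) {
        for (let ch = 0; ch < numChannels; ch++) {
          const sample = buf.readInt16LE(body + 2 * (i * numChannels + ch));
          channelData[ch][i] = sample / 32768; // scale int16 to [-1, 1)
        }
      }
      return { sampleRate, channelData };
    }

    // RIFF chunks are padded to an even number of bytes.
    offset = body + size + (size % 2);
  }

  throw new Error('No "data" chunk found');
}
```

A real decoder also handles 8/24/32-bit and floating-point WAVs, which is why the template delegates this step to a library.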

### 2) Node initialization

Then, we create the STT node:

```javascript theme={"system"}
const sttNode = new RemoteSTTNode();
```

### 3) Graph initialization

Next, we build the graph with `GraphBuilder`, adding the STT node and setting it as both the start and end node:

```javascript theme={"system"}
const graph = new GraphBuilder({
  id: 'node_stt_graph',
  apiKey,
  enableRemoteConfig: false,
})
  .addNode(sttNode)
  .setStartNode(sttNode)
  .setEndNode(sttNode)
  .build();
```

The [GraphBuilder](/node/runtime-reference/classes/graph_dsl_graph_builder.GraphBuilder) configuration includes:

* **id**: A unique identifier for the graph
* **apiKey**: Your Inworld API key for authentication
* **enableRemoteConfig**: Whether to enable remote configuration (set to false for local execution)

Because this example has only one node, the STT node serves as both the start and end node. In more complex applications, you can connect multiple nodes to form a processing pipeline.
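`GraphBuilder` handles this wiring for you. The following is only a framework-free toy model to illustrate what start and end nodes mean: execution begins at the start node and follows edges from node to node, and the end node's output becomes the graph's result. The names `NodeFn`, `ToyGraph`, and `runToyGraph` are hypothetical and are not part of the Inworld Agent Runtime API.

```typescript
// Toy model (not the Inworld API): each node turns an input into an output.
type NodeFn = (input: unknown) => Promise<unknown>;

interface ToyGraph {
  nodes: Map<string, NodeFn>;
  edges: Map<string, string>; // node id -> id of the next node
  start: string;
  end: string;
}

// Follow edges from the start node until the end node has run.
// Assumes a simple acyclic chain; there is no branching or cycle detection.
async function runToyGraph(graph: ToyGraph, input: unknown): Promise<unknown> {
  let current = graph.start;
  let value = input;
  for (;;) {
    const fn = graph.nodes.get(current);
    if (!fn) throw new Error(`Unknown node: ${current}`);
    value = await fn(value);
    if (current === graph.end) return value; // the end node's output is the result
    const next = graph.edges.get(current);
    if (!next) throw new Error(`No edge leaves ${current} before reaching ${graph.end}`);
    current = next;
  }
}
```

With a single node, `start` and `end` are the same id and no edges are needed, which matches the template's `setStartNode(sttNode)` / `setEndNode(sttNode)`. A pipeline would add an edge from the STT node to, say, a text-processing node and make that node the end.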

### 4) Graph execution

Now we start the graph, passing the audio data directly as its input:

```javascript theme={"system"}
const { outputStream } = await graph.start(
  new GraphTypes.Audio({
    data: Array.from(audioData.channelData[0] || []),
    sampleRate: audioData.sampleRate,
  }),
);
```

The audio input is wrapped in a `GraphTypes.Audio` object that contains:

* **data**: The samples of the first audio channel (`channelData[0]`), copied from a typed array into a plain array
* **sampleRate**: The sample rate of the audio file
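As a small illustration of these two fields, the sketch below reproduces the template's conversion as a standalone function and derives the clip length from them. `toAudioInput`, `durationSeconds`, and the `AudioInput` interface are hypothetical names. `AudioInput` is a plain-object stand-in for what is passed to `GraphTypes.Audio`, not the runtime's own type.

```typescript
// Shape of the decoded WAV from step 1: a sample rate plus one
// Float32Array of samples per channel.
interface DecodedAudio {
  sampleRate: number;
  channelData: Float32Array[];
}

// Plain-object stand-in for the two fields passed to GraphTypes.Audio.
interface AudioInput {
  data: number[];
  sampleRate: number;
}

// Mirrors the template's conversion: keep only channel 0 (any other
// channels are ignored) and copy the typed array into a plain number[].
// A file with no channels yields an empty array.
function toAudioInput(audio: DecodedAudio): AudioInput {
  return {
    data: Array.from(audio.channelData[0] || []),
    sampleRate: audio.sampleRate,
  };
}

// Clip length in seconds: number of samples divided by samples per second.
function durationSeconds(input: AudioInput): number {
  return input.data.length / input.sampleRate;
}
```

For example, 48,000 samples at 16,000 Hz make a 3-second clip. Because only channel 0 is kept, a stereo recording is transcribed from its first (usually left) channel alone.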

### 5) Response handling

The transcription results are handled using the `processResponse` method, which supports both streaming and non-streaming text responses:

```typescript theme={"system"}
let result = '';
let resultCount = 0;

for await (const resp of outputStream) {
  await resp.processResponse({
    string: (text: string) => {
      result += text;
      resultCount++;
    },
    TextStream: async (textStream: any) => {
      for await (const chunk of textStream) {
        if (chunk.text) {
          result += chunk.text;
          resultCount++;
        }
      }
    },
    default: (data: any) => {
      if (typeof data === 'string') {
        result += data;
        resultCount++;
      } else {
        console.log('Unprocessed response:', data);
      }
    },
  });
}

console.log(`Result count: ${resultCount}`);
console.log(`Result: ${result}`);
```

The response handler supports multiple response types:

* **string**: Direct string responses containing transcribed text
* **TextStream**: Streaming text responses for real-time transcription
* **default**: Fallback handler for any other response types
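To make the accumulation logic concrete without an API key or network call, the sketch below applies the same three-way handling to a mock stream. `StreamItem`, `TextChunk`, `collectTranscript`, and `fromArray` are hypothetical stand-ins. They model the idea behind `processResponse` with a discriminated union and are not the runtime's types or its dispatch mechanism.

```typescript
// Hypothetical stand-ins for what an output stream can yield (not Inworld types).
type TextChunk = { text?: string };
type StreamItem =
  | { kind: 'string'; value: string }
  | { kind: 'TextStream'; value: AsyncIterable<TextChunk> }
  | { kind: 'other'; value: unknown };

// Accumulate transcript text the way the template's handlers do: whole strings
// are appended at once, stream chunks as they arrive, and anything else that
// is not a string is logged and skipped.
async function collectTranscript(
  items: AsyncIterable<StreamItem>,
): Promise<{ result: string; resultCount: number }> {
  let result = '';
  let resultCount = 0;

  for await (const item of items) {
    switch (item.kind) {
      case 'string':
        result += item.value;
        resultCount++;
        break;
      case 'TextStream':
        for await (const chunk of item.value) {
          if (chunk.text) {
            result += chunk.text;
            resultCount++;
          }
        }
        break;
      default:
        if (typeof item.value === 'string') {
          result += item.value;
          resultCount++;
        } else {
          console.log('Unprocessed response:', item.value);
        }
    }
  }

  return { result, resultCount };
}

// Helper that turns an array into an async iterable, for trying the sketch locally.
async function* fromArray<T>(items: T[]): AsyncIterable<T> {
  for (const item of items) yield item;
}
```

Note that `resultCount` counts text pieces, not responses. A single streamed response that delivers several chunks increments it once per non-empty chunk, so the same transcript can produce different counts depending on how the node chunks its output.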
