The node-stt template illustrates how to convert speech to text using the STT (Speech-to-Text) node.
Architecture
- Backend: Inworld Runtime
- Frontend: N/A (CLI example)
Run the Template
- Clone the templates repository:
- Install the Runtime SDK inside the cli directory.
- Set up your Base64 Runtime API key by copying the .env-sample file into a .env file in the cli folder and adding your API key.
- Run this code in your console, providing a WAV audio file:
Understanding the Template
The main functionality of the template is contained in the run function, which demonstrates how to use the Inworld Runtime to convert speech to text using the STT node. Let's break it down in more detail:
1) Audio input preparation
First, we read and decode the WAV audio file to prepare it for processing:
2) Node initialization
Then, we create the STT node:
3) Graph initialization
Next, we create the graph using the GraphBuilder, adding the STT node and setting it as both the start and end node. The graph is configured with:
- id: A unique identifier for the graph
- apiKey: Your Inworld API key for authentication
- enableRemoteConfig: Whether to enable remote configuration (set to false for local execution)
4) Graph execution
Now we execute the graph, passing the audio data directly as the input. The input is a GraphTypes.Audio object that contains:
- data: The audio channel data converted to an array
- sampleRate: The sample rate of the audio file
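To make the two fields above concrete, here is a minimal sketch of how sample data and a sample rate could be extracted from a WAV file by hand. It is not the template's own decoding code: the AudioInput interface and the decodeWav16 function are hypothetical names introduced for illustration, and the sketch assumes 16-bit PCM audio and reads only the first channel. The real GraphTypes.Audio type comes from the Runtime SDK and may differ.

```typescript
// Hypothetical shape mirroring the two fields described above; the real
// GraphTypes.Audio type is defined by the Runtime SDK and may differ.
interface AudioInput {
  data: number[];
  sampleRate: number;
}

// Decode a 16-bit PCM WAV file into normalized samples in [-1, 1).
// Only the first channel is kept. Assumption: uncompressed PCM input.
function decodeWav16(bytes: Uint8Array): AudioInput {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Read a 4-character chunk tag such as "RIFF" or "data".
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );

  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  let sampleRate = 0;
  let channels = 0;
  let offset = 12;

  // Each chunk is a 4-byte id, a 4-byte little-endian size, then the payload.
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;

    if (id === "fmt ") {
      const format = view.getUint16(body, true);
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      const bitsPerSample = view.getUint16(body + 14, true);
      if (format !== 1 || bitsPerSample !== 16) {
        throw new Error("Only 16-bit PCM is supported in this sketch");
      }
    } else if (id === "data") {
      if (channels === 0) throw new Error("Missing fmt chunk before data");
      const frameBytes = channels * 2;
      const available = Math.min(size, bytes.length - body);
      const frames = Math.floor(available / frameBytes);
      const data: number[] = new Array(frames);
      for (let i = 0; i < frames; i++) {
        // First channel of each frame, scaled from int16 to a float.
        data[i] = view.getInt16(body + i * frameBytes, true) / 32768;
      }
      return { data, sampleRate };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("No data chunk found");
}
```

In a Node.js script, the bytes would typically come from reading the file into a buffer and passing it to decodeWav16. The resulting data and sampleRate values correspond to the two fields listed above.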
5) Response handling
The transcription results are handled using the processResponse method, which supports both streaming and non-streaming text responses:
- string: Direct string responses containing transcribed text
- TextStream: Streaming text responses for real-time transcription
- default: Fallback handler for any other response types
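The three cases above can be sketched as a small dispatcher. This is an illustration of the pattern only, not the SDK's processResponse API: handleResponse, HandledResponse, and TextChunks are hypothetical names, and the sketch assumes a streaming response can be consumed as an async iterable of text chunks, which may not match how TextStream is actually exposed.

```typescript
// Hypothetical stand-in for a streaming text response. Assumption: chunks
// can be read with `for await`; the SDK's TextStream may differ.
type TextChunks = AsyncIterable<string>;

interface HandledResponse {
  kind: "string" | "stream" | "other";
  text: string;
}

// Type guard: true when the value exposes an async iterator.
function isTextChunks(value: unknown): value is TextChunks {
  return (
    typeof value === "object" &&
    value !== null &&
    Symbol.asyncIterator in value
  );
}

async function handleResponse(response: unknown): Promise<HandledResponse> {
  // Case 1: the whole transcript arrives as a single string.
  if (typeof response === "string") {
    return { kind: "string", text: response };
  }
  // Case 2: the transcript arrives in chunks; concatenate them in order.
  if (isTextChunks(response)) {
    let text = "";
    for await (const chunk of response) {
      text += chunk;
    }
    return { kind: "stream", text };
  }
  // Case 3: fallback for any other response type.
  return { kind: "other", text: "" };
}
```

A real-time client would likely print each chunk as it arrives rather than waiting for the full transcript. The sketch concatenates chunks only to keep the return value simple.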