Getting Started

Overview

The Inworld Text-to-Speech (TTS) API generates high-quality, human-like speech at high speed and low latency, enabling real-time, natural, and expressive NPC speech. This guide covers how to set up the Inworld TTS system for your project.

Setup

To set up the TTS system for your project, perform the following steps:

1. Sign Up

To use the Inworld TTS API, you first need an Inworld account.

To set up your Inworld account, go to Inworld Studio and follow the onscreen instructions.

2. Obtain API Key

Once in Inworld Studio, create a workspace.

To do this, left-click on the Drop-Down Menu in the top-left of the screen.

This opens the Workspaces page.

From this interface, left-click Create New Workspace in the top-right of the interface.

Enter a workspace name into the field provided and press Continue.

A new workspace is then created.

Within this new workspace, left-click on Integrations on the left sidebar.

Then, scroll down and press Generate New Key.

A new key is then generated.

Once this key has been generated, press Copy Base64.

This key is used to authenticate with the Inworld TTS API.

The key can be retrieved from this page at any time.

3. Authenticate the API Key

The Inworld TTS API supports both Basic and JWT authentication.

The steps in this guide cover Basic authentication.

To authenticate, include the API key in the request header of every call, as shown below, replacing INWORLD_API_KEY with the key copied in Step 2.

Authorization: Basic INWORLD_API_KEY
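
As an illustration, the same header can be built programmatically. The following is a minimal Python sketch rather than an official client: the helper name `build_auth_header` and the environment variable name `INWORLD_API_KEY` are assumptions made for this example only.

```python
import os


def build_auth_header(api_key: str) -> dict:
    """Return the Basic authentication header for the Base64 key from Step 2."""
    if not api_key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Basic {api_key}"}


if __name__ == "__main__":
    # Assumption: the key was exported as INWORLD_API_KEY before running.
    key = os.environ.get("INWORLD_API_KEY", "")
    if key:
        header = build_auth_header(key)
        # Confirm the header exists without printing the secret itself.
        print("Header names:", list(header))
    else:
        print("Set INWORLD_API_KEY to build the header.")
```

The resulting dictionary can be passed as the headers of whichever HTTP client your project uses.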

Warning: API Keys are Not Secure

Note that Basic authentication is not considered secure and may leak API keys.

Store API keys as environment variables and read them at run time. Where possible, use JWT authentication instead.

Setting and retrieving environment variables and using JWT authentication are outside the scope of this guide, which focuses on Inworld API usage only.

4. Making Your First API Call

The first API call you make will be to the ListVoices endpoint. This endpoint returns a list of voices that can be used in future calls.

Reminder: Don’t forget to replace INWORLD_API_KEY with the key generated earlier.

Make the call as follows:

curl --request GET --header "Authorization: Basic INWORLD_API_KEY" https://api.inworld.ai/tts/v1alpha/voices
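
For projects that call the API from code instead of the command line, the same request can be sketched with Python's standard library. This is an illustrative equivalent of the curl command above, not an official SDK; the function names `build_list_voices_request` and `list_voices` and the 30-second timeout are assumptions.

```python
import urllib.request

LIST_VOICES_URL = "https://api.inworld.ai/tts/v1alpha/voices"


def build_list_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the ListVoices endpoint."""
    return urllib.request.Request(
        LIST_VOICES_URL,
        headers={"Authorization": f"Basic {api_key}"},
        method="GET",
    )


def list_voices(api_key: str) -> str:
    """Send the request and return the raw JSON response body as text."""
    request = build_list_voices_request(api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `list_voices` with your key performs the network request; `build_list_voices_request` alone is useful for inspecting the URL and headers before anything is sent.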

The API then returns a JSON object containing a list of voices.

The example below shows the voice for "Timothy."

{
  "voices": [
    ...
    {
      "languageCodes": [
        "en-US"
      ],
      "name": "Timothy",
      "voiceMetadata": {
        "gender": "MALE",
        "age": "TEEN",
        "accent": "ACCENT_UNSPECIFIED"
      },
      "naturalSampleRateHertz": 22050
    },
    ...
  ]
}

In the example above, the rest of the output has been replaced with “...” for brevity.
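
To pick a voice programmatically, the response can be parsed and filtered by language. The sketch below is a hedged example that uses only the fields visible in the Timothy entry above; the helper name `voice_names` and the trimmed `SAMPLE_RESPONSE` are made up for illustration, and a real response contains more voices.

```python
import json

# Trimmed sample containing only the entry shown in this guide.
SAMPLE_RESPONSE = """
{
  "voices": [
    {
      "languageCodes": ["en-US"],
      "name": "Timothy",
      "voiceMetadata": {
        "gender": "MALE",
        "age": "TEEN",
        "accent": "ACCENT_UNSPECIFIED"
      },
      "naturalSampleRateHertz": 22050
    }
  ]
}
"""


def voice_names(response_text, language_code=None):
    """Return voice names, optionally limited to one language code."""
    data = json.loads(response_text)
    names = []
    for voice in data.get("voices", []):
        if language_code is None or language_code in voice.get("languageCodes", []):
            names.append(voice["name"])
    return names


print(voice_names(SAMPLE_RESPONSE, "en-US"))  # prints ['Timothy']
```

The returned name is the value to pass as the voice name in the Synthesize call.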

5. Synthesizing Speech

Next, synthesize some speech by calling the Synthesize endpoint.

Before beginning this step, ensure you have replaced INWORLD_API_KEY with the API key from Step 2.

Note: The following command assumes you have the commands jq and base64 available to extract the response from the API and convert it into a wav file. These are not necessary to use the API, and are included for instructional purposes so you can record the audio returned by the API and validate the call worked.

Make the call as follows:

curl --request POST \
  --header "Authorization: Basic INWORLD_API_KEY" \
  --data '{
    "input": {
      "text": "Hello world"
    },
    "voice": {
      "name": "Timothy"
    }
  }' \
  https://api.inworld.ai/tts/v1alpha/text:synthesize \
  | jq -r '.result.audioContent' \
  | base64 -d > hello_world.wav
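
If jq and base64 are not available, the same request and decoding step can be sketched in Python. This is an assumption-laden illustration, not an official client. It infers from the jq filter above that the response body consists of one or more JSON objects, each carrying Base64 audio under `result.audioContent`. All function names and the 60-second timeout are hypothetical.

```python
import base64
import json
import urllib.request

SYNTHESIZE_URL = "https://api.inworld.ai/tts/v1alpha/text:synthesize"


def build_synthesize_request(api_key: str, text: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl command."""
    payload = {"input": {"text": text}, "voice": {"name": voice}}
    # No Content-Type is set explicitly, mirroring the curl command.
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {api_key}"},
        method="POST",
    )


def extract_audio(response_text: str) -> bytes:
    """Decode every result.audioContent value found in the response body.

    Assumption: the body is a sequence of JSON objects separated by
    optional whitespace, which is the input shape the jq filter accepts.
    """
    decoder = json.JSONDecoder()
    text = response_text.strip()
    audio = bytearray()
    pos = 0
    while pos < len(text):
        obj, pos = decoder.raw_decode(text, pos)
        audio += base64.b64decode(obj["result"]["audioContent"])
        # Skip whitespace between consecutive JSON objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return bytes(audio)


def synthesize_to_file(api_key: str, text: str, voice: str, path: str) -> None:
    """Send the request and write the decoded audio to the given path."""
    request = build_synthesize_request(api_key, text, voice)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = response.read().decode("utf-8")
    with open(path, "wb") as audio_file:
        audio_file.write(extract_audio(body))
```

For example, `synthesize_to_file(key, "Hello world", "Timothy", "hello_world.wav")` would mirror the command above, where `key` holds your API key. This sketch reads the whole response before decoding, so it does not take advantage of streaming.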

The Synthesize endpoint supports response streaming by default. If response streaming is not required, you can use the Synthesize-Sync endpoint instead.

6. Finished

You are now able to authenticate against and use the Inworld TTS API.