Inworld’s text-to-speech models offer best-in-class voice cloning capabilities, enabling developers to create distinct, personalized voices for their experiences. There are two primary ways to clone a voice:
  1. Instant Voice Cloning - Clone a voice in minutes, with only 5-15 seconds of audio. Also known as zero-shot cloning. Available to all users through Portal.
  2. Professional Voice Cloning - For the highest quality, fine-tune a model with 30+ minutes of audio.
Professional voice cloning is currently not publicly available. To get access, please reach out to our sales team.

Instant Voice Cloning

1. Go to Inworld Portal

In Portal, select TTS Playground from the left-hand side panel. In the TTS Playground, click “Clone Voice” in the top right corner.

2. Upload or record audio samples

Name your voice and select the language, which should match the language of the uploaded audio samples. Voices work best when synthesizing text in the same language as the original audio samples.
Upload or record up to 3 audio samples, each between 5 and 15 seconds long. Confirm you have the rights to clone the voice, and click “Continue”.
Check out our Voice Cloning Best Practices for helpful tips and tricks to improve the quality of your voice clones.
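
If you prepare samples locally before uploading, a quick check against the limits above (at most 3 samples, each 5 to 15 seconds) can catch problems early. The following is a minimal sketch that assumes your samples are WAV files; the function names and constants are hypothetical helpers for illustration, not part of Portal or any Inworld SDK.

```python
import wave

# Limits taken from the step above; the names are illustrative only.
MIN_SECONDS = 5.0
MAX_SECONDS = 15.0
MAX_SAMPLES = 3


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_samples(paths):
    """Return a list of problems found; an empty list means the samples look fine."""
    problems = []
    if not paths:
        problems.append("no samples provided")
    if len(paths) > MAX_SAMPLES:
        problems.append(f"{len(paths)} samples given, at most {MAX_SAMPLES} allowed")
    for path in paths:
        duration = wav_duration_seconds(path)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(
                f"{path}: {duration:.1f}s is outside "
                f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
            )
    return problems
```

Calling `check_samples(["sample1.wav", "sample2.wav"])` returns a list of messages describing any sample that is too short, too long, or over the count limit. Other audio formats would need a different duration reader.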

3. Test your cloned voice

Once voice cloning completes, you can try it in TTS Playground! The voice will show up in the “Voices” list under the name you provided.

4. Use your cloned voice via API

To use the cloned voice via API, copy the voice ID for your cloned voice in TTS Playground. Use that value for the voiceId when making an API call. See our Quickstart to learn how to make your first API call.
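
As a rough illustration of where the voice ID goes, the sketch below builds a synthesis request with the copied value in the `voiceId` field. Only `voiceId` comes from this page; the endpoint URL, the `Basic` authorization scheme, and the `text` field are assumptions made for the example, so confirm the actual values in the Quickstart before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Quickstart.
TTS_URL = "https://api.inworld.ai/tts/v1/voice"


def build_tts_request(api_key, voice_id, text):
    """Build (but do not send) a TTS request that uses a cloned voice."""
    payload = {
        "text": text,          # assumed field name for the text to synthesize
        "voiceId": voice_id,   # the voice ID copied from TTS Playground
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send the request (requires a valid API key and network access):
# with urllib.request.urlopen(build_tts_request(key, voice_id, "Hello!")) as resp:
#     body = resp.read()
```

Building the request separately from sending it makes it easy to inspect the payload and confirm the voice ID is set correctly before making a billable call.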
Instant voice cloning may not perform well for less common voices, such as children’s voices or unique accents. For those use cases, we recommend professional voice cloning.

Next Steps

Looking for more tips and tricks? Check out the resources below to get started!