Inworld’s text-to-speech models offer best-in-class voice cloning capabilities, enabling developers to create distinct, personalized voices for their experiences. There are two primary ways to clone a voice:
  1. Instant Voice Cloning - Clone a voice in minutes, with only 5-15 seconds of audio. Also known as zero-shot cloning. Available to all users through Portal.
  2. Professional Voice Cloning - For the highest quality, fine-tune a model with 30+ minutes of audio.
Professional voice cloning is currently not publicly available. To get access, please reach out to our sales team.

Instant Voice Cloning

Step 1: Go to Inworld Portal

In Portal, select TTS Playground from the left-hand side panel. In the TTS Playground, click “Clone Voice” in the top right corner.

Step 2: Choose how to provide audio samples

Choose between Upload audio or Record audio:
  • Upload audio: Select this option if you have existing audio recordings you would like to use.
  • Record audio: Follow a guided process to record a high-quality voice sample directly in your browser.

Step 3: Record audio samples (if recording)

If you chose to record, you’ll see the recording interface with helpful tips:
  • Find a quiet place: Minimize background noise to ensure your voice is captured clearly.
  • Avoid mic noise: Keep a reasonable distance from the mic to prevent echo and plosives.
  • Be expressive: Speak with a variety of emotions to capture the full range of the voice.
Name your voice and select its language, which should match the language of the samples you’ll record. Voices work best when synthesizing text in the same language as the original audio samples.

Use the suggested scripts to guide your recording. Scripts are available for different personas, such as Math Tutor, AI companion, Therapist, Customer Support, and Game Character, or you can try a personal script. Record up to 3 samples, each between 5 and 15 seconds long.

Optionally enable “Remove background noise”, confirm you have the rights to clone the voice, and click “Continue”.
After clicking “Continue”, wait for the validation of your voice samples to complete before proceeding.
Check out our Voice Cloning Best Practices for helpful tips and tricks to improve the quality of your voice clones.

Step 4: Upload audio samples (if uploading)

If you chose to upload, you’ll see the upload interface. Name your voice and select its language, which should match the language of the uploaded audio samples. Voices work best when synthesizing text in the same language as the original audio samples.

Drag and drop or browse to upload up to 3 audio files. Accepted formats are wav, mp3, and webm, and the maximum total size is 16MB. Audio samples longer than 15 seconds are automatically trimmed to 15 seconds.

Optionally enable “Remove background noise”, confirm you have the rights to clone the voice, and click “Continue”.
After clicking “Continue”, wait for the validation of your voice samples to complete before proceeding.
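If you prepare samples offline, the upload limits above can be checked locally before you open Portal. The following is a minimal, hypothetical pre-upload helper using only the Python standard library. It assumes that "16MB" means 16 x 1024 x 1024 bytes, and it applies the 5 to 15 second per-sample range from the recording step to uploaded WAV files (the upload step itself only says that longer clips are trimmed). Durations are read with the `wave` module, which understands PCM WAV only, so mp3 and webm durations are not checked. The names `check_samples`, `wav_duration_seconds`, and `make_wav` are illustrative and are not part of any Inworld API.

```python
import io
import wave
from pathlib import PurePath

# Limits taken from the upload step. Reading "16MB" as 16 * 1024 * 1024 bytes
# and applying the 5 to 15 second recording range to uploads are assumptions.
MAX_FILES = 3
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".webm"}
MAX_TOTAL_BYTES = 16 * 1024 * 1024
MIN_SECONDS = 5.0
MAX_SECONDS = 15.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory PCM WAV file, in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_samples(files: list[tuple[str, bytes]]) -> list[str]:
    """Hypothetical pre-upload check; returns human-readable problems (empty if OK)."""
    problems: list[str] = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files given; at most {MAX_FILES} are accepted")
    total = sum(len(data) for _, data in files)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES} bytes")
    for name, data in files:
        ext = PurePath(name).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(no extension)'}")
            continue
        if ext != ".wav":
            continue  # mp3/webm durations are not checked in this stdlib-only sketch
        try:
            seconds = wav_duration_seconds(data)
        except (wave.Error, EOFError):
            problems.append(f"{name}: could not read a PCM WAV header")
            continue
        if seconds < MIN_SECONDS:
            problems.append(f"{name}: {seconds:.1f}s is shorter than {MIN_SECONDS:.0f}s")
        elif seconds > MAX_SECONDS:
            problems.append(f"{name}: {seconds:.1f}s will be trimmed to {MAX_SECONDS:.0f}s")
    return problems


def make_wav(seconds: float, rate: int = 16000) -> bytes:
    """Synthesize a silent mono 16-bit WAV clip, handy for trying the checker."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

For example, `check_samples([("intro.wav", open("intro.wav", "rb").read())])` returns an empty list when a clip passes every check. This only catches obvious problems early; Portal’s own validation after you click “Continue” still decides whether the samples are accepted.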

Step 5: Test your cloned voice

Once voice cloning completes, you’ll see the “Try your cloned voice” interface. Enter text in the input field and press play to hear your cloned voice. Test several different phrases to make sure the voice sounds as expected.

If the voice doesn’t sound quite right, you can delete it and start over, create another voice, or open it in the TTS Playground for more advanced testing options.

Step 6: Use your cloned voice via API

To use the cloned voice via the API, copy its voice ID from the TTS Playground and pass that value as the voiceId when making an API call. See our Quickstart to learn how to make your first API call.
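As a rough illustration of this step, the sketch below builds and sends a synthesis request with the Python standard library, passing the copied voice ID as `voiceId`. Only the `voiceId` field comes from this page. The endpoint URL, the `Basic` authorization scheme using the key copied from Portal, the `modelId` value, and the base64 `audioContent` response field are assumptions to verify against the Quickstart before use. `build_tts_request` and `synthesize` are hypothetical helper names.

```python
import base64
import json
import urllib.request

# ASSUMPTIONS (verify against the Quickstart): the endpoint URL, the "Basic"
# auth scheme with the key copied from Portal, the "modelId" value, and the
# base64 "audioContent" field in the JSON response.
TTS_URL = "https://api.inworld.ai/tts/v1/voice"
DEFAULT_MODEL_ID = "inworld-tts-1"


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = DEFAULT_MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a synthesis request for a cloned voice."""
    body = {"text": text, "voiceId": voice_id, "modelId": model_id}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, voice_id: str, text: str, out_path: str) -> None:
    """Send the request and write the decoded audio bytes to out_path."""
    request = build_tts_request(api_key, voice_id, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Assumed response shape: {"audioContent": "<base64-encoded audio>", ...}
    audio = base64.b64decode(payload["audioContent"])
    with open(out_path, "wb") as f:
        f.write(audio)
```

For example, `synthesize(os.environ["INWORLD_API_KEY"], "<your-voice-id>", "Hello from my cloned voice!", "sample.wav")` would write the returned audio to disk; the environment variable name is hypothetical, and the file extension should match whatever audio encoding the API actually returns. Keeping `build_tts_request` separate from the network call makes the request easy to inspect or test offline.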
Instant voice cloning may not perform well for less common voices, such as children’s voices or unique accents. For those use cases, we recommend professional voice cloning.

Next Steps

Looking for more tips and tricks? Check out the resources below to get started!