Inworld’s TTS output carries per-phoneme timing data that drives real-time lip-sync. The system maps phonemes to viseme categories and exposes blend weights each frame, which you apply to morph targets through an Animation Blueprint node.

How it works

When the TTS node produces an FInworldData_TTSOutput, it includes a Timestamps array. Each entry is an FInworldAudioChunkTimestamp: a word with its start/end times and a Phones array of FInworldPhoneSpan entries. Each span carries the phoneme symbol, its viseme category, and its timestamp. At playback time, UInworldVoiceAudioComponent fires OnVoiceAudioPlayback every tick with the current FInworldVoiceAudioPlaybackInfo (elapsed duration) and the cached phone spans. You pass these into a Blueprint Function Library (BFL) function to get per-viseme or per-phoneme blend weights, then feed those weights into the Inworld Viseme AnimGraph node.
TTS Node → FInworldData_TTSOutput (Audio + Timestamps[word → Phones[phoneme]])

UInworldVoiceAudioComponent → OnVoiceAudioPlayback (PlaybackInfo + PhoneSpans)

GetVisemeBlendsTTS / GetVisemeBlends → FInworldVisemeBlends

Inworld Viseme AnimGraph Node → morph target curves
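The flow above can be sketched with simplified stand-in types. This is plain C++ for illustration only, not the plugin's implementation: the struct names and fields are abbreviated versions of FInworldAudioChunkTimestamp and FInworldPhoneSpan, and the lookup logic is an assumption about how a span array maps playback time to a viseme.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for FInworldPhoneSpan.
struct PhoneSpan {
    std::string Phoneme;   // IPA symbol, e.g. "b"
    std::string Viseme;    // category, e.g. "BMP"
    float Timestamp;       // start time in seconds
    float Duration;        // length in seconds
};

// Simplified stand-in for FInworldAudioChunkTimestamp.
struct WordTimestamp {
    std::string Token;
    std::vector<PhoneSpan> Phones;
};

// Roughly what BuildPhoneSpansFromTTSOutput does: flatten the per-word
// Phones arrays into one time-ordered span array.
std::vector<PhoneSpan> FlattenSpans(const std::vector<WordTimestamp>& Words)
{
    std::vector<PhoneSpan> Out;
    for (const WordTimestamp& W : Words)
        Out.insert(Out.end(), W.Phones.begin(), W.Phones.end());
    return Out;
}

// Find the viseme active at PlayedDuration; fall back to "STOP"
// (silence/rest) when no span covers the current time.
std::string ActiveViseme(const std::vector<PhoneSpan>& Spans, float PlayedDuration)
{
    for (const PhoneSpan& S : Spans)
        if (PlayedDuration >= S.Timestamp && PlayedDuration < S.Timestamp + S.Duration)
            return S.Viseme;
    return "STOP";
}
```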

Data types

FInworldData_TTSOutput

The output of a TTS node. Contains everything needed for playback and lip-sync.

| Field | Type | Description |
| --- | --- | --- |
| `Audio` | `FInworldData_Audio` | The synthesized PCM audio |
| `Text` | `FString` | The text that was synthesized |
| `Timestamps` | `TArray<FInworldAudioChunkTimestamp>` | Per-word timing and phonetic breakdown |

FInworldAudioChunkTimestamp

One word in the utterance, with its time range and phone-level detail.

| Field | Type | Description |
| --- | --- | --- |
| `Token` | `FString` | The word text |
| `StartTime` | `float` | Word start time in seconds |
| `EndTime` | `float` | Word end time in seconds |
| `Phones` | `TArray<FInworldPhoneSpan>` | Per-phoneme breakdown for this word |
| `bIsPartial` | `bool` | `true` if this word may still change (streaming update) |

FInworldPhoneSpan

One phoneme within a word. The source of all lip-sync timing.

| Field | Type | Description |
| --- | --- | --- |
| `Phoneme` | `FString` | IPA phoneme symbol (e.g. "b", "æ") |
| `Viseme` | `FString` | Viseme category string (e.g. "BMP", "AEI") |
| `Timestamp` | `float` | Time in seconds when this phoneme sounds |
| `Duration` | `float` | Duration of this phoneme in seconds |
| `WordIndexAtAudioChunk` | `int32` | Index of the parent word in `Timestamps` |

FInworldVisemeBlends

Blend weights for the 12 Inworld viseme categories, each in [0, 1]. STOP represents silence/rest.

| Field | Sounds |
| --- | --- |
| `BMP` | b, m, p |
| `FV` | f, v |
| `TH` | th |
| `CDGKNSTXYZ` | c, d, g, k, n, s, t, x, y, z |
| `CHJSH` | ch, j, sh |
| `L` | l |
| `R` | r |
| `QW` | q, w |
| `AEI` | a, e, i |
| `EE` | ee |
| `O` | o |
| `U` | u |
| `STOP` | silence / rest (defaults to 1.0) |
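As a rough illustration of the table, a lookup from a sound to its viseme category could look like the hypothetical helper below. This function is not part of the plugin; it just encodes the mapping above, with unmapped sounds falling back to STOP.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sound -> viseme-category lookup mirroring the table above.
std::string VisemeCategoryFor(const std::string& Sound)
{
    static const std::map<std::string, std::string> Table = {
        {"b", "BMP"}, {"m", "BMP"}, {"p", "BMP"},
        {"f", "FV"}, {"v", "FV"},
        {"th", "TH"},
        {"c", "CDGKNSTXYZ"}, {"d", "CDGKNSTXYZ"}, {"g", "CDGKNSTXYZ"},
        {"k", "CDGKNSTXYZ"}, {"n", "CDGKNSTXYZ"}, {"s", "CDGKNSTXYZ"},
        {"t", "CDGKNSTXYZ"}, {"x", "CDGKNSTXYZ"}, {"y", "CDGKNSTXYZ"},
        {"z", "CDGKNSTXYZ"},
        {"ch", "CHJSH"}, {"j", "CHJSH"}, {"sh", "CHJSH"},
        {"l", "L"}, {"r", "R"},
        {"q", "QW"}, {"w", "QW"},
        {"a", "AEI"}, {"e", "AEI"}, {"i", "AEI"},
        {"ee", "EE"}, {"o", "O"}, {"u", "U"},
    };
    auto It = Table.find(Sound);
    return It != Table.end() ? It->second : "STOP"; // unmapped -> silence/rest
}
```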

FInworldVoiceAudioPlaybackInfo

Playback timing provided each tick by OnVoiceAudioPlayback. Pass this to BFL functions to get the correct viseme weights for the current frame.

| Field | Type | Description |
| --- | --- | --- |
| `Utterance.PlayedDuration` | `float` | Seconds elapsed in the current utterance; used by BFL functions to look up the active phone span |
| `Utterance.TotalDuration` | `float` | Total duration of the utterance |
| `Utterance.PlayedPercent` | `float` | Playback progress [0, 1] |
| `Interaction.PlayedDuration` | `float` | Seconds elapsed across the whole interaction |

UInworldVoiceAudioComponent

The UInworldVoiceAudioComponent handles TTS audio playback and is the main source of per-frame lip-sync data.

Methods

| Method | Description |
| --- | --- |
| `QueueVoice(FInworldData_DataStream_TTSOutput)` | Queue a TTS stream chunk for playback |
| `Interrupt()` | Stop playback immediately and clear the queue |
| `GetCurrentPhoneSpans()` | Returns the cached `TArray<FInworldPhoneSpan>` for the current utterance; use with `GetVisemeBlends` |

Events

| Event | Signature | Description |
| --- | --- | --- |
| `OnVoiceAudioStart` | `(Component, FInworldData_TTSOutput, bInteractionStart)` | Fired when a new utterance begins |
| `OnVoiceAudioPlayback` | `(Component, PlaybackInfo, FInworldData_TTSOutput, PhoneSpans)` | Fired every tick during playback; primary hook for lip-sync |
| `OnVoiceAudioUpdated` | `(Component, FInworldData_TTSOutput)` | Fired when TTS output data is updated |
| `OnVoiceAudioComplete` | `(Component, FInworldData_TTSOutput, bInteractionEnd)` | Fired when an utterance finishes normally |
| `OnVoiceAudioInterrupt` | `(Component, FInworldData_TTSOutput)` | Fired when playback is interrupted |

OnVoiceAudioPlayback is the recommended binding point for lip-sync: it provides PlaybackInfo and pre-built PhoneSpans in one call.

Blueprint Function Library — Viseme & Phoneme functions

All functions are on UInworldBlueprintFunctionLibrary.
FunctionCategoryDescription
BuildPhoneSpansFromTTSOutput(TTSOutput, OutPhoneSpans)VisemeFlattens Timestamps[].Phones[] into a single array. Call once per utterance and cache the result.
GetVisemeBlends(PlaybackInfo, PhoneSpans)VisemeReturns FInworldVisemeBlends for the current playback time using a cached span array. Recommended for performance.
GetVisemeBlendsTTS(PlaybackInfo, TTSOutput)VisemeSame as above but builds spans internally from TTSOutput each call. Convenient, but less efficient.
GetPhonemeBlends(PlaybackInfo, PhoneSpans)PhonemeReturns FInworldPhonemeBlends (raw IPA phoneme weights) from a cached span array.
GetPhonemeBlendsTTS(PlaybackInfo, TTSOutput)PhonemeSame as above but reads directly from TTSOutput.
GetCurrentWord(PlaybackInfo, TTSOutput)PlaybackReturns FInworldAudioVoiceWord — the word currently being spoken (Word, WordIndex, TotalWordCount).
Recommended pattern: bind to OnVoiceAudioPlayback, cache PhoneSpans on OnVoiceAudioStart, then call GetVisemeBlends(PlaybackInfo, CachedPhoneSpans) each tick. This avoids re-flattening the timestamp array every frame.
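The per-tick half of that pattern can be sketched as follows. This is a simplified stand-in, not the plugin's GetVisemeBlends: it assigns full weight to the single active category and keeps STOP at 1.0 when nothing is playing (per the STOP default above), whereas the real function may blend across neighboring spans.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for FInworldPhoneSpan.
struct PhoneSpan { std::string Viseme; float Timestamp; float Duration; };

// Sketch of the per-tick lookup against a cached span array.
// Weight 1.0 on the active category; STOP stays 1.0 when idle.
std::map<std::string, float> GetBlends(const std::vector<PhoneSpan>& Cached,
                                       float PlayedDuration)
{
    std::map<std::string, float> Blends = {
        {"BMP", 0.f}, {"FV", 0.f}, {"TH", 0.f}, {"CDGKNSTXYZ", 0.f},
        {"CHJSH", 0.f}, {"L", 0.f}, {"R", 0.f}, {"QW", 0.f},
        {"AEI", 0.f}, {"EE", 0.f}, {"O", 0.f}, {"U", 0.f}, {"STOP", 1.f}};
    for (const PhoneSpan& S : Cached)
    {
        if (PlayedDuration >= S.Timestamp &&
            PlayedDuration < S.Timestamp + S.Duration)
        {
            Blends[S.Viseme] = 1.f; // active phoneme's category
            Blends["STOP"] = 0.f;   // mouth is not at rest
            break;
        }
    }
    return Blends;
}
```

Because the span array is cached once per utterance, each tick is a cheap scan rather than a re-flatten of the whole Timestamps structure.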

Inworld Viseme AnimGraph node

UAnimGraphNode_InworldViseme is an Animation Blueprint node that applies morph target curves from viseme blend weights. Bone transforms pass through unchanged; only the curve track (morph targets) is modified.

(Image: Inworld Viseme AnimGraph node in use)

Properties

| Property | Type | Default | Pin | Description |
| --- | --- | --- | --- | --- |
| `Source` | `FPoseLink` | | Yes | Incoming pose; bones pass through unmodified |
| `VisemeBlends` | `FInworldVisemeBlends` | | Yes | Per-viseme weights from the BFL functions, updated each tick |
| `VisemeData` | `UInworldVisemeDataAsset*` | | No | Data asset mapping visemes to morph target curve names and weights |
| `SmoothingSpeed` | `float` | 12.0 | No | Interpolation speed toward target weights per second; 0 disables smoothing |
| `Alpha` | `float` | 1.0 | No | Overall blend strength [0, 2]; 0 suppresses lip-sync, 1 is full, 2 doubles morph values |
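How SmoothingSpeed and Alpha combine can be sketched as below. The exact interpolation the node uses is engine-side and not documented here; this assumes a frame-rate-scaled approach toward the target, similar to Unreal's FMath::FInterpTo, with 0 snapping directly to the target.

```cpp
#include <algorithm>
#include <cassert>

// Assumed smoothing behavior: move Current toward Target by a step
// proportional to SmoothingSpeed * DeltaTime, clamped to [0, 1].
float SmoothWeight(float Current, float Target, float DeltaTime, float SmoothingSpeed)
{
    if (SmoothingSpeed <= 0.f)
        return Target; // 0 disables smoothing: snap straight to the target
    float Step = std::clamp(SmoothingSpeed * DeltaTime, 0.f, 1.f);
    return Current + (Target - Current) * Step;
}

// Alpha scales the final curve value; the node allows [0, 2].
float ApplyAlpha(float Weight, float Alpha)
{
    return Weight * Alpha;
}
```

At 60 fps with the default speed of 12, each frame covers about 20% of the remaining gap, which is why the default feels responsive without chattering.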

UInworldVisemeDataAsset

A UDataAsset that maps each viseme category to one or more morph target curve name/weight pairs. You create one asset per character rig, then assign it to the VisemeData property on the AnimGraph node.

Each viseme entry is a TMap<FName, float>, where the key is the morph target curve name on your Skeletal Mesh and the value is the contribution weight for that viseme.

Supported viseme entries: BMP, FV, TH, CDGKNSTXYZ, CHJSH, L, R, QW, AEI, EE, O, U. (STOP is handled automatically; it does not need an entry in the data asset.)

(Image: Example UInworldVisemeDataAsset configured for MetaHuman)

Setting up a VisemeDataAsset

  1. In the Content Browser, right-click and choose Miscellaneous > Data Asset
  2. Select InworldVisemeDataAsset as the class
  3. Open the asset and for each viseme entry, add the morph target curve names from your Skeletal Mesh and their blend weights
  4. Assign the asset to the VisemeData property on your Inworld Viseme AnimGraph node
For MetaHuman characters, each viseme typically maps to one or more CTRL_expressions_* curves. Weights are additive — multiple curves per viseme are all applied simultaneously.
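The additive resolution described above can be sketched like this. The curve names in the test are made up for the example (on a MetaHuman they would be CTRL_expressions_* curves), and the plain std::map types stand in for the asset's TMap entries.

```cpp
#include <cassert>
#include <map>
#include <string>

using CurveMap = std::map<std::string, float>;        // curve name -> weight
using VisemeAsset = std::map<std::string, CurveMap>;  // viseme -> curve map

// Resolve final morph-target curve values from viseme blend weights:
// each viseme contributes blend * weight to every curve it maps,
// and contributions from multiple visemes accumulate additively.
CurveMap ResolveCurves(const VisemeAsset& Asset,
                       const std::map<std::string, float>& VisemeBlends,
                       float Alpha)
{
    CurveMap Out;
    for (const auto& [Viseme, Blend] : VisemeBlends)
    {
        auto It = Asset.find(Viseme);
        if (It == Asset.end())
            continue; // e.g. STOP has no entry in the data asset
        for (const auto& [Curve, Weight] : It->second)
            Out[Curve] += Blend * Weight * Alpha; // additive across visemes
    }
    return Out;
}
```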

Setting up lip-sync in an Animation Blueprint

1. Add the Inworld Viseme node to your AnimGraph

Open your character’s Animation Blueprint and navigate to the AnimGraph. Search for Inworld Viseme and place the node in your graph, wiring its Source input from your existing pose and its output toward Output Pose. Assign your UInworldVisemeDataAsset to the Viseme Data property on the node.
2. Create a VisemeBlends variable

Add an FInworldVisemeBlends variable to the Animation Blueprint. Wire it into the Viseme Blends pin on the Inworld Viseme node. This variable will be updated each tick from the character component.
3. Bind to OnVoiceAudioPlayback

In the character actor’s Blueprint (or on BeginPlay in the Anim BP), get the UInworldVoiceAudioComponent and bind to OnVoiceAudioPlayback. In the callback, call GetVisemeBlends(PlaybackInfo, PhoneSpans) and store the result into your FInworldVisemeBlends variable using Set Anim Instance Variable or a direct property write. For best performance, also bind to OnVoiceAudioStart and call BuildPhoneSpansFromTTSOutput there to cache the spans array. Then use GetVisemeBlends(PlaybackInfo, CachedPhoneSpans) in OnVoiceAudioPlayback instead of GetVisemeBlendsTTS.
4. Tune smoothing and alpha

Adjust Smoothing Speed on the AnimGraph node to control how snappily the mouth follows phoneme changes. The default of 12 is a good starting point. Use Alpha to globally scale the strength of lip-sync, which is useful for blending with other facial animation systems.