Billing unit: 10 credits / 5s
Billing units: 2
Estimated length: 8s
Est. total: 20 credits
Uses real audio duration when available.
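The billing figures above can be reproduced with a short calculation. The following is a minimal sketch, not the service's actual billing code: it assumes partial billing units are rounded up (which matches 8s at 5s per unit giving 2 units), and the function names `wav_duration_seconds` and `estimate_credits` are hypothetical. The duration helper uses Python's standard `wave` module and only handles WAV files.

```python
from __future__ import annotations

import math
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_credits(
    duration_s: float,
    unit_seconds: float = 5.0,   # from the page: one billing unit covers 5s
    credits_per_unit: int = 10,  # from the page: 10 credits per unit
) -> tuple[int, int]:
    """Return (billing_units, total_credits) for a clip of the given length.

    Assumption: a partially used unit is billed as a full unit (round up).
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    units = math.ceil(duration_s / unit_seconds)
    return units, units * credits_per_unit


units, total = estimate_credits(8)
print(units, total)  # 2 20
```

When an audio file is available, passing `wav_duration_seconds(path)` instead of an estimated length mirrors the "real audio duration" behaviour described above.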
Avg render time: 7 min
Languages supported: 50+
Creators onboarded: 3,200+
Trusted by teams: StudioBlend, AudioNova, CourseWave, Mintly, VisionSpark
Overview
Audio‑driven multi‑character avatar model built for realistic group conversations with synchronized lip sync and natural turn‑taking.
Highlights
- Multi‑character conversations with synchronized lip sync.
- Multi‑stream audio support for multi‑speaker dialogue.
- Natural group dynamics and turn‑taking.
Quick Specifications
Primary use: Multi‑speaker avatar video
Inputs: Multiple portraits + multiple audio tracks
Output: Multi‑avatar talking‑head video
Best strength: Group conversations with turn‑taking
Best for
- Podcasts
- Multi‑speaker narration
Inputs & Outputs
Inputs
Image, Audio
Outputs
Video
Two‑speaker panel
Drive multiple avatars from one audio track.
[Example media: input portraits and generated result]
Capabilities
Multi‑character sync
- Separate speakers and lip‑sync each avatar.
- Supports panel‑style conversations.
Long‑form stability
- Consistent identity across longer scenes.
- Natural timing and turn‑taking.
Use Cases
Podcast panels
Multi‑guest episodes.
Roundtables
Two‑speaker summaries.
Debates
Split‑speaker scripts.
Applications
Podcast panels
Convert multi‑speaker audio into visuals.
Roundtables
Create panel‑style summaries.
Debates
Visualize opposing viewpoints quickly.
Best Practices
1. Provide clean, separated audio per speaker.
2. Use distinct portraits to avoid identity confusion.
3. Keep background motion minimal for clarity.
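The first two practices can be checked before submitting a job. This is a rough sketch under stated assumptions: the `Speaker` record and `check_speakers` helper are hypothetical and not part of any documented API, and the check only compares file paths, so it cannot tell whether two different files show the same face or whether a track is actually clean.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    """Hypothetical record pairing one portrait with one audio track."""
    name: str
    portrait_path: str
    audio_path: str


def check_speakers(speakers: list[Speaker]) -> list[str]:
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems: list[str] = []
    portraits: dict[str, str] = {}  # portrait path -> first speaker using it
    tracks: dict[str, str] = {}     # audio path -> first speaker using it
    for s in speakers:
        if not s.portrait_path:
            problems.append(f"{s.name}: missing portrait")
        elif s.portrait_path in portraits:
            problems.append(f"{s.name}: shares a portrait with {portraits[s.portrait_path]}")
        else:
            portraits[s.portrait_path] = s.name

        if not s.audio_path:
            problems.append(f"{s.name}: missing audio track")
        elif s.audio_path in tracks:
            problems.append(f"{s.name}: shares an audio track with {tracks[s.audio_path]}")
        else:
            tracks[s.audio_path] = s.name
    return problems


panel = [
    Speaker("host", "host.png", "host.wav"),
    Speaker("guest", "guest.png", "guest.wav"),
]
print(check_speakers(panel))  # []
```

Returning a list of messages rather than raising on the first problem lets all input issues be reported in one pass.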
Frequently Asked Questions
Can it handle multiple speakers?
Yes. It is designed for multi‑character lip‑sync conversations.
What inputs are required?
Provide portraits plus a separate audio stream for each speaker.
Is it suitable for longer scenes?
Yes. It targets long‑form stability, keeping each avatar's identity consistent across longer scenes.
