Billing unit: 10 credits / 5s
Billing units: 2
Estimated length: 8s
Est. total: 20 credits
Uses real audio duration when available.
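The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not PixVerse's actual billing code: the function name `estimate_credits` is hypothetical, and rounding partial units up is an assumption inferred from the example figures (8s billed as 2 units).

```python
import math

CREDITS_PER_UNIT = 10  # from the page: 10 credits per billing unit
SECONDS_PER_UNIT = 5   # from the page: one billing unit covers 5s


def estimate_credits(duration_s: float) -> int:
    """Estimate the credit cost of a clip of the given length.

    Assumption: a partially used billing unit is charged in full
    (rounded up), which matches the 8s -> 2 units example.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    units = math.ceil(duration_s / SECONDS_PER_UNIT)
    return units * CREDITS_PER_UNIT


print(estimate_credits(8))  # 2 billing units -> 20 credits
```

When real audio is available, its measured duration would be passed in place of the estimated length.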
Avg render time: 7 min
Languages supported: 50+
Creators onboarded: 3,200+
Trusted by teams
StudioBlend, AudioNova, CourseWave, Mintly, VisionSpark
Overview
PixVerse Speech (LipSync) aligns mouth movement to audio for expressive, emotion‑driven performance. The source video can be either a PixVerse video_id or an uploaded video.
Highlights
- Analyzes both audio and mouth motion for tight sync.
- Accepts PixVerse video_id or uploaded MP4/MOV.
- Audio via file upload or built‑in TTS script.
Quick Specifications
Primary use: Fast lip sync for social clips
Inputs: PixVerse video_id or uploaded video + audio
Output: Synced video
Best strength: Speed and simplicity
Best for
Social clips, quick iteration
Inputs & Outputs
Inputs
Video, Audio
Outputs
Video
Social clip refresh
Swap narration for a faster hook.
Capabilities
Flexible inputs
- Accepts video_id or MP4/MOV uploads.
- Audio via upload or TTS.
Social‑ready output
- Designed for short‑form content.
- Good for rapid creative iteration.
Use Cases
Short‑form
Quick hook iterations.
Social ads
Fast creative refresh.
Creator posts
Lightweight updates.
Applications
Short‑form ads
Refresh hooks without re‑shooting.
Creator clips
Swap narration fast.
Social updates
Quickly iterate messaging.
Best Practices
- Keep clips short for the fastest turnaround.
- Use clean, noise‑free audio for crisp mouth motion.
- Ensure the face is clear and well‑lit.
Frequently Asked Questions
What are the video limits?
Up to 30 seconds, 1920px resolution, and 50MB per video.
What audio formats are supported?
MP3 or WAV audio, up to 30 seconds and 50MB.
Can I use a script instead of audio?
Yes. Provide a TTS script to generate the audio automatically.
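The limits in the answers above can be checked locally before uploading. This is a hedged sketch only: `MediaInfo`, `check_video`, `check_audio`, and `check_request` are hypothetical names, not part of any PixVerse API. It assumes "50MB" means 50 × 1024 × 1024 bytes, that the 1920px limit applies to the longer side of the frame, and that exactly one video source and one audio source should be supplied.

```python
import os
from dataclasses import dataclass
from typing import List, Optional

MAX_SECONDS = 30               # from the FAQ: up to 30 seconds
MAX_BYTES = 50 * 1024 * 1024   # assumption: "50MB" read as 50 MiB
MAX_SIDE_PX = 1920             # assumption: limit applies to the longer side
VIDEO_EXTS = {".mp4", ".mov"}  # from the page: MP4/MOV uploads
AUDIO_EXTS = {".mp3", ".wav"}  # from the FAQ: MP3 or WAV


@dataclass
class MediaInfo:
    """Hypothetical container for already-probed file metadata."""
    filename: str
    duration_s: float
    size_bytes: int
    width: int = 0
    height: int = 0


def _common(m: MediaInfo, kind: str, exts: set) -> List[str]:
    errors = []
    if os.path.splitext(m.filename)[1].lower() not in exts:
        errors.append(f"{kind}: unsupported format")
    if m.duration_s > MAX_SECONDS:
        errors.append(f"{kind}: longer than {MAX_SECONDS}s")
    if m.size_bytes > MAX_BYTES:
        errors.append(f"{kind}: larger than 50MB")
    return errors


def check_video(v: MediaInfo) -> List[str]:
    errors = _common(v, "video", VIDEO_EXTS)
    if max(v.width, v.height) > MAX_SIDE_PX:
        errors.append(f"video: exceeds {MAX_SIDE_PX}px")
    return errors


def check_audio(a: MediaInfo) -> List[str]:
    return _common(a, "audio", AUDIO_EXTS)


def check_request(video_id: Optional[str] = None,
                  video: Optional[MediaInfo] = None,
                  audio: Optional[MediaInfo] = None,
                  script: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    errors = []
    if (video_id is None) == (video is None):
        errors.append("provide either a video_id or an uploaded video")
    elif video is not None:
        errors += check_video(video)
    if (audio is None) == (script is None):
        errors.append("provide either an audio file or a TTS script")
    elif audio is not None:
        errors += check_audio(audio)
    return errors
```

For example, `check_request(video_id="abc123", script="Hello there")` returns an empty list, while a 45-second `.avi` upload would be reported as both an unsupported format and too long.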
