Video-to-Video model

PixVerse LipSync

Reliable video lip sync with fast turnaround.

PixVerse Speech (LipSync) aligns mouth movement to audio for expressive, emotion‑driven performance, using either a PixVerse video_id or an uploaded video.

Best for: Social clips

Inputs: Video + Audio

Outputs: Video

What this model is best at

Short answer: tight audio-to-mouth sync on short clips, starting from either a PixVerse video_id or an uploaded video.

Use this workspace to preview the model, compare example output, and start creating with the recommended workflow for this model.

Analyzes both audio and mouth motion for tight sync.

Accepts PixVerse video_id or uploaded MP4/MOV.

Audio via file upload or built‑in TTS script.
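
The input choices above (one video source, one audio source) can be sketched as a small validation step before submitting a job. This is a minimal illustration, not the PixVerse API: the class name `LipSyncInputs` and its field names are hypothetical, and treating each pair of options as mutually exclusive is an assumption rather than something the page states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LipSyncInputs:
    """Hypothetical container for one lip sync job (names are illustrative)."""

    video_id: Optional[str] = None    # an existing PixVerse video
    video_path: Optional[str] = None  # an uploaded MP4/MOV file
    audio_path: Optional[str] = None  # an uploaded audio file
    tts_script: Optional[str] = None  # text for the built-in TTS

    def __post_init__(self) -> None:
        # Assumption: exactly one video source must be given.
        if (self.video_id is None) == (self.video_path is None):
            raise ValueError("provide exactly one of video_id or video_path")
        # Assumption: exactly one audio source must be given.
        if (self.audio_path is None) == (self.tts_script is None):
            raise ValueError("provide exactly one of audio_path or tts_script")
        # The page lists MP4 and MOV as the accepted upload formats.
        if self.video_path is not None and not self.video_path.lower().endswith(
            (".mp4", ".mov")
        ):
            raise ValueError("uploaded video must be MP4 or MOV")


# Usage: an existing PixVerse video driven by a TTS script.
job = LipSyncInputs(video_id="example-id", tts_script="Welcome back to the channel.")
```

Rejecting ambiguous combinations up front keeps a failed render from being the first sign that two video sources or no audio source were supplied.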


PixVerse LipSync workspace

Start from the built-in workflow below, then tune the model inside the standard LipsyncX creation surface.

Talking Photo, Video Dubbing, Long Video, Pet & Anime

1. Choose a face

Choose a template, or upload a video or photo (drag and drop, or click to upload).

2. Model

3. Add your audio

Supports MP3, WAV, M4A. Max 30MB / 10 min. For best lip sync quality, upload audio under 1 min.

Avg render time: 7 min

Languages supported: 50+

Creators onboarded: 3,200+

Trusted by teams

StudioBlend, AudioNova, CourseWave, Mintly, VisionSpark

Social clip refresh

Swap narration for a faster hook.


Popular use cases

Short‑form: Quick hook iterations.

Social ads: Fast creative refresh.

Creator posts: Lightweight updates.

Quick specs

Primary use: Fast lip sync for social clips

Inputs: PixVerse video_id or uploaded video, plus audio

Output: Synced video

Best strength: Speed and simplicity

Best practices

Keep clips short for the fastest turnaround.

Use clean, noise‑free audio for crisp mouth motion.

Ensure the face is clear and well‑lit.

FAQ

What are the video limits?

Up to 30 seconds, 1920px resolution, and 50MB per video.

What audio formats are supported?

MP3 or WAV audio, up to 30 seconds and 50MB.

Can I use a script instead of audio?

Yes. Provide a TTS script to generate the audio automatically.
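
The limits quoted in the FAQ above can be checked locally before uploading. The sketch below is an illustration under stated assumptions, not an official validator: `MediaInfo`, `check_video`, and `check_audio` are hypothetical names, the metadata is assumed to come from your own probing tool, "1920px" is assumed to apply to the longer side, and 1 MB is assumed to be 1024 * 1024 bytes.

```python
from dataclasses import dataclass
from typing import List

MAX_SECONDS = 30.0                # FAQ: up to 30 seconds (video and audio)
MAX_BYTES = 50 * 1024 * 1024      # FAQ: 50MB; assumes 1 MB = 1024 * 1024 bytes
MAX_VIDEO_SIDE_PX = 1920          # FAQ: 1920px; assumed to mean the longer side
AUDIO_EXTENSIONS = (".mp3", ".wav")


@dataclass
class MediaInfo:
    """Hypothetical metadata record filled in by your own probing tool."""

    filename: str
    duration_s: float
    size_bytes: int
    width: int = 0   # pixels, video only
    height: int = 0  # pixels, video only


def _common_problems(info: MediaInfo) -> List[str]:
    """Checks shared by video and audio: duration and file size."""
    problems = []
    if info.duration_s > MAX_SECONDS:
        problems.append(f"duration {info.duration_s:.1f}s exceeds {MAX_SECONDS:.0f}s")
    if info.size_bytes > MAX_BYTES:
        problems.append(f"size {info.size_bytes} bytes exceeds {MAX_BYTES} bytes")
    return problems


def check_video(info: MediaInfo) -> List[str]:
    """Return a list of limit violations for a video; empty means it passes."""
    problems = _common_problems(info)
    if max(info.width, info.height) > MAX_VIDEO_SIDE_PX:
        problems.append(f"longer side exceeds {MAX_VIDEO_SIDE_PX}px")
    return problems


def check_audio(info: MediaInfo) -> List[str]:
    """Return a list of limit violations for an audio file; empty means it passes."""
    problems = _common_problems(info)
    if not info.filename.lower().endswith(AUDIO_EXTENSIONS):
        problems.append("audio must be MP3 or WAV")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a file in a single pass.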

Ready to try PixVerse LipSync?

Use the built-in workspace to test your video and audio, compare outputs, and see how this model fits your content workflow.