Text-to-Video model

Kling LipSync (Text‑to‑Video)

Generate lip‑synced video directly from a script.

Kling’s native‑audio generation creates text‑to‑video clips with synchronized voice and lip sync, including multi‑person dialogue.

Best for: Script‑only workflows

Inputs: Text

Outputs: Video

What this model is best at

Short answer: turning a written script into a talking video, with voice and lip sync generated together and no source footage or audio required.

Use this workspace to preview the model, compare example output, and start creating with the recommended workflow for this model.

Highlight 1

Voice narration with natural emotion.

Highlight 2

Multi‑person dialogue with lip sync.

Highlight 3

Support for singing, rap, and ambient audio.


Kling LipSync (Text‑to‑Video) workspace

Start from the built-in workflow below, then tune the model inside the standard LipsyncX creation surface.

Modes: Talking Photo, Video Dubbing, Long Video, Pet & Anime

1. Choose a face: pick a template, or drag and drop (or click to upload) a video or photo.

2. Model

3. Add your audio: MP3, WAV, or M4A, up to 30 MB / 10 min. For best lip-sync quality, upload audio under 1 min.


Avg render time: 7 min

Languages supported: 50+

Creators onboarded: 3,200+

Trusted by teams at StudioBlend, AudioNova, CourseWave, Mintly, and VisionSpark.

Script‑to‑video

Type a script and generate a talking clip.


Popular use cases

Use case 1

Rapid prototyping

No media required.

Use case 2

Concept testing

Validate scripts quickly.

Use case 3

Internal drafts

Fast review loops.

Quick specs

Primary use: Text‑to‑video with lip sync

Inputs: Script / prompt

Output: Video with generated audio

Best strength: Script‑only workflow

Best practices

Write clear dialogue and label each change of speaker.

Keep prompts concise and visually specific.

Use short segments to test tone before longer runs.

FAQ

Is audio generated with the video?

Yes. Native audio is generated alongside the video output.

Can it handle multi‑person dialogue?

Yes. Kling supports multi‑person dialogue with lip sync.

Which languages are supported?

Voice output is supported in Chinese and English.

Ready to try Kling LipSync (Text‑to‑Video)?

Use the built-in workspace to test prompts, compare outputs, and see how this model fits your content workflow.