Billing unit: 10 credits / 5 s
Billing units: 2
Estimated length: 8 s
Est. total: 20 credits
Uses real audio duration when available.
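The estimate above can be reproduced with a short calculation. This is a minimal sketch, not the service's actual billing code: the function name `estimate_credits` and its parameters are hypothetical, and rounding partial units up to a whole billing unit is an assumption that happens to match the figures shown (8 s at 10 credits per 5 s gives 2 units and 20 credits).

```python
import math


def estimate_credits(duration_s, credits_per_unit=10, unit_seconds=5.0):
    """Return (billing_units, total_credits) for a clip of duration_s seconds.

    Assumption: a partial billing unit is rounded up to a full unit.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    units = math.ceil(duration_s / unit_seconds)
    return units, units * credits_per_unit


# Figures from the estimate panel: an 8 s clip at 10 credits per 5 s.
units, total = estimate_credits(8)
print(f"Billing units: {units}, est. total: {total} credits")
```

Because the page states that the real audio duration is used when available, the final charge may differ from an estimate based on script length alone.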
Avg render time: 7 min
Languages supported: 50+
Creators onboarded: 3,200+
Trusted by teams: StudioBlend, AudioNova, CourseWave, Mintly, VisionSpark
Overview
Kling’s native‑audio generation creates text‑to‑video clips with synchronized voice and lip sync, including multi‑person dialogue.
Highlights
- Voice narration with natural emotion.
- Multi‑person dialogue with lip sync.
- Singing/rap and ambient audio support.
- Chinese and English voice output.
Quick Specifications
Primary use: Text‑to‑video with lip sync
Inputs: Script / prompt
Output: Video with generated audio
Best strength: Script‑only workflow
Best for: Script‑only workflows, rapid prototyping
Inputs & Outputs
Inputs: Text
Outputs: Video
Script‑to‑video
Type a script and generate a talking clip.
Capabilities
Native audio generation
- Creates speech and lip sync together.
- Supports multi‑person dialogue.
Expressive delivery
- Natural emotion in voice output.
- Works for narration and performance.
Use Cases
Rapid prototyping
No media required.
Concept testing
Validate scripts quickly.
Internal drafts
Fast review loops.
Applications
Script testing
Validate scripts without recording.
Concept reels
Prototype ideas fast.
Internal drafts
Quick previews for approvals.
Best Practices
1. Write clear dialogue with speaker changes labeled.
2. Keep prompts concise and visually specific.
3. Use short segments to test tone before longer runs.
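One way to apply the first and third practices is to keep dialogue as structured turns, render each turn with an explicit speaker label, and split the script into short segments for test runs. The sketch below is illustrative only: the `Speaker: line` label format, the helper names `format_dialogue` and `split_segments`, and the sample speakers are assumptions, since the page does not specify how Kling expects speaker changes to be marked.

```python
def format_dialogue(turns):
    """Render (speaker, line) pairs as one labeled line per turn.

    Assumption: a plain "Speaker: line" label is used to mark speaker changes.
    """
    return "\n".join(f"{speaker}: {line.strip()}" for speaker, line in turns)


def split_segments(turns, max_turns=2):
    """Group turns into short segments so tone can be tested on a small run first."""
    return [turns[i:i + max_turns] for i in range(0, len(turns), max_turns)]


# Hypothetical two-speaker script.
turns = [
    ("Host", "Welcome back to the show."),
    ("Guest", "Thanks for having me."),
    ("Host", "Let's start with the demo."),
]

segments = [format_dialogue(seg) for seg in split_segments(turns)]
print(segments[0])
```

Generating the first segment alone gives a quick check on voice and tone before spending credits on the full script.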
Frequently Asked Questions
Is audio generated with the video?
Yes. Native audio is generated alongside the video output.
Can it handle multi‑person dialogue?
Yes. Kling supports multi‑person dialogue with lip sync.
Which languages are supported?
Chinese and English voice output are supported.
