
Seedance 2.0

New multi‑shot video model with strong consistency.

Overview

Next‑gen video model focused on multi‑shot storytelling with multimodal reference inputs, audio‑visual sync, and strong character consistency.

Highlights

  • Multi‑shot narrative generation with consistent characters.
  • Multimodal references (image, video, text, and audio).
  • Audio‑visual beat matching for timing and rhythm.
  • High‑resolution outputs up to 2K.

Quick Specifications

Image inputs: Up to 9 images
Video inputs: Up to 3 videos (max 15s total)
Audio inputs: Up to 3 MP3 files (max 15s total)
Text input: Natural‑language prompts
Output duration: 4–15 seconds (selectable)
Audio output: Native sound effects + music
Total files per run: 12 uploads per generation
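
The upload limits above can be checked before submitting a generation. Below is a minimal, hypothetical pre‑flight validator (the `Asset` class and `validate_uploads` helper are illustrative, not part of any official SDK) that encodes the published limits: up to 9 images, up to 3 videos totaling 15s, up to 3 MP3s totaling 15s, and 12 files per run overall.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    kind: str                # "image", "video", or "audio"
    duration_s: float = 0.0  # only meaningful for video/audio

def validate_uploads(assets: list[Asset]) -> list[str]:
    """Return a list of limit violations (empty list means the set is valid)."""
    errors = []
    images = [a for a in assets if a.kind == "image"]
    videos = [a for a in assets if a.kind == "video"]
    audios = [a for a in assets if a.kind == "audio"]

    if len(assets) > 12:
        errors.append("more than 12 files in one generation")
    if len(images) > 9:
        errors.append("more than 9 images")
    if len(videos) > 3:
        errors.append("more than 3 videos")
    if sum(v.duration_s for v in videos) > 15:
        errors.append("video references exceed 15s total")
    if len(audios) > 3:
        errors.append("more than 3 audio files")
    if sum(a.duration_s for a in audios) > 15:
        errors.append("audio references exceed 15s total")
    return errors
```

Running the check on, say, two videos totaling 18 seconds would flag only the total‑duration limit, while a valid set returns an empty list.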

Best for

  • Cinematic previews
  • Multi‑scene storytelling

Inputs & Outputs

Inputs: Text, image, video, and audio references
Outputs: Video

Multi‑shot teaser

Generate a cinematic multi‑scene preview.

[Example pair: concept reference vs. generated multi‑shot teaser]

Reference Guide

Seedance 2.0 uses an @‑mention system to assign each uploaded asset a role in the generation.

Modes

  • First / Last Frame: Use a starting (or ending) frame plus a prompt.
  • Universal Reference: Mix images, video, audio, and text in one prompt.

Syntax Example

@Image1 as the first frame, reference @Video1 for camera movement, use @Audio1 for music.

Examples

  • Set first frame: @Image1 as the first frame
  • Reference motion: Reference @Video1 for choreography
  • Copy camera work: Follow @Video1's camera movements and transitions
  • Add music / rhythm: Use @Audio1 for background music
  • Extend a video: Extend @Video1 by 5 seconds
  • Replace character: Replace the woman in @Video1 with @Image1
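
Prompts like these can also be assembled programmatically once each upload is mapped to its @‑mention label. The sketch below is hypothetical (the `mention_labels` helper and the extension‑based type detection are assumptions, not a documented API); it only illustrates the @Image1 / @Video1 / @Audio1 numbering convention shown above.

```python
def mention_labels(files: list[str]) -> dict[str, str]:
    """Map each uploaded file to its @-mention label, numbered per media type."""
    counts = {"image": 0, "video": 0, "audio": 0}
    labels = {}
    for name in files:
        # Naive type detection by extension, for illustration only.
        kind = ("video" if name.endswith(".mp4")
                else "audio" if name.endswith(".mp3")
                else "image")
        counts[kind] += 1
        labels[name] = f"@{kind.capitalize()}{counts[kind]}"
    return labels

files = ["hero.png", "dance.mp4", "track.mp3"]
labels = mention_labels(files)
prompt = (f"{labels['hero.png']} as the first frame, "
          f"reference {labels['dance.mp4']} for camera movement, "
          f"use {labels['track.mp3']} for music.")
# prompt == "@Image1 as the first frame, reference @Video1 for camera movement, use @Audio1 for music."
```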

Capabilities

Enhanced base quality

  • Improved physics realism for objects and motion.
  • Smoother temporal continuity across frames.
  • Stronger instruction following for complex prompts.
Example prompt: "A girl hangs laundry, reaching into the basket and shaking out the next piece."

Multimodal reference system

  • Extract motion patterns from reference video.
  • Lock visual style with reference images.
  • Drive rhythm and mood from audio tracks.

Character & object consistency

  • Stable facial identity across shots.
  • Preserves logos, text, and product details.
  • Maintains scene coherence without style drift.

Motion & camera replication

  • Replicate choreography, action, or dance.
  • Match camera moves (dolly, tracking, handheld).
  • Copy editing rhythm and transitions.

Video extension & editing

  • Extend existing videos while preserving narrative flow.
  • Replace characters or props without re‑rendering.
  • Re‑style scenes while keeping motion.

Audio‑synchronized generation

  • Sync dialogue lip‑motion to audio.
  • Match sound effects to on‑screen actions.
  • Follow musical beats for pacing.

Use Cases

Cinematic previews

Showcase multi‑shot narratives.

Brand storytelling

Create multi‑scene launch teasers.

Concept videos

Prototype full sequences fast.

Applications

Advertising & e‑commerce

Build product demos that mirror brand assets while adding new scenes.

Content localization

Generate multi‑language versions with native lip sync.

Storyboards → video

Animate storyboard panels into short sequences.

Template‑based creation

Reference an existing style and rebuild it with new content.

Best Practices

  1. Be explicit about what each reference controls (style, motion, camera, character).
  2. Prioritize the most important assets within the 12‑file limit.
  3. Double‑check @‑mentions to avoid swapping files.
  4. Specify edit vs. reference when using an existing video.
  5. Align generation duration with intended extension length.
  6. Write prompts like you are briefing a human editor.

Frequently Asked Questions

What makes it different from single‑shot models?

Seedance 2.0 focuses on multi‑shot storytelling with consistent characters across scenes.

What reference inputs are supported?

It supports multimodal references such as text, images, video, and audio.

What resolution does it target?

High‑resolution outputs up to 2K are supported.