Audio-to-Video

LTX 2.3

Audio-to-video model on FAL.ai for lip sync and stylized talking videos.



Billing unit: 10 credits / 5 s

Billing units: 2

Estimated length: 8 s

Est. total: 20 credits

Uses the real audio duration when available.
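The estimate above appears to round the clip length up to whole 5-second billing units (8 s becomes 2 units, or 20 credits). A minimal sketch of that arithmetic, assuming partial units are always rounded up; the page does not state the rounding rule, and `estimate_credits` is a hypothetical helper:

```python
import math

# Assumed from the estimator: 1 billing unit = 5 seconds = 10 credits,
# with any partial unit rounded up to a full unit.
CREDITS_PER_UNIT = 10
SECONDS_PER_UNIT = 5


def estimate_credits(duration_s: float) -> tuple[int, int]:
    """Return (billing_units, total_credits) for a clip of duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    units = math.ceil(duration_s / SECONDS_PER_UNIT)
    return units, units * CREDITS_PER_UNIT


print(estimate_credits(8))  # (2, 20)
```

Rounding up per unit means a 10.5 s clip would cost the same as a 15 s clip under this assumption.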


Avg render time

7 min

Languages supported

50+

Creators onboarded

3,200+

Trusted by teams

StudioBlend, AudioNova, CourseWave, Mintly, VisionSpark

Overview

LTX 2.3 (LTX Video Pro) is an audio-driven video generation model served by FAL.ai that turns speech into stylized video with synchronized lip motion. Optional image and prompt inputs give additional control over the creative output.

Highlights

Audio-to-video generation with synchronized lip motion.
Supports stylized video output with optional prompt guidance.
Optional image conditioning for character or scene direction.
Endpoint: fal.run/fal-ai/ltx-2.3/audio-to-video.

Quick Specifications

Primary use: Audio-to-video lip sync

Provider: FAL.ai

Endpoint: https://fal.run/fal-ai/ltx-2.3/audio-to-video

Pricing: $0.10 per second

Best for: Stylized talking videos, music-driven visuals, fast avatar experiments
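At the listed FAL.ai rate, cost scales linearly with clip length. A sketch, assuming the charge is a straight per-second multiple with no minimum and no rounding to billing units (the page does not say either way); `estimate_usd` is a hypothetical helper, and `Decimal` is used to avoid floating-point drift in currency values:

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed FAL.ai rate; assumes billing is linear in output seconds.
PRICE_PER_SECOND_USD = Decimal("0.10")


def estimate_usd(duration_s: float) -> Decimal:
    """Estimate the FAL.ai charge in USD for duration_s seconds of video."""
    cost = PRICE_PER_SECOND_USD * Decimal(str(duration_s))
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_usd(8))  # 0.80
```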

Inputs & Outputs

Inputs

Audio URL, image URL (optional), prompt (optional)

Outputs

Video file
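The input fields `audio_url`, `image_url`, and `prompt` are named on this page; everything else below is an assumption. A minimal sketch of assembling the request body, with `audio_url` enforced as required and the optional controls omitted when unset. `build_ltx_payload` is a hypothetical helper and the URLs are placeholders:

```python
import json
from typing import Optional


def build_ltx_payload(
    audio_url: str,
    image_url: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Assemble the JSON input for the audio-to-video endpoint.

    audio_url is required; image_url and prompt are optional controls
    and are left out of the payload when not provided.
    """
    if not audio_url:
        raise ValueError("audio_url is required")
    payload = {"audio_url": audio_url}
    if image_url:
        payload["image_url"] = image_url  # optional character/scene reference
    if prompt:
        payload["prompt"] = prompt  # optional style and scene direction
    return payload


example = build_ltx_payload(
    "https://example.com/voice.wav",
    prompt="watercolor portrait, medium close-up, gentle head motion",
)
print(json.dumps(example))
```

Omitting unset fields, rather than sending `null`, avoids depending on how the endpoint treats explicit nulls, which the page does not document.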

Stylized audio-driven host

Generate a lip-synced talking clip from audio with optional visual direction.

Example: original input and generated output.

Capabilities

Audio-driven generation

Requires audio_url input.
Generates synchronized lip motion from speech.

Creative control

Optional image_url for visual guidance.
Optional prompt for style and scene direction.

Use Cases

Stylized explainers

Create artistic talking-head explainers from voice tracks.

Social content

Turn recorded audio into short-form talking videos quickly.

Prototype avatars

Test character concepts using audio and optional reference images.

Applications

Content creation

Produce stylized talking clips from narration.

Marketing tests

Generate multiple creative variants without filming.

Avatar workflows

Build fast audio-first avatar videos for social channels.

Best Practices

1. Use clean speech audio with a strong, unclipped level to improve lip-sync clarity.
2. Provide a clear portrait image when identity consistency matters.
3. Keep prompts specific about style, framing, and motion.
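Before uploading, it can help to check that the voice track is neither too quiet nor clipped. A sketch using only the standard library, assuming a 16-bit PCM WAV file; the thresholds `MIN_RMS_DBFS` and `MAX_PEAK_DBFS` are hypothetical values chosen for illustration, not documented requirements of the model:

```python
import array
import math
import sys
import wave

# Hypothetical thresholds for illustration only.
MIN_RMS_DBFS = -30.0   # flag tracks quieter than this on average
MAX_PEAK_DBFS = -0.1   # flag peaks this close to full scale as likely clipping


def _to_dbfs(level: float) -> float:
    """Convert a linear level (fraction of full scale) to dBFS."""
    return 20 * math.log10(level) if level > 0 else float("-inf")


def wav_levels_dbfs(path: str) -> tuple[float, float]:
    """Return (rms_dbfs, peak_dbfs) across all channels of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        raise ValueError("audio file contains no samples")
    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale
    return _to_dbfs(rms), _to_dbfs(peak)


def audio_warnings(path: str) -> list[str]:
    """Flag audio that looks too quiet or clipped before uploading it."""
    rms_db, peak_db = wav_levels_dbfs(path)
    warnings = []
    if rms_db < MIN_RMS_DBFS:
        warnings.append(f"quiet audio: RMS {rms_db:.1f} dBFS")
    if peak_db > MAX_PEAK_DBFS:
        warnings.append(f"possible clipping: peak {peak_db:.2f} dBFS")
    return warnings
```

An empty list from `audio_warnings` only means the level checks passed; it says nothing about background noise or speech clarity.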

Frequently Asked Questions

What inputs are required for LTX 2.3?

audio_url is required. image_url and prompt are optional controls.

What does the model output?

It outputs a generated video file.

Where is this model hosted?

The model runs on FAL.ai via fal.run/fal-ai/ltx-2.3/audio-to-video.
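A minimal sketch of calling the endpoint over plain HTTPS with the standard library. It assumes FAL.ai's usual synchronous convention of a JSON POST with an `Authorization: Key <key>` header, and reads the key from a `FAL_KEY` environment variable by convention; the response shape (for example, a video URL nested under a `video` field) is an assumption to check against the model's own schema:

```python
import json
import os
import urllib.request

ENDPOINT = "https://fal.run/fal-ai/ltx-2.3/audio-to-video"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a synchronous POST to the fal.run endpoint."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed FAL.ai auth scheme: "Key <your API key>".
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(payload: dict, timeout_s: float = 900) -> dict:
    """Send the request and return the decoded JSON response.

    Renders can take several minutes, so the timeout is generous; the
    exact response fields are not documented on this page.
    """
    req = make_request(payload, os.environ["FAL_KEY"])
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

With `FAL_KEY` set, `generate({"audio_url": "https://example.com/voice.wav"})` would submit a job and block until it finishes. Given the multi-minute average render time, a queue-based submission may be more robust than holding one connection open.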