Overview
LTX 2.3 (LTX Video Pro) is an audio-driven video generation model from FAL.ai that turns speech into stylized video with synchronized lip motion. It supports optional image and prompt guidance for controlled creative output.
Highlights
- Audio-to-video generation with synchronized lip motion.
- Supports stylized video output with optional prompt guidance.
- Optional image conditioning for character or scene direction.
- Endpoint: fal.run/fal-ai/ltx-2.3/audio-to-video.
Quick Specifications
- Best for: Stylized audio-driven host videos that lip-sync a talking clip from audio with optional visual direction
- Inputs: audio_url (required), image_url (optional), prompt (optional)
- Output: Generated video file
Capabilities
Audio-driven generation
- Requires audio_url input.
- Generates synchronized lip motion from speech.
Creative control
- Optional image_url for visual guidance.
- Optional prompt for style and scene direction.
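The required and optional inputs above can be sketched as a request payload. Below is a minimal Python sketch assuming the common fal.run pattern of a JSON POST authorized with a `Key` header; `build_ltx_payload` is an illustrative helper, not part of any SDK, and the network call only runs if a `FAL_KEY` environment variable is set:

```python
import json
import os
import urllib.request

FAL_ENDPOINT = "https://fal.run/fal-ai/ltx-2.3/audio-to-video"

def build_ltx_payload(audio_url, image_url=None, prompt=None):
    """Assemble the request body: audio_url is required, the rest optional."""
    if not audio_url:
        raise ValueError("audio_url is required")
    payload = {"audio_url": audio_url}
    if image_url:
        payload["image_url"] = image_url
    if prompt:
        payload["prompt"] = prompt
    return payload

payload = build_ltx_payload(
    "https://example.com/narration.mp3",
    prompt="warm studio lighting, medium close-up, subtle head motion",
)

# Only attempt the network call if a FAL key is configured.
if os.environ.get("FAL_KEY"):
    req = urllib.request.Request(
        FAL_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Key {os.environ['FAL_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```

Omitting `image_url` and `prompt` simply leaves them out of the body, so the model falls back to audio-only generation.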
Use Cases
Stylized explainers
Create artistic talking-head explainers from voice tracks.
Social content
Turn recorded audio into short-form talking videos quickly.
Prototype avatars
Test character concepts using audio and optional reference images.
Applications
Content creation
Produce stylized talking clips from narration.
Marketing tests
Generate multiple creative variants without filming.
Avatar workflows
Build fast audio-first avatar videos for social channels.
Best Practices
1. Use clean, high-volume speech audio to improve lip-sync clarity.
2. Provide a clear portrait image when identity consistency matters.
3. Keep prompts specific about style, framing, and motion.
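The first practice can be checked programmatically before upload. Here is a minimal sketch, using only the Python standard library, that measures a mono 16-bit WAV file's RMS level in dBFS; the -20 dBFS warning threshold is an illustrative assumption, not a model requirement:

```python
import math
import struct
import wave

def rms_dbfs(path):
    """Return the RMS level of a mono 16-bit PCM WAV file in dBFS."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768.0) if rms else float("-inf")

# Example: write a half-amplitude 440 Hz test tone and measure it.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    tone = (int(16383 * math.sin(2 * math.pi * 440 * t / 16000))
            for t in range(8000))
    w.writeframes(b"".join(struct.pack("<h", s) for s in tone))

level = rms_dbfs("tone.wav")   # roughly -9 dBFS for this tone
if level < -20.0:              # illustrative threshold, not a model spec
    print("Warning: audio may be too quiet for clean lip-sync")
```

Quiet or clipped narration is the most common cause of mushy lip motion, so a quick level check like this is cheap insurance before spending generation credits.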
Frequently Asked Questions
What inputs are required for LTX 2.3?
audio_url is required. image_url and prompt are optional controls.
What does the model output?
It outputs a generated video file.
Where is this model hosted?
The model runs on FAL.ai via fal.run/fal-ai/ltx-2.3/audio-to-video.
