LipsyncX
Audio-to-Video

LTX 2.3

FAL.ai audio-to-video model for lip sync and stylized talking videos.


Billing: 10 credits per 5-second billing unit. Uses the real audio duration when available.
Example estimate: an 8 s clip counts as 2 billing units, for a total of 20 credits.
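The credit figures above follow from rounding the clip length up to whole 5-second units. Here is a minimal sketch of that arithmetic. It assumes a partial unit is always rounded up to a full one. The page shows only the single 8 s example, so that rounding rule is an inference rather than a documented policy.

```python
import math

# Assumed from the billing line: 10 credits per started 5-second unit.
CREDITS_PER_UNIT = 10
SECONDS_PER_UNIT = 5


def estimate_credits(duration_s: float) -> tuple[int, int]:
    """Return (billing_units, total_credits) for a clip of duration_s seconds.

    Assumes any partial 5 s block is rounded up to a full unit, which matches
    the 8 s -> 2 units -> 20 credits example but is otherwise undocumented.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    units = math.ceil(duration_s / SECONDS_PER_UNIT)
    return units, units * CREDITS_PER_UNIT


print(estimate_credits(8))  # (2, 20)
```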

Overview

LTX 2.3 (LTX Video Pro) is an audio-driven video generation model served on FAL.ai. It turns speech into stylized video with synchronized lip motion. You can also supply an optional reference image and text prompt to steer the output.

Highlights

  • Audio-to-video generation with synchronized lip motion.
  • Supports stylized video output with optional prompt guidance.
  • Optional image conditioning for character or scene direction.
  • Endpoint: fal.run/fal-ai/ltx-2.3/audio-to-video.

Quick Specifications

Primary use: Audio-to-video lip sync
Provider: FAL.ai
Endpoint: https://fal.run/fal-ai/ltx-2.3/audio-to-video
Pricing: $0.10 per second
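At the listed rate of $0.10 per second, cost scales linearly with clip length. The sketch below assumes billing on the exact duration, with no rounding to whole seconds and no minimum charge. This page does not state either point. It uses Decimal to keep float rounding noise out of the result.

```python
from decimal import Decimal, ROUND_HALF_UP

# $0.10 per second, from the Pricing line.
PRICE_PER_SECOND = Decimal("0.10")


def estimate_usd(duration_s: float) -> Decimal:
    """Estimated USD cost for duration_s seconds, rounded to cents.

    Assumes exact-duration billing with no per-second rounding or minimum.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # str() first so a float like 12.5 becomes Decimal("12.5") exactly.
    cost = Decimal(str(duration_s)) * PRICE_PER_SECOND
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_usd(8))  # 0.80
```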

Best for

  • Stylized talking videos
  • Music-driven visuals
  • Fast avatar experiments

Inputs & Outputs

Inputs
  • Audio URL
  • Image URL (optional)
  • Prompt (optional)
Outputs
  • Video file
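The sketch below shows one way to assemble the request body from these inputs. It uses the field names audio_url, image_url and prompt given elsewhere on this page. The helper name build_payload is hypothetical. The sketch covers only these three fields, and the endpoint may accept others that are not listed here.

```python
from typing import Optional


def build_payload(audio_url: str, image_url: Optional[str] = None,
                  prompt: Optional[str] = None) -> dict:
    """Build a JSON-ready request body (hypothetical helper).

    audio_url is required; image_url and prompt are optional controls.
    """
    if not audio_url:
        raise ValueError("audio_url is required")
    payload = {"audio_url": audio_url}
    # Add optional controls only when given, keeping the request minimal.
    if image_url:
        payload["image_url"] = image_url
    if prompt:
        payload["prompt"] = prompt
    return payload


print(build_payload("https://example.com/voice.mp3",
                    prompt="hand-drawn animation, head-and-shoulders framing"))
```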

Stylized audio-driven host

Generate a lip-synced talking clip from audio with optional visual direction.

Example pair: original input and generated output.

Capabilities

Audio-driven generation

  • Requires audio_url input.
  • Generates synchronized lip motion from speech.

Creative control

  • Optional image_url for visual guidance.
  • Optional prompt for style and scene direction.

Use Cases

Stylized explainers

Create artistic talking-head explainers from voice tracks.

Social content

Turn recorded audio into short-form talking videos quickly.

Prototype avatars

Test character concepts using audio and optional reference images.

Applications

Content creation

Produce stylized talking clips from narration.

Marketing tests

Generate multiple creative variants without filming.

Avatar workflows

Build fast audio-first avatar videos for social channels.

Best Practices

  1. Use clean, high-volume speech audio to improve lip-sync clarity.
  2. Provide a clear portrait image when identity consistency matters.
  3. Keep prompts specific about style, framing, and motion.
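Before uploading, it can help to check a voice track's length and level. The rough sketch below uses Python's standard wave module and assumes a 16-bit PCM WAV file. The helper wav_stats is hypothetical. What counts as "high-volume" is left to your judgment, since there is no documented threshold.

```python
import array
import sys
import wave


def wav_stats(path: str) -> tuple[float, float]:
    """Return (duration_seconds, peak_level) for a 16-bit PCM WAV file.

    peak_level is the largest absolute sample value divided by 32768,
    so 1.0 means full scale and small values mean a quiet recording.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    samples = array.array("h", raw)  # signed 16-bit samples, channels interleaved
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    duration = n_frames / rate
    peak = max((abs(s) for s in samples), default=0) / 32768
    return duration, peak
```

The measured duration can also feed a credit estimate, because billing uses the real audio duration when available.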

Frequently Asked Questions

What inputs are required for LTX 2.3?

audio_url is required. image_url and prompt are optional controls.

What does the model output?

It outputs a generated video file.
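This page does not document the response schema. Suppose the JSON response places the file under video.url, which is an assumption you should verify for this endpoint. Extracting the link might then look like this. extract_video_url is a hypothetical helper.

```python
import json


def extract_video_url(response_body: str) -> str:
    """Pull the video URL out of a JSON response (hypothetical helper).

    ASSUMPTION: the response looks like {"video": {"url": "..."}}; this page
    only says the output is a video file.
    """
    data = json.loads(response_body)
    try:
        return data["video"]["url"]
    except (KeyError, TypeError) as exc:
        raise ValueError("unexpected response shape") from exc


print(extract_video_url('{"video": {"url": "https://example.com/out.mp4"}}'))
```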

Where is this model hosted?

The model runs on FAL.ai via fal.run/fal-ai/ltx-2.3/audio-to-video.
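Here is a minimal sketch of preparing a call to that endpoint with only the standard library. It makes two assumptions that you should check against FAL.ai's documentation. First, that fal.run accepts a JSON POST authenticated with an "Authorization: Key <your key>" header. Second, that the key is stored in a FAL_KEY environment variable. The code builds the request but does not send it, so the example runs offline.

```python
import json
import os
import urllib.request

ENDPOINT = "https://fal.run/fal-ai/ltx-2.3/audio-to-video"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the endpoint.

    ASSUMPTION: JSON body plus an "Authorization: Key <key>" header.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = make_request({"audio_url": "https://example.com/voice.mp3"},
                       os.environ.get("FAL_KEY", ""))
    # Sending is left commented out so the sketch runs without network access:
    # with urllib.request.urlopen(req, timeout=600) as resp:
    #     print(resp.read().decode("utf-8"))
    print(req.get_method(), req.full_url)
```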