Add LTX-2.3 text-to-video generation support #402
Open
prishajain1 wants to merge 11 commits into main from ltx23
Commits
6e5961c Add LTX2.3 model support (prishajain1)
b72e262 fast path (CFG) block scanning with stg disabled (prishajain1)
737faf8 feat(ltx23): add enable_ml_diagnostics and profiling options to ltx2_… (prishajain1)
4e327ca config: clear profiler_gcs_path by default in ltx2 and ltx23 configs (prishajain1)
4360984 perf(ltx23): restore main branch sharding (None, mlp) to NNXSimpleFee… (prishajain1)
60107e3 refactor(ltx2): replace hardcoded upsample and downsample types with … (prishajain1)
a1d71a0 docs(ltx2): document supported CFG and STG combinations in pipeline (prishajain1)
9b44c13 docs(ltx2): add class docstring to LTX2AdaLayerNormSingle explaining … (prishajain1)
d7bb228 mel bins default value updated (prishajain1)
4121569 fix(ltx2): use replicated audio sharding in pipeline to resolve Indiv… (prishajain1)
8c96c5f test(ltx2): pass max_sequence_length to pipeline in smoke test to pre… (prishajain1)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -52,6 +52,7 @@ | ||
| WAN2_1 = "wan2.1" | ||
| WAN2_2 = "wan2.2" | ||
| LTX2_VIDEO = "ltx2_video" | ||
| LTX2_3 = "ltx2.3" | ||
|
|
||
| WAN_MODEL = WAN2_1 | ||
|
|
||
|
|
||
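For illustration, the identifier added in this hunk could be checked against a config's `model_name` value (the LTX-2.3 config in this PR sets `model_name: "ltx2.3"`). The sketch below is not code from the PR: `SUPPORTED_MODEL_NAMES` and `validate_model_name` are hypothetical names, and only the four string constants are taken from the diff.

```python
# String constants as they appear in the hunk above.
WAN2_1 = "wan2.1"
WAN2_2 = "wan2.2"
LTX2_VIDEO = "ltx2_video"
LTX2_3 = "ltx2.3"

# Hypothetical registry; the real project may organise this differently.
SUPPORTED_MODEL_NAMES = frozenset({WAN2_1, WAN2_2, LTX2_VIDEO, LTX2_3})


def validate_model_name(name: str) -> str:
    """Return `name` unchanged if it is a known model identifier, else raise."""
    if name not in SUPPORTED_MODEL_NAMES:
        known = ", ".join(sorted(SUPPORTED_MODEL_NAMES))
        raise ValueError(f"Unknown model_name {name!r}; expected one of: {known}")
    return name


# The value used by `model_name` in the new LTX-2.3 config.
print(validate_model_name("ltx2.3"))
```

Failing early on an unrecognised name keeps a typo in the config from surfacing later as a harder-to-read checkpoint loading error.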
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,161 @@ | ||
| #hardware | ||
| hardware: 'tpu' | ||
| skip_jax_distributed_system: False | ||
| attention: 'flash' | ||
| a2v_attention_kernel: 'flash' | ||
| v2a_attention_kernel: 'dot_product' | ||
| attention_sharding_uniform: True | ||
| precision: 'bf16' | ||
| scan_layers: True | ||
| names_which_can_be_saved: [] | ||
| names_which_can_be_offloaded: [] | ||
| remat_policy: "NONE" | ||
|
|
||
| jax_cache_dir: '' | ||
| weights_dtype: 'bfloat16' | ||
| activations_dtype: 'bfloat16' | ||
|
|
||
| run_name: 'ltx2_inference' | ||
| output_dir: '' | ||
| config_path: '' | ||
| save_config_to_gcs: False | ||
|
|
||
| #Checkpoints | ||
| max_sequence_length: 1024 | ||
| sampler: "from_checkpoint" | ||
|
|
||
| # Generation parameters (aligned with Diffusers LTX-2.3 docs: use_cross_timestep, modality + audio CFG) | ||
| global_batch_size_to_train_on: 1 | ||
| num_inference_steps: 40 | ||
| guidance_scale: 3.0 | ||
| guidance_rescale: 0.7 | ||
| audio_guidance_scale: 7.0 | ||
| audio_guidance_rescale: 0.7 | ||
| stg_scale: 1.0 | ||
| audio_stg_scale: 1.0 | ||
| modality_scale: 3.0 | ||
| audio_modality_scale: 3.0 | ||
| use_cross_timestep: true | ||
| spatio_temporal_guidance_blocks: [28] | ||
| fps: 24 | ||
| pipeline_type: multi-scale | ||
| prompt: "A man in a brightly lit room talks on a vintage telephone. In a low, heavy voice, he says, 'I understand. I won't call again. Goodbye.' He hangs up the receiver and looks down with a sad expression. He holds the black rotary phone to his right ear with his right hand, his left hand holding a rocks glass with amber liquid. He wears a brown suit jacket over a white shirt, and a gold ring on his left ring finger. His short hair is neatly combed, and he has light skin with visible wrinkles around his eyes. The camera remains stationary, focused on his face and upper body. The room is brightly lit by a warm light source off-screen to the left, casting shadows on the wall behind him. The scene appears to be from a dramatic movie." | ||
| negative_prompt: "shaky, glitchy, low quality, worst quality, deformed, distorted, disfigured, motion smear, motion artifacts, fused fingers, bad anatomy, weird hand, ugly, transition, static." | ||
| height: 512 | ||
| width: 768 | ||
| decode_timestep: 0.05 | ||
| decode_noise_scale: 0.025 | ||
| noise_scale: 0.0 | ||
| num_frames: 121 | ||
| quantization: "int8" | ||
| scan_diffusion_loop: True | ||
| use_bwe: True | ||
| #parallelism | ||
| mesh_axes: ['data', 'fsdp', 'context', 'tensor'] | ||
| logical_axis_rules: [ | ||
| ['batch', ['data', 'fsdp']], | ||
| ['activation_batch', ['data', 'fsdp']], | ||
| ['activation_self_attn_heads', ['context', 'tensor']], | ||
| ['activation_cross_attn_q_length', ['context', 'tensor']], | ||
| ['activation_length', 'context'], | ||
| ['activation_heads', 'tensor'], | ||
| ['mlp','tensor'], | ||
| ['embed', ['context', 'fsdp']], | ||
| ['heads', 'tensor'], | ||
| ['norm', 'tensor'], | ||
| ['conv_batch', ['data', 'context', 'fsdp']], | ||
| ['out_channels', 'tensor'], | ||
| ['conv_out', 'context'], | ||
| ] | ||
| data_sharding: ['data', 'fsdp', 'context', 'tensor'] | ||
|
|
||
| dcn_data_parallelism: 1 # recommended DCN axis to be auto-sharded | ||
| dcn_fsdp_parallelism: -1 | ||
|
|
||
| flash_block_sizes: { | ||
| block_q: 2048, | ||
| block_kv: 2048, | ||
| block_kv_compute: 1024, | ||
| block_q_dkv: 2048, | ||
| block_kv_dkv: 2048, | ||
| block_kv_dkv_compute: 2048, | ||
| use_fused_bwd_kernel: True, | ||
| } | ||
| flash_min_seq_length: 4096 | ||
| dcn_context_parallelism: 1 | ||
| dcn_tensor_parallelism: 1 | ||
| ici_data_parallelism: 1 | ||
| ici_fsdp_parallelism: 1 | ||
| ici_context_parallelism: -1 # recommended ICI axis to be auto-sharded | ||
| ici_tensor_parallelism: 1 | ||
| enable_profiler: False | ||
|
|
||
| # ML Diagnostics settings | ||
| enable_ml_diagnostics: True | ||
| profiler_gcs_path: "" | ||
| enable_ondemand_xprof: True | ||
| skip_first_n_steps_for_profiler: 0 | ||
| profiler_steps: 5 | ||
|
|
||
| replicate_vae: False | ||
|
|
||
| allow_split_physical_axes: False | ||
| learning_rate_schedule_steps: -1 | ||
| max_train_steps: 500 | ||
| pretrained_model_name_or_path: 'dg845/LTX-2.3-Diffusers' | ||
| model_name: "ltx2.3" | ||
| model_type: "T2V" | ||
| unet_checkpoint: '' | ||
| checkpoint_dir: "" | ||
| dataset_name: '' | ||
| train_split: 'train' | ||
| dataset_type: 'tfrecord' | ||
| cache_latents_text_encoder_outputs: True | ||
| per_device_batch_size: 1.0 | ||
| compile_topology_num_slices: -1 | ||
| quantization_local_shard_count: -1 | ||
| use_qwix_quantization: False | ||
| weight_quantization_calibration_method: "absmax" | ||
| act_quantization_calibration_method: "absmax" | ||
| bwd_quantization_calibration_method: "absmax" | ||
| qwix_module_path: ".*" | ||
| jit_initializers: True | ||
| enable_single_replica_ckpt_restoring: False | ||
| seed: 10 | ||
| audio_format: "s16" | ||
|
|
||
| # LoRA parameters | ||
| enable_lora: False | ||
|
|
||
| # Distilled LoRA | ||
| # lora_config: { | ||
| # lora_model_name_or_path: ["Lightricks/LTX-2"], | ||
| # weight_name: ["ltx-2-19b-distilled-lora-384.safetensors"], | ||
| # adapter_name: ["distilled-lora-384"], | ||
| # rank: [384] | ||
| # } | ||
|
|
||
| # Standard LoRA | ||
| lora_config: { | ||
| lora_model_name_or_path: ["Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-In"], | ||
| weight_name: ["ltx-2-19b-lora-camera-control-dolly-in.safetensors"], | ||
| adapter_name: ["camera-control-dolly-in"], | ||
| rank: [32] | ||
| } | ||
|
|
||
|
|
||
| # LTX-2 Latent Upsampler | ||
| run_latent_upsampler: False | ||
| upsampler_model_path: "Lightricks/LTX-2.3" | ||
| # Following upsampler files are supported: | ||
| # ltx-2.3-spatial-upscaler-x2-1.0.safetensors | ||
| # ltx-2.3-spatial-upscaler-x2-1.1.safetensors | ||
| # ltx-2.3-spatial-upscaler-x1.5-1.0.safetensors | ||
| # ltx-2.3-temporal-upscaler-x2-1.0.safetensors | ||
| upsampler_filename: "ltx-2.3-spatial-upscaler-x2-1.0.safetensors" | ||
| upsampler_spatial_patch_size: 1 | ||
| upsampler_temporal_patch_size: 1 | ||
| upsampler_adain_factor: 0.0 | ||
| upsampler_tone_map_compression_ratio: 0.0 | ||
| upsampler_rational_spatial_scale: 2.0 | ||
| upsampler_output_type: "pil" | ||
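The parallelism block in this config sets `ici_context_parallelism: -1` with the comment "recommended ICI axis to be auto-sharded". One common reading of that convention, assumed here rather than confirmed by this diff, is that the `-1` axis absorbs whatever device count the other axes leave over. A minimal sketch in plain Python; `fill_auto_axis` is a hypothetical helper and the device count of 8 is made up for the example:

```python
import math
from typing import List, Sequence


def fill_auto_axis(parallelism: Sequence[int], num_devices: int) -> List[int]:
    """Replace a single -1 entry so the product of all axes equals num_devices."""
    values = list(parallelism)
    auto_positions = [i for i, v in enumerate(values) if v == -1]
    if len(auto_positions) > 1:
        raise ValueError("at most one axis may be set to -1")

    # Product of the explicitly sized axes.
    known = math.prod(v for v in values if v != -1)

    if auto_positions:
        if num_devices % known != 0:
            raise ValueError(f"{num_devices} devices not divisible by {known}")
        values[auto_positions[0]] = num_devices // known
    elif known != num_devices:
        raise ValueError(f"axes multiply to {known}, expected {num_devices}")
    return values


# Axis order and ici_* values copied from the config; 8 devices is an assumption.
MESH_AXES = ["data", "fsdp", "context", "tensor"]
ici = {"data": 1, "fsdp": 1, "context": -1, "tensor": 1}

resolved = fill_auto_axis([ici[axis] for axis in MESH_AXES], num_devices=8)
print(dict(zip(MESH_AXES, resolved)))
```

Under that assumption, all eight devices in this example would land on the `context` axis, which the `logical_axis_rules` in the config use for `activation_length` and, jointly with `tensor`, for attention heads.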
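As a sanity check on the generation parameters in the LTX-2.3 config (`num_frames: 121`, `height: 512`, `width: 768`, `fps: 24`), the sketch below computes the latent grid those values would imply. It assumes a VAE with 8x temporal and 32x spatial compression and a frame count of the form 8k + 1; those ratios are an assumption borrowed from the LTX video model family, not something stated in this diff, and `latent_grid` is a hypothetical helper.

```python
from typing import Tuple


def latent_grid(
    num_frames: int,
    height: int,
    width: int,
    temporal_ratio: int = 8,   # assumed VAE temporal compression
    spatial_ratio: int = 32,   # assumed VAE spatial compression
) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) or raise on bad sizes."""
    if (num_frames - 1) % temporal_ratio != 0:
        raise ValueError(f"num_frames must be {temporal_ratio}*k + 1, got {num_frames}")
    if height % spatial_ratio != 0 or width % spatial_ratio != 0:
        raise ValueError(f"height and width must be multiples of {spatial_ratio}")
    latent_frames = (num_frames - 1) // temporal_ratio + 1
    return latent_frames, height // spatial_ratio, width // spatial_ratio


# Values copied from the config.
NUM_FRAMES, HEIGHT, WIDTH, FPS = 121, 512, 768, 24

grid = latent_grid(NUM_FRAMES, HEIGHT, WIDTH)
clip_seconds = round(NUM_FRAMES / FPS, 2)
print(grid, clip_seconds)
```

With the assumed ratios, the configured values pass both checks, so a run that changes `num_frames` or the resolution could use the same check before compiling the pipeline.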