
fix: resolve issue #13811 #13813

Open
Zhu1116 wants to merge 4 commits into huggingface:main from Zhu1116:fix/issue-13811

Zhu1116 commented May 26, 2026

What does this PR do?

Fixes #13811

I found an issue in how the cond ids are processed in train_dreambooth_lora_flux2_img2img.py in diffusers 0.38.0 (around lines 1703-1709 of the installed package). The same issue exists on the main branch.

With the original code:

cond_model_input_list = [cond_model_input[i].unsqueeze(0) for i in range(cond_model_input.shape[0])]
cond_model_input_ids = Flux2Pipeline._prepare_image_ids(cond_model_input_list).to(
    device=cond_model_input.device
)
cond_model_input_ids = cond_model_input_ids.view(
    cond_model_input.shape[0], -1, model_input_ids.shape[-1]
)

With a batch size of 2, the resulting cond_model_input_ids looks like this:

tensor([[[10,  0,  0,  0],
         [10,  0,  1,  0],
         [10,  0,  2,  0],
         ...,
         [10, 31, 29,  0],
         [10, 31, 30,  0],
         [10, 31, 31,  0]],

        [[20,  0,  0,  0],
         [20,  0,  1,  0],
         [20,  0,  2,  0],
         ...,
         [20, 31, 29,  0],
         [20, 31, 30,  0],
         [20, 31, 31,  0]]], device='cuda:0')

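However, every sample in a batch should receive the same cond ids.
Flux2Pipeline._prepare_image_ids is designed for multiple conditioning images that belong to the same sample, so when the batch is split into a list, each batch element is treated as a separate conditioning image and gets a different first coordinate (10 for the first, 20 for the second).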
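
To make the mismatch concrete, here is a minimal pure-Python sketch. `toy_image_ids` is a hypothetical stand-in, not the real Flux2Pipeline._prepare_image_ids; it only assumes, consistent with the output above, that the i-th image in the list gets a first coordinate of 10 * (i + 1) and that the next two coordinates index a height x width grid.

```python
def toy_image_ids(images, scale=10):
    """Hypothetical stand-in: one [t, h, w, 0] row per latent position.

    `images` is a list of (height, width) sizes; the i-th entry gets
    t = scale * (i + 1). This is an assumption for illustration only.
    """
    ids = []
    for i, (height, width) in enumerate(images):
        t = scale * (i + 1)
        ids.extend([t, h, w, 0] for h in range(height) for w in range(width))
    return ids


batch_size, height, width = 2, 2, 2
tokens = height * width

# Original pattern: every batch element becomes its own list entry, so each
# one is treated as another conditioning image of a single sample.
flat = toy_image_ids([(height, width)] * batch_size)

# Equivalent of the .view(batch_size, -1, 4) reshape in the training script.
per_sample = [flat[b * tokens:(b + 1) * tokens] for b in range(batch_size)]

# Distinct first coordinates seen by each batch element.
first_coords = [sorted({row[0] for row in sample}) for sample in per_sample]
print(first_coords)  # [[10], [20]]
```

In this toy model the second batch element ends up with a first coordinate of 20, which mirrors the tensor printed above.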

With the fixed code:

model_input_ids = Flux2Pipeline._prepare_latent_ids(model_input).to(device=model_input.device)
cond_model_input_ids = Flux2Pipeline._prepare_image_ids([cond_model_input[0:1]]).to(
    device=cond_model_input.device
)
cond_model_input_ids = cond_model_input_ids.expand(
    cond_model_input.shape[0], -1, -1
)

The output becomes correct (same cond ids for the whole batch):

tensor([[[10,  0,  0,  0],
         [10,  0,  1,  0],
         [10,  0,  2,  0],
         ...,
         [10, 31, 29,  0],
         [10, 31, 30,  0],
         [10, 31, 31,  0]],

        [[10,  0,  0,  0],
         [10,  0,  1,  0],
         [10,  0,  2,  0],
         ...,
         [10, 31, 29,  0],
         [10, 31, 30,  0],
         [10, 31, 31,  0]]], device='cuda:0')
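
The same behaviour can be checked in a minimal pure-Python sketch. `toy_image_ids` is a hypothetical stand-in, not the real Flux2Pipeline._prepare_image_ids; it only assumes that the i-th image in the list gets a first coordinate of 10 * (i + 1). Reusing one list for every batch element is only a loose analogy for .expand(), which shares data instead of copying it.

```python
def toy_image_ids(images, scale=10):
    """Hypothetical stand-in: one [t, h, w, 0] row per latent position.

    `images` is a list of (height, width) sizes; the i-th entry gets
    t = scale * (i + 1). This is an assumption for illustration only.
    """
    ids = []
    for i, (height, width) in enumerate(images):
        t = scale * (i + 1)
        ids.extend([t, h, w, 0] for h in range(height) for w in range(width))
    return ids


batch_size, height, width = 2, 2, 2

# Fixed pattern: build the ids once from a single conditioning image...
single = toy_image_ids([(height, width)])

# ...then share that same result with every element of the batch.
shared = [single for _ in range(batch_size)]

print(shared[0] == shared[1])  # True
print(shared[1][0])            # [10, 0, 0, 0]
```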

I'm sorry if I have misunderstood the underlying logic.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

github-actions Bot added the fixes-issue, examples, and size/S (PR with diff < 50 LOC) labels and removed the fixes-issue label on May 26, 2026

Zhu1116 (Author) commented May 28, 2026

@sayakpaul Could you please review this PR? It follows up on #12855. Thanks a lot!

sayakpaul (Member) commented

@askserge could you do a review?

github-actions Bot (Contributor) left a comment

🤗 Serge says:

The fix is correct and well-motivated. _prepare_image_ids is designed to take a list of conditioning images for the same sample (each getting a unique T-coordinate offset: 10, 20, 30…). The old code incorrectly split the batch dimension into this list, causing batch element 0 to get T=10 and batch element 1 to get T=20, when they should all share the same T=10 coordinate structure.

The fix passes only the first batch element [cond_model_input[0:1]] to produce a single (1, H*W, 4) ID tensor, then .expand()s it across the batch — matching the pipeline's own pattern (which uses .repeat(batch_size, 1, 1) at line 683 of pipeline_flux2.py). Using .expand() instead of .repeat() is fine since the IDs are read-only downstream.

No issues found.

7 LLM turns · 7 tool calls · 29.0s · 59557 in / 1233 out tokens

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Zhu1116 (Author) commented May 28, 2026

@sayakpaul I have fixed the code formatting issues and also fixed the same problem in train_dreambooth_lora_flux2_klein_img2img.py. Thanks for your help!

Successfully merging this pull request may close these issues.

Fix incorrect batch handling in _prepare_image_ids usage in train_dreambooth_lora_flux2_img2img.py
