
Conversation


@akshan-main akshan-main commented Nov 22, 2025

What does this PR do?

Enables robust batch inference for QwenImageEditPlusPipeline by normalizing input tensor shapes, implementing a resize strategy, and handling variable-length prompt embeddings. This adds a batch inference capability that did not exist previously, accepts lists and tuples of images as input, and makes QwenImageEditPlusPipeline usable in parallel production workflows.

Description

Addresses issue #12458.

I identified four blockers preventing batch inference in the current pipeline:

  1. 5D Tensor Requirement: The underlying VAE for the Qwen2-VL model treats batched inputs as video (B, C, F, H, W). The pipeline was passing 4D tensors (B, C, H, W), causing immediate shape mismatches.

    • Fix: Added a pre-processing step that explicitly inserts a singleton frame dimension for static images when batch_size > 1.
  2. Uniform Size Requirement: Tensors require all images in a batch to be the same size.

    • Fix: Implemented a resize strategy.
      • Single/Uniform Images: Preserves the original aspect ratio and resolution (rounded to the nearest multiple of 32).
      • Mixed Batches: Forces images to a standard resolution (e.g., 1024x1024), or to a user-defined height/width when provided (which takes priority), so tensors stack without padding artifacts. (The padding methodology from my previous commit was removed in favor of upscaling/resizing.)
  3. Tokenizer Batching Issues: The Qwen2VLProcessor produces variable-length embeddings for different prompts, which caused RuntimeError or IndexError when trying to batch encode them directly.

    • Fix: Refactored encode_prompt to process prompts individually in a loop, then pad the resulting embeddings up to the maximum sequence length in the batch before concatenating.
  4. The pipeline would crash if users accidentally passed a tuple of images.

    • Fix: Added _sanitize_images which recursively unwraps inputs into a clean list.

Note on Batching Logic

To resolve the ambiguity between "Multi-Image Conditioning" and "Batch Inference", I implemented the following routing logic in encode_prompt:

  1. Single String Prompt (prompt="string"):

    • Behavior: Joint Condition. The pipeline treats all provided images as a single context for one generation task.
    • Use Case: Style transfer or merging elements from multiple reference images.
  2. List of Prompts (prompt=["s1", "s2"]):

    • Behavior: Parallel Batch. The pipeline maps images to prompts 1-to-1.
    • Use Case: Processing a dataset (e.g., editing 50 different images with 50 different instructions at once).
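The routing between the two modes can be illustrated with a small standalone sketch (not the actual `encode_prompt` code; the function name `route_batches` is made up for illustration):

```python
def route_batches(prompt, images):
    """Illustration of the routing logic: a single string prompt
    conditions one generation on all images jointly, while a list of
    prompts maps images to prompts 1-to-1."""
    if isinstance(prompt, str):
        # Joint condition: every image is context for one task.
        return [(prompt, list(images))]
    if len(prompt) != len(images):
        raise ValueError("Parallel batch requires one image per prompt.")
    # Parallel batch: image[i] is edited with prompt[i].
    return [(p, [img]) for p, img in zip(prompt, images)]
```

So `route_batches("merge styles", [img_a, img_b])` yields a single task conditioned on both images, while `route_batches(["s1", "s2"], [img_a, img_b])` yields two independent tasks.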

Fixes #12458

Before submitting

  • This PR fixes a typo or improves the docs.
  • Did you read the contributor guideline?
  • Did you read our philosophy doc?
  • Was this discussed/approved via a GitHub issue? (Issue [Qwen-image-edit] Batch Inference Issue / Feature Request #12458)
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? (Verified via reproduction script)

Who can review?

@yiyixuxu @sayakpaul @DN6

@sayakpaul sayakpaul requested a review from yiyixuxu November 24, 2025 06:12
@akshan-main
Author

Hey @sayakpaul @yiyixuxu, let me know if I need to make any changes; the functionality works as intended!

@sayakpaul sayakpaul requested a review from DN6 December 8, 2025 05:55
@yiyixuxu
Collaborator

yiyixuxu commented Dec 8, 2025

the mask output from encode_prompt is not used in attention calculation https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_qwenimage.py#L338 (it's always None here)

we will not move forward with this PR.
We have a PR addressing a similar issue, #12702, and we will focus on that instead :)

@akshan-main
Author

akshan-main commented Dec 8, 2025

Thanks for the clarification, that helps.

Edit:
Regarding #12702: that PR fixes variable prompt-length handling inside the transformer, but it does not by itself turn QwenImageEditPlusPipeline into the dataset-style batch API requested in #12458, where image[i] is edited with prompt[i] in parallel. This PR adds that behavior at the pipeline level by introducing 1:1 image-prompt routing, tuple/list input sanitization, and a concrete strategy for mixed-resolution images so batches can actually be stacked without errors. In that sense, I believe #12702 solves the internal masking issue, while this PR is still required to support the practical batched-inference workflow that originally motivated #12458, and it addresses the requested feature addition.
