
Rare Text Semantics Were Always There in Your Diffusion Transformer

Seil Kang*, Woojung Han*, Dayun Ju, Seong Jae Hwang
Yonsei University

Overview


Teaser. Our method, ToRA, achieves superior semantic alignment in text-to-vision outputs for rare prompts while requiring no finetuning, optimization, or additional modules. Misfired phrases in the baseline and existing-method outputs are highlighted in red.

Starting from flow- and diffusion-based transformers, Multi-modal Diffusion Transformers (MM-DiTs) have reshaped text-to-vision generation, gaining acclaim for exceptional visual fidelity. As these models advance, users continually push the boundary with imaginative or rare prompts, which advanced models still falter in generating, since their concepts are often too scarce to leave a strong imprint during pre-training. In this paper, we propose a simple yet effective intervention that surfaces rare semantics inside MM-DiTs without additional training steps, data, denoising-time optimization, or reliance on external modules (e.g., large language models). In particular, the joint-attention mechanism intrinsic to MM-DiT sequentially updates text embeddings alongside image embeddings throughout transformer blocks. We find that by mathematically expanding representational basins around text token embeddings via variance scale-up before the joint-attention blocks, rare semantics clearly emerge in MM-DiT's outputs. Furthermore, our results generalize effectively across text-to-vision tasks, including text-to-image, text-to-video, and text-driven image editing. Our work invites generative models to reveal the semantics that users intend, once hidden yet ready to surface.
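The variance scale-up described above is a training-free intervention on the text stream before the joint-attention blocks. Since the official code is not yet released (see Implementation below), the snippet here is only a minimal PyTorch sketch of one plausible reading: each text token's deviation from its feature-wise mean is amplified by a factor greater than one before the text and image tokens are joined for attention. The function names, centering axis, and scale value are illustrative assumptions, not the authors' implementation.

```python
import torch


def variance_scale_up(text_emb: torch.Tensor, scale: float = 1.5) -> torch.Tensor:
    """Amplify each text token's deviation from its feature-wise mean.

    text_emb: (batch, num_text_tokens, dim) embeddings entering a
    joint-attention block. The per-token variance grows by roughly
    scale**2, one way to "expand the basin" around each token embedding.
    """
    mean = text_emb.mean(dim=-1, keepdim=True)      # (batch, tokens, 1)
    return mean + scale * (text_emb - mean)         # amplified deviation


def joint_attention_inputs(text_emb: torch.Tensor,
                           image_emb: torch.Tensor,
                           scale: float = 1.5) -> torch.Tensor:
    """Hypothetical pre-attention hook: scale up only the text stream,
    then concatenate with the image tokens as MM-DiT joint attention does."""
    return torch.cat([variance_scale_up(text_emb, scale), image_emb], dim=1)
```

Whether the scaling is applied once before the first block or before every joint-attention block, and the appropriate scale value, should follow the paper and the official code once released.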

Methods

Method overview figure.

Implementation

To be updated. Stay tuned.

Citation

@misc{kang2025raretextsemanticsdiffusion,
    title={Rare Text Semantics Were Always There in Your Diffusion Transformer},
    author={Seil Kang and Woojung Han and Dayun Ju and Seong Jae Hwang},
    year={2025},
    eprint={2510.03886},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2510.03886},
}

About

[NeurIPS 2025] Rare Text Semantics Were Always There in Your Diffusion Transformer
