@xinSky00

Purpose

This PR implements a Triton attention kernel for ReRoPE (Rectified Rotary Position Embeddings) to enable efficient context-length extension in vLLM. For long sequences, the ReRoPE-based approach delivers a 3-5x speedup.

The kernel computes attention segment-wise: tokens within a fixed window receive full rotary position embeddings, while relative positions beyond the window boundary are constrained (clamped) to the window size. This lets models handle sequences well beyond their pre-training length without fine-tuning.
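For illustration only, here is a minimal sketch (not the Triton kernel from this PR) of the relative-position matrix that ReRoPE-style rectification produces, assuming a window size `window`: distances inside the window keep their true value, and everything farther away is clamped to `window`.

```python
import torch

def rerope_relative_positions(seq_len: int, window: int) -> torch.Tensor:
    """Illustrative sketch: the relative-position matrix implied by ReRoPE.

    Entry [i, j] is the distance between query i and key j; distances past
    the window boundary are rectified (clamped) to `window`. Entries above
    the diagonal are masked out by causal attention anyway.
    """
    pos = torch.arange(seq_len)
    rel = (pos.unsqueeze(1) - pos.unsqueeze(0)).clamp(min=0)  # causal distance
    return rel.clamp(max=window)                              # ReRoPE rectification

# With window=4, distances 0..3 keep full RoPE; anything farther acts as distance 4.
print(rerope_relative_positions(seq_len=8, window=4))
```

Because RoPE rotates queries and keys separately, an arbitrary relative-position matrix like this cannot be realized by a single rotation; ReRoPE therefore computes two sets of attention scores (standard RoPE inside the window, a fixed relative position of `window` outside) and selects between them per entry, which is presumably what the segment-wise Triton kernel in this PR fuses into a single pass.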

Modifications

  1. Required Environment Variable: VLLM_ATTENTION_BACKEND
  • Must be set to TRITON_ATTN_VLLM_V1 to enable the Triton-based backend
  • Usage: export VLLM_ATTENTION_BACKEND="TRITON_ATTN_VLLM_V1"
  • Default: FLASH_ATTN (the standard FlashAttention backend)
  2. Modified Model Configuration Parameter: max_position_embeddings
  • Users must raise this parameter via --hf-overrides to match their target input length
  • This ensures the RoPE embeddings are computed for the extended sequence length
  • Example: --hf-overrides '{"max_position_embeddings": 327680}'
  3. ReRoPE-Specific Parameters: rerope_window and training_length
  • These should be configured based on the model's original pre-training length, since they determine the segment boundaries for the attention computation; a configuration sketch follows this list
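Putting the three items above together, a minimal Python sketch (not taken from this PR) could look like the following. VLLM_ATTENTION_BACKEND, the TRITON_ATTN_VLLM_V1 value, and the hf_overrides keyword on LLM are existing vLLM interfaces in recent releases; VLLM_USE_REROPE comes from this PR's test description; passing rerope_window and training_length through hf_overrides, and the values chosen for them, are assumptions made purely for illustration.

```python
import os

# The attention backend must be selected before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "TRITON_ATTN_VLLM_V1"
os.environ["VLLM_USE_REROPE"] = "true"  # ReRoPE switch used in the test below

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct",
    hf_overrides={
        # Same effect as --hf-overrides '{"max_position_embeddings": 327680}'.
        "max_position_embeddings": 327680,
        # Hypothetical keys: the PR text does not say where rerope_window and
        # training_length are set; placement and values are illustrative only.
        "rerope_window": 512,
        "training_length": 32768,
    },
)
```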

Test

Run offline_inference_rerope.py with the following setup; a minimal sketch of such a script appears after the results list.

  • os.environ["VLLM_ATTENTION_BACKEND"] = "TRITON_ATTN_VLLM_V1"
  • os.environ["VLLM_USE_REROPE"] = "true"
  • model: Qwen2.5-14B-Instruct
  • Dataset: multifieldqa_zh.jsonl
  • results
    • prompt length: about 130k tokens (output shown in the attached image)
    • prompt length: about 315k tokens (output shown in the attached image)
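As a reference, here is a minimal sketch of what a test script along these lines might look like. This is not the offline_inference_rerope.py shipped in the PR; the "context" and "input" field names for multifieldqa_zh.jsonl and the prompt template are assumptions, and the engine configuration repeats the sketch above so the script is self-contained.

```python
import json
import os

# Backend/ReRoPE switches must be set before importing vLLM (see Modifications).
os.environ["VLLM_ATTENTION_BACKEND"] = "TRITON_ATTN_VLLM_V1"
os.environ["VLLM_USE_REROPE"] = "true"

from vllm import LLM, SamplingParams

# Assumes each line of multifieldqa_zh.jsonl is a JSON object with "context"
# and "input" fields (LongBench-style layout; field names are an assumption).
prompts = []
with open("multifieldqa_zh.jsonl", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        prompts.append(f"{sample['context']}\n\nQuestion: {sample['input']}\nAnswer:")

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct",
    hf_overrides={"max_position_embeddings": 327680},  # extend the positional limit
)

outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=128))
for out in outputs:
    print(len(out.prompt_token_ids), "prompt tokens ->", out.outputs[0].text[:80])
```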
