Pull requests: NVIDIA/TensorRT-LLM
Modified openai_server.py file to have control over KV cache memory size. (#9763, opened Dec 6, 2025 by AzizulZahid)
[#9760][fix] Use RequestError for validation errors to prevent engine shutdown (#9761, opened Dec 6, 2025 by tzulingk; label: Community want to contribute)
[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757, opened Dec 5, 2025 by brb-nv)
[#9753][feat] AutoDeploy: Implement add rms_norm fusion (#9754, opened Dec 5, 2025 by nvchenghaoz)
[None] Fix warning when capturing CUDA graph (#9746, opened Dec 5, 2025 by ziyixiong-nv)
[None][infra] Fail fast if SLURM entrypoint fails (#9744, opened Dec 5, 2025 by mlefeb01)
[None][fix] Fix two tuning cache miss issues. (#9743, opened Dec 5, 2025 by hyukn)
Patch Whisper BLS example (#9742, opened Dec 5, 2025 by protonicage; label: Community want to contribute)
[None][test] add ntp tolerance in time metrics verification (#9741, opened Dec 5, 2025 by zhengd-nv)
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740, opened Dec 5, 2025 by liji-nv)
[None][chore] add multi-node service discovery disagg test (#9738, opened Dec 5, 2025 by reasonsolo; draft)
[None][fix] unable cuda ipc on device without nvlink (L40s) for disagg test (#9735, opened Dec 5, 2025 by chuangz0)
[None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 (#9734, opened Dec 5, 2025 by nekorobov)
[https://nvbugs/5644632][fix] add fix_gpu_memory_fraction to disagg gptoss tests (#9732, opened Dec 5, 2025 by chuangz0)
[feat][chore] Add GB300 dlcluster support since it doesn't support segment (#9731, opened Dec 5, 2025 by fredricz-20070104)
[https://nvbugs/5519544][feat] Adaptive number of thread workers when loading model concurrently (#9730, opened Dec 5, 2025 by jieli-matrix)
[None][feat] add fp4 gemm + allreduce (#9729, opened Dec 5, 2025 by benzh-2025; label: Community want to contribute)