Pull requests: InternLM/lmdeploy
- [Add] make llm-compressor symmetric model inference in TurboMind (#4305, opened Jan 28, 2026 by 43758726)
- Compatible with transformers 5.0 at TurboMind side [improvement] (#4304, opened Jan 28, 2026 by lvhan028)
- Change Ascend paged attention from BSH format to TND format for better performance [Draft] (#4295, opened Jan 27, 2026 by jinminxi104)
- Support ignore layers in quant config for Qwen3 models [improvement] (#4293, opened Jan 26, 2026 by RunningLeon)
- feat: implement online bf16-to-fp8 conversion and inference in TurboMind [improvement] (#4237, opened Dec 25, 2025 by 43758726)
- Support fp32 head for Qwen and InternLM models [improvement] (#4160, opened Nov 27, 2025 by RunningLeon)
- Add step_map to track token decoding order in DLLM (#4057, opened Oct 21, 2025 by Auraithm; 4 tasks done)
- Quant blocked fp8 [enhancement] (#4018, opened Sep 29, 2025 by CUHKSZzxy; 4 of 5 tasks done)