Enable Qwen3.5-MoE PTQ #897
Changes from all commits: 4b6b9cb, 2c57e62, ad271ef, ef7149a, aa18c20, 6e85141, 811ef61, bd029e5, 86d35a5, c7bb291
```python
# @@ -734,6 +734,107 @@ def forward(
        return next_states


class _Qwen35MoeExpertModule(nn.Module):
    """Container for a single Qwen3.5 MoE expert's linear layers.

    Produces the naming pattern: experts.{id}.gate_proj.weight
    (consistent with standard Qwen3 MoE per-expert module structure).
    """

    def __init__(self, hidden_dim: int, expert_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
        self.up_proj = nn.Linear(hidden_dim, expert_dim, bias=False)
        self.down_proj = nn.Linear(expert_dim, hidden_dim, bias=False)


class _QuantQwen35MoeExperts(QuantModule):
    def _setup(self):
        """Modify the Qwen3_5MoeExperts by using per-expert nn.Module containers.

        This produces the naming pattern: experts.{id}.gate_proj.weight
        (consistent with standard Qwen3 MoE).
        """
        from accelerate import init_empty_weights

        dtype, device = self.gate_up_proj.dtype, self.gate_up_proj.device

        def _copy_weight(module, weight):
            module.to_empty(device=device)
            with torch.no_grad():
                module.weight.data = weight.detach().data.to(dtype=dtype, device=device)

        expert_dim = self.intermediate_dim
```
Suggested change:

```diff
-        expert_dim = self.intermediate_dim
+        # Support both `intermediate_size` and `intermediate_dim` depending on the model config.
+        if hasattr(self, "intermediate_size"):
+            expert_dim = self.intermediate_size
+        else:
+            expert_dim = self.intermediate_dim
```
Serialize global patching to avoid cross-export races.
The patch/unpatch sequence mutates module globals process-wide. Concurrent exports can interleave and restore the wrong function, causing flaky behavior.