feat: support qwen3 next with muon by Prayer3th · Pull Request #2875 · AI-Hypercomputer/maxtext

Prayer3th · 2025-12-23T07:12:09Z

Description

Support using the Muon optimizer with Qwen3-Next.

Tests

tests/muon_test.py

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

google-cla · 2025-12-23T07:12:14Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

RissyRan

LGTM! A few minor comments. Thank you for the change!

RissyRan · 2025-12-23T18:31:02Z

  model_name_arg = sys.argv[1]
  scan_layers_arg = sys.argv[2].lower() == "true"
-  get_model_mdn(model_name_arg, scan_layers_arg, verbose=True)
+  get_model_mdn(model_name_arg, scan_layers_arg, verbose=True)


nit: extra line, similar for other files.

RissyRan · 2025-12-23T18:54:31Z

-    An instance of `MuonDimensionNumbers` if a specific mapping is found,
-    `None` for excluded parameters, or a default `mdn` for standard weights.
+  1. Exclusions: Skip vectors/biases/embeddings (AdamW).
+  2. MoE: Handle both DeepSeek style (MoeBlock_0) and Qwen3-Next style (routed_experts).


Nit: quote the checkpoint names, e.g. MoE: Handle both DeepSeek style ("MoeBlock_0") and Qwen3-Next style ("routed_experts").
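The two rules in the new docstring can be illustrated with a minimal sketch. It is not the code from this PR: `DimNums` is a hypothetical stand-in for `MuonDimensionNumbers`, the excluded substrings and the axis choices are assumptions, and only the names "MoeBlock_0" and "routed_experts" come from the diff.

```python
from typing import NamedTuple, Optional, Tuple


class DimNums(NamedTuple):
  """Hypothetical stand-in for MuonDimensionNumbers (not the real class)."""
  reduction_axis: Tuple[int, ...]
  output_axis: Tuple[int, ...]


# Assumed substrings that mark parameters left to AdamW.
EXCLUDED = ("bias", "scale", "embedding")
# Checkpoint names quoted in the docstring under review.
MOE_MARKERS = ("MoeBlock_0", "routed_experts")


def sketch_transform_logic(path: Tuple[str, ...]) -> Optional[DimNums]:
  """Maps a parameter path to dimension numbers, or None if excluded."""
  # 1. Exclusions: vectors, biases and embeddings are not handled by Muon.
  if any(key in part for part in path for key in EXCLUDED):
    return None
  # 2. MoE: both naming styles share one mapping. The (experts, in, out)
  #    layout behind these axes is an assumption for illustration.
  if any(marker in path for marker in MOE_MARKERS):
    return DimNums(reduction_axis=(1,), output_axis=(2,))
  # Default for standard 2D weights, assumed (in, out).
  return DimNums(reduction_axis=(0,), output_axis=(1,))
```

With this sketch, a path containing either "MoeBlock_0" or "routed_experts" resolves to the same mapping, and a path ending in "bias" returns `None`.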

RissyRan · 2025-12-23T19:07:21Z

@@ -48,51 +48,69 @@ def _is_path_contain_any(tuples, path):
 def transform_logic(path: Tuple[str, ...]) -> Optional[mdn]:


@shuningjin Could we have a follow-up refactor PR that passes in the model name? I have seen checkpoint names diverge. It would be better to transform weights based on the model instead of the checkpoint path.
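One possible shape for the refactor suggested here is a lookup keyed by model name, so that each model family owns its own path rule. This is only a sketch of the idea: the model-name prefixes, the function names, and the "moe"/"dense" labels are all hypothetical and do not come from MaxText.

```python
from typing import Callable, Dict, Tuple

Path = Tuple[str, ...]
Rule = Callable[[Path], str]


def _deepseek_rule(path: Path) -> str:
  # DeepSeek-style checkpoints name the MoE block "MoeBlock_0".
  return "moe" if "MoeBlock_0" in path else "dense"


def _qwen3_next_rule(path: Path) -> str:
  # Qwen3-Next-style checkpoints name the MoE block "routed_experts".
  return "moe" if "routed_experts" in path else "dense"


# Hypothetical model-name prefixes mapped to their rule.
RULES_BY_MODEL: Dict[str, Rule] = {
    "deepseek": _deepseek_rule,
    "qwen3-next": _qwen3_next_rule,
}


def classify(model_name: str, path: Path) -> str:
  """Picks the rule from the model name, then applies it to the path."""
  for prefix, rule in RULES_BY_MODEL.items():
    if model_name.startswith(prefix):
      return rule(path)
  raise ValueError(f"no rule registered for model {model_name!r}")
```

Under this design a new model family adds one entry to the table instead of another branch in a shared path matcher, which is the divergence the comment is concerned about.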

RissyRan

Could you share a screenshot or log of training Qwen3 with the Muon optimizer end to end?

codecov · 2026-01-22T05:05:00Z

Codecov Report

❌ Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/MaxText/muon_utils.py	0.00%	22 Missing ⚠️

📢 Thoughts on this report? Let us know!

github-actions · 2026-02-21T16:07:47Z

This PR has been automatically marked as stale because it has not had recent activity. It will be closed soon if no further activity occurs. Thank you for your contributions.

github-actions · 2026-03-10T16:25:06Z

This PR was closed because it has been inactive for a while. Please reopen it if you are still working on it.

support qwen3 next with muon

f14c40d

Prayer3th requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jiangjy1982, khatwanimohit, richjames0, shralex, suexu1025 and vipannalla as code owners December 23, 2025 07:12

RissyRan assigned shuningjin, parambole and Rohan-Bierneni Dec 23, 2025

RissyRan reviewed Dec 23, 2025


github-actions Bot added the stale label (Automatically applied to stale PRs.) Feb 21, 2026

github-actions Bot closed this Mar 10, 2026

Prayer3th wants to merge 1 commit into AI-Hypercomputer:main from primatrix:feat/support-muon


5 participants
