Add Qwen3-Next to checkpoint util & update test scripts#2973
Open
Rohan-Bierneni wants to merge 1 commit into main
Codecov Report: ✅ All modified and coverable lines are covered by tests.
8454259 to
866ee4e
Compare
parambole
reviewed
Jan 26, 2026
tests/end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/2_test_qwen3_next_80b_a3b.sh
Outdated
parambole
reviewed
Jan 26, 2026
tests/end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/2_test_qwen3_next_80b_a3b.sh
parambole
reviewed
Jan 26, 2026
tests/end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/2_test_qwen3_next_80b_a3b.sh
Outdated
parambole
reviewed
Jan 26, 2026
parambole
reviewed
Jan 26, 2026
Collaborator
parambole
left a comment
I have left a couple of comments. PTAL.
shuningjin
reviewed
Jan 27, 2026
Collaborator
Thanks for adding the model to the conversion tool, along with careful logit checks! Left a minor comment.
- For future reference, could you also add the conversion commands to the PR description? Would be nice to also add the conversion time in description, if you have it. Thank you!
For test script 2_test_qwen3_next_80b_a3b.sh:
- Maybe also add pre-training and finetuning (example). Training was omitted from DS3 as covered by ubench.
- Could you test this script and attach log to description?
- Thanks for updating the description. Maybe update PR title as well to accurately reflect the change: e.g., add "update test scripts".
- Will this be added to XLML in the other repo?
end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/1_test_qwen3_next_80b_a3b.sh
Outdated
end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/1_test_qwen3_next_80b_a3b.sh
Outdated
end_to_end/tpu/qwen/next/qwen3-next-80b-a3b/1_test_qwen3_next_80b_a3b.sh
4fd58f5 to
a6f97ae
Compare
8be00bd to
fa0242d
Compare
shuningjin
reviewed
Feb 20, 2026
be3d986 to
b668efa
Compare
shuningjin
approved these changes
Feb 20, 2026
Collaborator
shuningjin
left a comment
Thanks for the great work! LGTM
parambole
approved these changes
Feb 20, 2026
b668efa to
aea63ab
Compare
Lower the decoding length to 128 instead of 512
Update test script
Add manual calcs for each HF shape instead of hardcoded values
Run pylint
Fix config values and remove unused vars
Fix /configs path in script
Fix decode path in script
Update model ReadMe
Update scripts to use new train path
Reset qwen3 test files to match main
Undo the temp fix to get training working
Update reshape function to what other models use
Remove whitespaces
3cf0b4e to
f75c7e2
Compare
Collaborator
Author
Manually adding pull_ready tag since linter issues are from untouched files.
Description
[Ckpt Conversion] Support Qwen3-Next in Unified Checkpoint Conversion Utility
This PR migrates the Qwen3-Next model (`qwen3-next-80b-a3b`) from standalone conversion scripts to the centralized `MaxText.utils.ckpt_conversion` library.

Previously, Qwen3-Next relied on ad-hoc scripts for checkpointing. Moving this to the unified utility enables:
Changes
- `src/MaxText/utils/ckpt_conversion/utils/hf_model_configs.py`: Added `qwen3_next_80b_a3b_config` using `transformers.Qwen3NextConfig` and registered it in `HF_MODEL_CONFIGS`.
- `src/MaxText/utils/ckpt_conversion/utils/param_mapping.py`:
  - `QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_MAPPING`: Handles the inhomogeneous layer cycle (mapping Full Attention vs. Linear/Hybrid Attention blocks based on layer index) and MoE components (Shared vs. Routed experts).
  - `QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_HOOK_FN` with robust tensor handling:
    - `identity` hooks for 1D parameters like `A_log` (shape `[1]`). This ensures that the scan axis is correctly handled during conversion (e.g., transforming to `[1, 12]` where appropriate) rather than incorrectly collapsing to `[1,]`.
    - `permute_conv`: correctly handles `conv1d` kernels (HF: `[C, 1, K]` <-> MT: `[K, 1, C]`). This prevents dimensions with value `1` from being incorrectly squeezed or flattened during the permutation process.
- `src/MaxText/utils/ckpt_conversion/utils/hf_shape.py`: Added `QWEN3_NEXT_HF_WEIGHTS_TO_SHAPE` to calculate expected HF tensor shapes for validation.
- `end_to_end/tpu/qwen/next/...`:
  - `1_test_qwen3_next_80b_a3b.sh`: uses `python3 -m MaxText.utils.ckpt_conversion.to_maxtext` instead of the legacy script.
  - `2_test_qwen3_next_80b_a3b.sh`: for XLML tests to consume for forward_pass & decode verification.

If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/469445683
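
For readers unfamiliar with the two tensor-handling cases listed under Changes, here is a minimal NumPy sketch of the idea. It is not the code in `param_mapping.py`: the function names `permute_conv_sketch` and `stack_scanned_1d` are hypothetical, the sizes `C=8`, `K=4` are arbitrary, and treating the scan axis as axis 1 with 12 stacked layers is an assumption taken from the `[1, 12]` example above.

```python
import numpy as np


def permute_conv_sketch(kernel: np.ndarray) -> np.ndarray:
    """Swap the first and last axes of a 3D conv1d kernel.

    The size-1 middle axis is kept in place, so [C, 1, K] becomes [K, 1, C]
    and vice versa. No squeeze or reshape is involved.
    """
    if kernel.ndim != 3:
        raise ValueError(f"expected a 3D kernel, got shape {kernel.shape}")
    return np.transpose(kernel, (2, 1, 0))


def stack_scanned_1d(per_layer: list[np.ndarray], scan_axis: int = 1) -> np.ndarray:
    """Stack per-layer 1D parameters along an explicit scan axis (assumed axis 1)."""
    return np.stack(per_layer, axis=scan_axis)


# HF-style layout [C, 1, K] with illustrative sizes C=8, K=4.
hf_kernel = np.arange(8 * 1 * 4, dtype=np.float32).reshape(8, 1, 4)
mt_kernel = permute_conv_sketch(hf_kernel)
assert mt_kernel.shape == (4, 1, 8)

# The permutation (2, 1, 0) is its own inverse, so the round trip is lossless.
assert np.array_equal(permute_conv_sketch(mt_kernel), hf_kernel)

# Twelve per-layer parameters of shape [1] keep their own axis after stacking.
stacked = stack_scanned_1d([np.full((1,), float(i)) for i in range(12)])
assert stacked.shape == (1, 12)
```

Using an explicit axis permutation rather than a reshape is what keeps the singleton dimension intact: a reshape would reinterpret the memory layout, whereas a transpose only reorders axes.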
Tests
The commands used to generate the checkpoints themselves: https://paste.googleplex.com/4921565475110912
The forward-pass logit checker will be run on the scanned checkpoint converted MaxText -> HF -> MaxText, with results posted here:
Current status:
to_maxtext tests:
hf -> maxtext (scanned): https://paste.googleplex.com/5151438898593792
hf -> maxtext (unscanned): https://paste.googleplex.com/4721564912320512
to_huggingface tests:
Convert scanned & unscanned maxtext checkpoints from previous tests to hf format. Run forward_pass check against new hf checkpoints and existing maxtext checkpoints.
Maxtext (scanned) -> HF: https://paste.googleplex.com/4787924765900800
Maxtext (unscanned) -> HF: https://paste.googleplex.com/5256341314732032
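
As a rough illustration of what a forward-pass logit comparison between an HF checkpoint and a MaxText checkpoint involves, the sketch below compares two logit arrays by maximum absolute difference and per-position KL divergence. It is not the MaxText logit checker: `compare_logits`, its thresholds `atol` and `max_kl`, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def compare_logits(ref: np.ndarray, test: np.ndarray,
                   atol: float = 1e-2, max_kl: float = 1e-3) -> dict:
    """Compare reference and test logits of shape [positions, vocab].

    The thresholds are placeholder values, not the ones used in the PR.
    """
    max_abs_diff = float(np.max(np.abs(ref - test)))
    ref_lp, test_lp = log_softmax(ref), log_softmax(test)
    # KL(ref || test) for each position, summed over the vocabulary axis.
    kl = np.sum(np.exp(ref_lp) * (ref_lp - test_lp), axis=-1)
    worst_kl = float(kl.max())
    return {
        "max_abs_diff": max_abs_diff,
        "max_kl": worst_kl,
        "passed": max_abs_diff <= atol and worst_kl <= max_kl,
    }


rng = np.random.default_rng(0)
ref_logits = rng.normal(size=(4, 16))

# Identical logits pass trivially.
same = compare_logits(ref_logits, ref_logits.copy())
assert same["passed"]

# A large perturbation on a single logit is caught by the absolute-difference check.
perturbed = ref_logits.copy()
perturbed[0, 0] += 1.0
different = compare_logits(ref_logits, perturbed)
assert not different["passed"]
```

Checking both metrics is a design choice in this sketch: the absolute difference catches isolated outliers, while the KL term reflects how much the predicted token distribution actually shifts.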