Add Qwen3-Next to checkpoint util & update test scripts#2973

Open
Rohan-Bierneni wants to merge 1 commit into main from rbierneni-qwen3next-chkpt-util

Conversation

@Rohan-Bierneni (Collaborator) commented Jan 20, 2026

Description

[Ckpt Conversion] Support Qwen3-Next in Unified Checkpoint Conversion Utility

This PR migrates Qwen3-Next (qwen3-next-80b-a3b) from standalone conversion scripts to the centralized `MaxText.utils.ckpt_conversion` library.

Previously, Qwen3-Next relied on ad-hoc scripts for checkpointing. Moving this to the unified utility enables:

  1. Bidirectional Conversion: Robust support for converting both HF -> MaxText and MaxText -> HF.
  2. Scanned & Unscanned Support: Native handling of scanned layers (optimized for training) and unscanned layers (optimized for decoding/inference).
  3. Maintainability: Centralizes logic for the hybrid attention architecture (interleaved Linear and Full attention layers) within the standard mapping infrastructure.

Changes

  • src/MaxText/utils/ckpt_conversion/utils/hf_model_configs.py: Added qwen3_next_80b_a3b_config using transformers.Qwen3NextConfig and registered it in HF_MODEL_CONFIGS.
  • src/MaxText/utils/ckpt_conversion/utils/param_mapping.py:
    • Implemented QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_MAPPING: Handles the inhomogeneous layer cycle (mapping Full Attention vs. Linear/Hybrid Attention blocks based on layer index) and MoE components (Shared vs. Routed experts).
    • Implemented QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_HOOK_FN with robust tensor handling:
      • Correct Transposition for Scanned 1D Tensors: Added specific handling (using identity hooks) for 1D parameters like A_log (shape [1]). This ensures the scan axis is handled correctly during conversion (e.g., expanding to [1, 12] where appropriate) rather than incorrectly collapsing to [1].
      • Preservation of Singleton Dimensions: Implemented permute_conv to correctly handle conv1d kernels (HF: [C, 1, K] <-> MT: [K, 1, C]). This prevents dimensions with value 1 from being incorrectly squeezed or flattened during the permutation process.
  • src/MaxText/utils/ckpt_conversion/utils/hf_shape.py: Added QWEN3_NEXT_HF_WEIGHTS_TO_SHAPE to calculate expected HF tensor shapes for validation.
  • end_to_end/tpu/qwen/next/...:
    • Updated 1_test_qwen3_next_80b_a3b.sh to use python3 -m MaxText.utils.ckpt_conversion.to_maxtext instead of the legacy script.
    • Added 2_test_qwen3_next_80b_a3b.sh for XLML tests to consume for forward_pass & decode verification.
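
The two tensor-handling rules above can be sketched in NumPy. This is a hypothetical illustration of the axis mapping, not the actual MaxText implementation: the `permute_conv` name comes from this PR, but its body, the helper `stack_scanned_scalar`, and the example shapes (4096 channels, kernel size 4, 12 scanned layers) are assumptions for illustration only.

```python
import numpy as np

def permute_conv(kernel: np.ndarray) -> np.ndarray:
    # HF stores conv1d kernels as [C, 1, K]; MaxText expects [K, 1, C].
    # Reversing the axes swaps the endpoints while keeping the singleton
    # middle dimension, so nothing is squeezed or flattened along the way.
    return np.transpose(kernel, (2, 1, 0))

def stack_scanned_scalar(per_layer: list) -> np.ndarray:
    # A per-layer 1D parameter like A_log (shape [1]) gains a scan axis
    # when layers are stacked: [1] x N -> [1, N], rather than collapsing
    # back to [1].
    return np.stack(per_layer, axis=-1)

hf_kernel = np.zeros((4096, 1, 4))            # assumed HF layout [C, 1, K]
assert permute_conv(hf_kernel).shape == (4, 1, 4096)   # MT layout [K, 1, C]

a_log_layers = [np.ones((1,)) for _ in range(12)]      # 12 scanned layers
assert stack_scanned_scalar(a_log_layers).shape == (1, 12)
```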

FIXES: b/469445683

Tests

The commands used to generate the checkpoints themselves: https://paste.googleplex.com/4921565475110912

Will run the forward-pass logit checker on checkpoints converted MaxText -> HF -> MaxText (scanned) and post results here:

Current status:

to_maxtext tests:

hf -> maxtext (scanned): https://paste.googleplex.com/5151438898593792
hf -> maxtext (unscanned): https://paste.googleplex.com/4721564912320512

to_huggingface tests:

Convert the scanned & unscanned MaxText checkpoints from the previous tests to HF format. Run the forward_pass check against the new HF checkpoints and the existing MaxText checkpoints.

Maxtext (scanned) -> HF: https://paste.googleplex.com/4787924765900800
Maxtext (unscanned) -> HF: https://paste.googleplex.com/5256341314732032

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov bot commented Jan 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@parambole (Collaborator) left a comment


I have left a couple of comments. PTAL.

@shuningjin (Collaborator) left a comment


Thanks for adding the model to the conversion tool, along with careful logit checks! Left a minor comment.

  • For future reference, could you also add the conversion commands to the PR description? Would be nice to also add the conversion time in description, if you have it. Thank you!

For test script 2_test_qwen3_next_80b_a3b.sh:

  • Maybe also add pre-training and finetuning (example). Training was omitted from DS3 as covered by ubench.
  • Could you test this script and attach log to description?
  • Thanks for updating the description. Maybe update PR title as well to accurately reflect the change: e.g., add "update test scripts".
  • Will this be added to XLML in the other repo?

@Rohan-Bierneni changed the title from "Add Qwen3-Next to checkpoint util" to "Add Qwen3-Next to checkpoint util & update test scripts" on Jan 29, 2026
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch 2 times, most recently from 4fd58f5 to a6f97ae on January 30, 2026 18:58
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch 2 times, most recently from 8be00bd to fa0242d on February 20, 2026 20:21
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch from be3d986 to b668efa on February 20, 2026 21:19
@shuningjin (Collaborator) left a comment


Thanks for the great work! LGTM

@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch from b668efa to aea63ab on February 20, 2026 23:24
Lower the decoding length to 128 instead of 512

update test script

Add manual calcs for each hf shape instead of hardcoded values

Run pylint

Fix config values and remove unused vars

Fix /configs path in script

Fix decode path in script

Update model ReadMe

Update scripts to use new train path

Reset qwen3 test files to match main

Undo the temp fix to get training working

Update reshape function to what other models use

remove whitespaces
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch from 3cf0b4e to f75c7e2 on February 20, 2026 23:31
@Rohan-Bierneni (Collaborator, Author) commented:

Manually adding pull_ready tag since linter issues are from untouched files.
