Add Qwen3-Next to checkpoint util & update test scripts#2973

Open
Rohan-Bierneni wants to merge 1 commit into main from rbierneni-qwen3next-chkpt-util

Conversation

@Rohan-Bierneni (Collaborator) commented Jan 20, 2026

Description

[Ckpt Conversion] Support Qwen3-Next in Unified Checkpoint Conversion Utility

This PR migrates Qwen3-Next (qwen3-next-80b-a3b) from standalone conversion scripts to the centralized `MaxText.utils.ckpt_conversion` library.

Previously, Qwen3-Next relied on ad-hoc scripts for checkpointing. Moving this to the unified utility enables:

  1. Bidirectional Conversion: Robust support for converting both HF -> MaxText and MaxText -> HF.
  2. Scanned & Unscanned Support: Native handling of scanned layers (optimized for training) and unscanned layers (optimized for decoding/inference).
  3. Maintainability: Centralizes logic for the hybrid attention architecture (interleaved Linear and Full attention layers) within the standard mapping infrastructure.

Changes

  • src/MaxText/utils/ckpt_conversion/utils/hf_model_configs.py: Added qwen3_next_80b_a3b_config using transformers.Qwen3NextConfig and registered it in HF_MODEL_CONFIGS.
  • src/MaxText/utils/ckpt_conversion/utils/param_mapping.py:
    • Implemented QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_MAPPING: Handles the inhomogeneous layer cycle (mapping Full Attention vs. Linear/Hybrid Attention blocks based on layer index) and MoE components (Shared vs. Routed experts).
    • Implemented QWEN3_NEXT_MAXTEXT_TO_HF_PARAM_HOOK_FN with robust tensor handling:
      • Correct Transposition for Scanned 1D Tensors: Added specific handling (using identity hooks) for 1D parameters like A_log (shape [1]). This ensures the scan axis is handled correctly during conversion (e.g., expanding to [1, 12] where appropriate) rather than incorrectly collapsing to [1].
      • Preservation of Singleton Dimensions: Implemented permute_conv to correctly handle conv1d kernels (HF: [C, 1, K] <-> MT: [K, 1, C]). This prevents dimensions with value 1 from being incorrectly squeezed or flattened during the permutation process.
  • src/MaxText/utils/ckpt_conversion/utils/hf_shape.py: Added QWEN3_NEXT_HF_WEIGHTS_TO_SHAPE to calculate expected HF tensor shapes for validation.
  • end_to_end/tpu/qwen/next/...:
    • Updated 1_test_qwen3_next_80b_a3b.sh to use python3 -m MaxText.utils.ckpt_conversion.to_maxtext instead of the legacy script.
    • Added 2_test_qwen3_next_80b_a3b.sh for XLML tests to consume for forward_pass & decode verification.
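
The two tensor-handling rules above can be sketched in NumPy. This is a hypothetical illustration of the axis mapping, not the actual MaxText implementation: the `permute_conv` name comes from this PR, but its body, the helper `stack_scanned_scalar`, and the example shapes (4096 channels, kernel size 4, 12 scanned layers) are assumptions for illustration only.

```python
import numpy as np

def permute_conv(kernel: np.ndarray) -> np.ndarray:
    # HF stores conv1d kernels as [C, 1, K]; MaxText expects [K, 1, C].
    # Reversing the axes swaps the endpoints while keeping the singleton
    # middle dimension, so nothing is squeezed or flattened along the way.
    return np.transpose(kernel, (2, 1, 0))

def stack_scanned_scalar(per_layer: list) -> np.ndarray:
    # A per-layer 1D parameter like A_log (shape [1]) gains a scan axis
    # when layers are stacked: [1] x N -> [1, N], rather than collapsing
    # back to [1].
    return np.stack(per_layer, axis=-1)

hf_kernel = np.zeros((4096, 1, 4))            # assumed HF layout [C, 1, K]
assert permute_conv(hf_kernel).shape == (4, 1, 4096)   # MT layout [K, 1, C]

a_log_layers = [np.ones((1,)) for _ in range(12)]      # 12 scanned layers
assert stack_scanned_scalar(a_log_layers).shape == (1, 12)
```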

FIXES: b/469445683

Tests

The commands used to generate the checkpoints themselves: https://paste.googleplex.com/4921565475110912

Will run the forward-pass logit checker on checkpoints converted MaxText -> HF -> MaxText (scanned) and post results here:

Current status:

to_maxtext tests:

hf -> maxtext (scanned): https://paste.googleplex.com/5151438898593792
hf -> maxtext (unscanned): https://paste.googleplex.com/4721564912320512

to_huggingface tests:

Convert the scanned & unscanned MaxText checkpoints from the previous tests to HF format. Run the forward_pass check against the new HF checkpoints and the existing MaxText checkpoints.

Maxtext (scanned) -> HF: https://paste.googleplex.com/4787924765900800
Maxtext (unscanned) -> HF: https://paste.googleplex.com/5256341314732032

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov bot commented Jan 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@parambole (Collaborator) left a comment


I have left a couple of comments. PTAL.

@shuningjin (Collaborator) left a comment


Thanks for adding the model to the conversion tool, along with careful logit checks! Left a minor comment.

  • For future reference, could you also add the conversion commands to the PR description? Would be nice to also add the conversion time in description, if you have it. Thank you!

For test script 2_test_qwen3_next_80b_a3b.sh:

  • Maybe also add pre-training and finetuning (example). Training was omitted from DS3 as covered by ubench.
  • Could you test this script and attach log to description?
  • Thanks for updating the description. Maybe update PR title as well to accurately reflect the change: e.g., add "update test scripts".
  • Will this be added to XLML in the other repo?

@Rohan-Bierneni changed the title from "Add Qwen3-Next to checkpoint util" to "Add Qwen3-Next to checkpoint util & update test scripts" on Jan 29, 2026
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch 2 times, most recently from 4fd58f5 to a6f97ae on January 30, 2026 18:58
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch 2 times, most recently from 8be00bd to fa0242d on February 20, 2026 20:21
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch from be3d986 to b668efa on February 20, 2026 21:19
@shuningjin (Collaborator) left a comment


Thanks for the great work! LGTM

@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch from b668efa to aea63ab on February 20, 2026 23:24
Lower the decoding length to 128 instead of 512

update test script

Add manual calcs for each hf shape instead of hardcoded values

Run pylint

Fix config values and remove unused vars

Fix /configs path in script

Fix decode path in script

Update model ReadMe

Update scripts to use new train path

Reset qwen3 test files to match main

Undo the temp fix to get training working

Update reshape function to what other models use

remove whitespaces
@Rohan-Bierneni force-pushed the rbierneni-qwen3next-chkpt-util branch from 3cf0b4e to f75c7e2 on February 20, 2026 23:31
@Rohan-Bierneni (Collaborator, Author) commented:

Manually adding pull_ready tag since linter issues are from untouched files.
