To get started, you first need a MaxText-compatible checkpoint.
2. **Convert the Checkpoint**: Run the `convert_qwen3_next_scanned.py` script to convert the downloaded Hugging Face weights into the Orbax format required by MaxText.
```
# Set scan_layers=false to produce an unscanned checkpoint.
scan_layers=true \
use_multimodal=false
```
* * * * *
Fine-tuning
----------------------------
After converting the checkpoint, you can use it for fine-tuning. The command below is an example for fine-tuning on a v5p-64 slice.
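As a rough sketch of what such a command looks like, assuming MaxText's standard `train.py` entrypoint (the run name, bucket paths, and `model_name` value below are placeholders and assumptions, not verified values):

```shell
# Sketch only: run_name, bucket paths, and model_name are placeholders.
python3 -m MaxText.train MaxText/configs/base.yml \
  run_name=qwen3-next-finetune \
  base_output_directory=gs://YOUR_BUCKET/output \
  load_parameters_path=gs://YOUR_BUCKET/qwen3-next/0/items \
  model_name=qwen3-next-80b-a3b \
  per_device_batch_size=1 \
  scan_layers=true
```

To pre-train from scratch instead, the same invocation without `load_parameters_path` would apply.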
```
prompt="An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and outputs are all vectors. The output is " \
scan_layers=False
```
* * * * *
Correctness Validation
----------------------
To validate correctness, we perform two primary checks:
* **Logit Comparison**: We compare the logits generated by our implementation against those from a Hugging Face implementation for a set of given prompts.
* **MMLU Score Validation**: We validate the MMLU score against established benchmarks.
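The logit-comparison check can be illustrated with a small, self-contained sketch; the tolerance value here is illustrative, not the project's actual threshold:

```python
def max_abs_diff(logits_a, logits_b):
    """Largest element-wise absolute difference between two flat logit lists."""
    assert len(logits_a) == len(logits_b), "logit shapes must match"
    return max(abs(a - b) for a, b in zip(logits_a, logits_b))

def logits_match(logits_a, logits_b, atol=1e-2):
    """True when every logit pair agrees within the absolute tolerance atol."""
    return max_abs_diff(logits_a, logits_b) <= atol

# Tiny illustration with fake logits for a single-token, 4-entry vocabulary.
ref = [1.0, 2.0, 3.0, 4.0]
drift = [x + 1e-3 for x in ref]   # small numerical drift across frameworks
print(logits_match(ref, drift))                    # True: below tolerance
print(logits_match(ref, [x + 1.0 for x in ref]))   # False: large mismatch
```

In practice the two tensors would come from a MaxText forward pass and a Hugging Face forward pass over the same prompt.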
One example command to generate golden logits from HuggingFace for Qwen3-Next:
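As a purely hypothetical sketch of such a command (the script name, flags, and model ID variant below are assumptions, not a verified MaxText interface):

```shell
# Hypothetical script name and flags -- consult the MaxText repo for the real entrypoint.
python3 generate_hf_golden_logits.py \
  --model-id Qwen/Qwen3-Next-80B-A3B-Instruct \
  --prompts "An attention function can be described as " \
  --output-path golden_logits_qwen3_next.jsonl
```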
To run MMLU benchmarks and validate the model's performance, follow the instructions provided [here](../../../benchmarks/api_server/README.md).
## Supported MoE Strategies
This model implementation supports both **Token Dropping** and **Dropless** strategies for Mixture of Experts routing. See the MaxText [documentation](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/reference/core_concepts/moe_configuration.md) on MoE configuration for the flags to set for each strategy.
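The practical difference between the two strategies can be sketched numerically. The capacity formula below is the standard one from the MoE literature, not MaxText's exact code:

```python
import math

def expert_capacity(tokens: int, num_experts: int, capacity_factor: float) -> int:
    """Per-expert token budget under a token-dropping strategy.

    Tokens routed to an expert beyond this budget are dropped, which keeps
    every expert's workload (and thus the kernel shapes) static.
    """
    return math.ceil(tokens / num_experts * capacity_factor)

tokens, num_experts = 4096, 64
print(expert_capacity(tokens, num_experts, 1.0))   # 64: an exact even split
print(expert_capacity(tokens, num_experts, 1.25))  # 80: 25% slack before dropping

# A "dropless" strategy instead processes every routed token, trading the
# static shapes above for extra dispatch bookkeeping.
```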