Conversation
e3f6dab to
699e4fb
Compare
|
🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
There was a problem hiding this comment.
📋 Review Summary
This pull request introduces an excellent and comprehensive guide for community contributors on bringing up new models in Maxtext. The document is well-structured, clear, and covers the entire workflow from architecture analysis to verification.
🔍 General Feedback
- The guide is very detailed and will be a great resource for the community.
- I've pointed out a few minor typos and a suggestion to update a link to point to the
mainbranch instead of a specific commit. - Overall, this is a great addition to the documentation.
699e4fb to
3b21727
Compare
3b21727 to
89092f9
Compare
| ## 1. Architecture Analysis | ||
|
|
||
| The first phase involves determining how the new model's architecture aligns with MaxText's existing capabilities. | ||
|
|
There was a problem hiding this comment.
Similar to how you mentioned for the data pipeline, maybe provide a link to the sources of those files, or a README that mentions or has the latest state of available features?
|
|
||
| This step can be bypassed if the current MaxText codebase already supports all components required for your model architecture. However, if your model introduces unique logic or requires specific code refactoring, these modifications should be completed before you begin converting checkpoints. | ||
|
|
||
| **Sharding**: MaxText supports both auto and explicit sharding modes and provides dedicated sharding functions. We recommend developers use MaxText-specific sharding functions, such as `MaxText.sharding.maybe_shard_with_name`, instead of default JAX sharding hint like `jax.lax.with_sharding_constraint` for better performance. |
There was a problem hiding this comment.
This can be a different section?
There was a problem hiding this comment.
I slightly prefer to keep this as a note for now, or simply remove it to keep this bringup focused on functionality. We have this as a side note in the internal guide. HDYT?
|
|
||
| - [ ] Create a user guide and post an announcement in the MaxText repo. | ||
|
|
||
| ## Community Q&A (FAQ) |
There was a problem hiding this comment.
Should we also mention how to compile the model (which can be done on a CPU) to avoid errors, such as those related to sharding, later on?
There was a problem hiding this comment.
Good point. Let me add it in both internal and external user guide for alignment.
Description
Create a community version for model bringup user guide, based on internal doc. b/471279823.
Please note, there is a TODO
[scripts](TODO)depends on #3139Tests
Manually check
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-reviewlabel.