Added OLMo(E) v1 #816
Conversation
Originally from TransformerLensOrg#718.
* Add OLMoE
* Fix to OLMo 2 normalization (see the sketch below)
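To make the normalization change concrete for reviewers: OLMo 2 applies RMSNorm to the output of each attention/MLP sublayer before the residual addition (rather than pre-norm on the block input), and adds RMSNorm to the queries and keys. The sketch below is a minimal, hypothetical illustration with made-up module names; it is not the code in this PR.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Standard RMSNorm: rescale by the reciprocal RMS, then apply a learned gain."""

    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt() * self.weight


class QKNormAttention(nn.Module):
    """Single-head attention with RMSNorm on queries and keys (causal mask omitted for brevity)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        self.q_norm = RMSNorm(d_model)  # "q norm"
        self.k_norm = RMSNorm(d_model)  # "k norm"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return self.o_proj(scores.softmax(dim=-1) @ v)


class PostNormBlock(nn.Module):
    """Illustrative OLMo-2-style block: the norm sits *after* attention/MLP, not before."""

    def __init__(self, d_model: int):
        super().__init__()
        self.attn = QKNormAttention(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model)
        )
        self.post_attn_norm = RMSNorm(d_model)
        self.post_mlp_norm = RMSNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.post_attn_norm(self.attn(x))  # norm the sublayer output, then add
        x = x + self.post_mlp_norm(self.mlp(x))
        return x


block = PostNormBlock(d_model=64)
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```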
Hey @jonasrohw, looks like you've got this feature pretty much ready to go - just seeing type check failures blocking it. I'd be interested in taking a stab at fixing those type issues if you're not actively working on it.
@taziksh Yeah, I didn't have time to fix some of the type issues. Go ahead!
@jonasrohw
* added and tested: OLMo-1B,OLMo-7B
* fixed: numpy do not do a major upgrade!
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* added and tested: OLMo-1B,OLMo-7B
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* Adjust error message to improve testing
* conflict resolution
* Updating lock
* Fixed formatting, update error messages to properly test
* more formatting
* fixing type error
* fix format error
* Fix type issues
* Fix type issues
* Fix format issues
* Fix format issues again
* Fix format issues for black
* another attempt at black formatting
* Fix format issues for black again
* Retyping the blocks in HookedTransformer and HookedEncoder
* undo modulelist typing
* Improve type checking in test_detect_head_with_invalid_head_name
* removing unused import
* Fixing Patchscopes_Generation_Demo.ipynb
* Fixing the rest of the notebooks
* Fixing the more notebooks
* run_line_magic
* BERT ipynb fix
* Trying to fix the BERT set_grad cell
* more set_grad cell fixes
* Updated after rebase to fix missing 3.x changes
* Updating OLMo PR to work with v3.x
* Format fix
* fix model ordering

---------

Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Joel Burget <[email protected]>
Co-authored-by: Jonas Rohw <[email protected]>
Co-authored-by: Bryce Meyer <[email protected]>
Co-authored-by: Jay Zhou <[email protected]>
Co-authored-by: jleechung <[email protected]>
Co-authored-by: Jonah Larson <[email protected]>

Description
Adds support for the OLMo v1 model family and OLMoE.
`transformers > 3.40` will let `numpy` do a major upgrade; `pyproject.toml` prevents this for now.

OLMo v2 will require dropping Python 3.8 support because the required `transformers` version also drops it. It will be added in a separate PR based on TransformerLens 3.

This also completes PR #718.
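For anyone wanting to try the new models once this is merged, usage should follow the normal TransformerLens pattern. The model alias below is an assumption on my part rather than something this PR guarantees; check the names actually registered by the PR (e.g. in the official model list) after merge.

```python
from transformer_lens import HookedTransformer

# "allenai/OLMo-1B-hf" is a hypothetical alias; substitute whichever OLMo/OLMoE
# name this PR actually registers.
model = HookedTransformer.from_pretrained("allenai/OLMo-1B-hf")

tokens = model.to_tokens("OLMo is an open language model.")
logits, cache = model.run_with_cache(tokens)
print(logits.shape)
print(cache["blocks.0.hook_resid_post"].shape)
```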
Fixes # (issue)
Type of change
Checklist: