Add allenai/OLMoE-1B-7B-0924.
#718
Conversation
It looks like Compatibility Checks (3.9) failed because of incompatible numpy versions.
There are a lot of issues in this PR due to dependency bumping. None of that has anything to do with what has been done here; there are general issues with dependency versions at the moment. I started working on it in this PR. In order to add these models officially, we probably need to get that resolved first. I will bump its priority so you can finish what you are doing.
@joelburget I am working on https://github.com/jonasrohw/TransformerLens/tree/OLMo; I think your MoE is very similar. I found the issue you were facing: the tokenizer is called again after
Hey @jonasrohw, thanks for looping me in. Your code looks much more complete than mine, so I want to make sure I understand the bit that you're suggesting we merge in (and how). The two things this implementation has that yours doesn't:
Are you suggesting I merge my
@joelburget Exactly. You can also conditionally add the MoE weights import into the Olmo file, and include your model names, etc., in the preloading with the exact model configurations for MoE.
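A minimal sketch of the pattern being suggested here: a single OLMo conversion entry point that branches to an MoE-specific path when the config indicates experts. All function and field names below are illustrative assumptions, not TransformerLens's actual internals.

```python
from types import SimpleNamespace

# Illustrative sketch only; function and config-field names are assumptions,
# not TransformerLens's real API.

def convert_olmo_dense_weights(hf_model, cfg):
    # Dense OLMo path: map embedding/attention/MLP weights as usual (stubbed here).
    return {"kind": "dense"}


def convert_olmoe_weights(hf_model, cfg):
    # MoE path: additionally map the router and the per-expert MLP weights.
    return {"kind": "moe", "num_experts": cfg.num_experts}


def convert_olmo_weights(hf_model, cfg):
    """Single entry point that conditionally handles the MoE (OLMoE) case."""
    if getattr(cfg, "num_experts", 0) > 1:
        return convert_olmoe_weights(hf_model, cfg)
    return convert_olmo_dense_weights(hf_model, cfg)


if __name__ == "__main__":
    print(convert_olmo_weights(None, SimpleNamespace(num_experts=64)))  # OLMoE-1B-7B: 64 experts
    print(convert_olmo_weights(None, SimpleNamespace(num_experts=0)))   # dense OLMo
```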
Thanks @jonasrohw. I opened jonasrohw#1. I still need to finish the one TODO and do testing, but I hope to finish this weekend.
Closing this in favor of #816:
* added and tested: OLMo-1B,OLMo-7B
* fixed: numpy do not do a major upgrade!
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* added and tested: OLMo-1B,OLMo-7B
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* Adjust error message to improve testing
* conflict resolution
* Updating lock
* Fixed formatting, update error messages to properly test
* more formatting
* fixing type error
* fix format error
* Fix type issues
* Fix type issues
* Fix format issues
* Fix format issues again
* Fix format issues for black
* another attempt at black formatting
* Fix format issues for black again
* Retyping the blocks in HookedTransformer and HookedEncoder
* undo modulelist typing
* Improve type checking in test_detect_head_with_invalid_head_name
* removing unused import
* Fixing Patchscopes_Generation_Demo.ipynb
* Fixing the rest of the notebooks
* Fixing the more notebooks
* run_line_magic
* BERT ipynb fix
* Trying to fix the BERT set_grad cell
* more set_grad cell fixes
* Updated after rebase to fix missing 3.x changes
* Updating OLMo PR to work with v3.x
* Format fix
* fix model ordering

---------

Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Joel Burget <[email protected]>
Co-authored-by: Jonas Rohw <[email protected]>
Co-authored-by: Bryce Meyer <[email protected]>
Co-authored-by: Jay Zhou <[email protected]>
Co-authored-by: jleechung <[email protected]>
Co-authored-by: Jonah Larson <[email protected]>
Add `allenai/OLMoE-1B-7B-0924`

This is a new MoE model which I'd like to use with TL. Notes:
* `transformers` hasn't released a version with OLMoE support yet. We can update `pyproject.toml` to point to it instead of GitHub once it's released. Will leave as a draft until then.
* `router_aux_loss_coef` / `router_z_loss_coef`: I don't plan on training OLMoE in TL, so there's no need for these coefficients.
* `norm_topk_prob` defaults to `False` in `transformers` and I don't plan to use it (see the sketch below).
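For context on the `norm_topk_prob` note: in Hugging Face's MoE routers this flag controls whether the selected top-k routing probabilities are renormalized to sum to 1 before mixing the expert outputs. A minimal standalone sketch of that behaviour (an illustration, not the actual OLMoE module):

```python
import torch
import torch.nn.functional as F


def route_tokens(router_logits: torch.Tensor, top_k: int, norm_topk_prob: bool):
    # Softmax over all experts, then keep the top-k per token.
    routing_weights = F.softmax(router_logits, dim=-1, dtype=torch.float)
    routing_weights, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    if norm_topk_prob:
        # Renormalize the kept probabilities so they sum to 1 per token.
        routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    return routing_weights, selected_experts


logits = torch.randn(4, 64)  # 4 tokens, 64 experts (OLMoE-1B-7B routes top-8 of 64)
w, _ = route_tokens(logits, top_k=8, norm_topk_prob=False)
print(w.sum(dim=-1))  # sums < 1: only part of the softmax mass is kept
w, _ = route_tokens(logits, top_k=8, norm_topk_prob=True)
print(w.sum(dim=-1))  # sums == 1 after renormalization
```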
Commenting out `add_bos_token=True`

This is a temporary fix. When running without either location commented out:
Commenting out the location mentioned in the stack trace (`HookedTransformer.py:146`):

I'd appreciate advice on what's going wrong here. I'm a bit confused because I didn't change anything related to bos tokens (and e.g. the call to `AutoTokenizer.from_pretrained` in `HookedTransformer` always specifies `add_bos_token=True` but never `bos_token`).
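For debugging, a standalone check along these lines could help (an illustrative sketch, not code from the PR): load the tokenizer the same way and inspect its BOS handling. Whether `add_bos_token` is honoured depends on the concrete tokenizer class, so treat it as a diagnostic rather than a statement about what OLMoE's tokenizer does.

```python
from transformers import AutoTokenizer

# Diagnostic sketch: load the OLMoE tokenizer the way HookedTransformer does
# (add_bos_token=True, no explicit bos_token) and inspect the result.
tok = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924", add_bos_token=True)

print("bos_token:", tok.bos_token)        # may be None if the tokenizer defines no BOS
print("bos_token_id:", tok.bos_token_id)

ids = tok("Hello world").input_ids
print("first token id:", ids[0])          # compare with bos_token_id to see whether
                                          # a BOS token was actually prepended
```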
Type of change

Checklist: