Add allenai/OLMoE-1B-7B-0924.
#718
Conversation
It looks like Compatibility Checks (3.9) failed because of incompatible numpy versions.
There are a lot of issues in this PR due to dependency bumping. None of that has anything to do with what has been done here; there are general issues with dependency versions at the moment. I started working on it in this PR. In order to add these models officially, we probably need to get that resolved first. I will bump its priority so you can finish what you are doing.
@joelburget I am working on https://github.com/jonasrohw/TransformerLens/tree/OLMo; I think your MoE is very similar. I found the issue you were facing: the tokenizer is called again after
Hey @jonasrohw, thanks for looping me in. Your code looks much more complete than mine, so I want to make sure I understand the bit that you're suggesting we merge in (and how). The two things this implementation has that yours doesn't:
Are you suggesting I merge my
@joelburget Exactly. You can also conditionally add the MoE weights import into the Olmo file, and include your model names, etc., in the preloading with the exact model configurations for MoE.
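A minimal sketch of the pattern being suggested here: a single OLMo conversion entry point that branches to an MoE-specific path when the config indicates experts. All function and field names below are illustrative assumptions, not TransformerLens's actual internals.

```python
from types import SimpleNamespace

# Illustrative sketch only; function and config-field names are assumptions,
# not TransformerLens's real API.

def convert_olmo_dense_weights(hf_model, cfg):
    # Dense OLMo path: map embedding/attention/MLP weights as usual (stubbed here).
    return {"kind": "dense"}


def convert_olmoe_weights(hf_model, cfg):
    # MoE path: additionally map the router and the per-expert MLP weights.
    return {"kind": "moe", "num_experts": cfg.num_experts}


def convert_olmo_weights(hf_model, cfg):
    """Single entry point that conditionally handles the MoE (OLMoE) case."""
    if getattr(cfg, "num_experts", 0) > 1:
        return convert_olmoe_weights(hf_model, cfg)
    return convert_olmo_dense_weights(hf_model, cfg)


if __name__ == "__main__":
    print(convert_olmo_weights(None, SimpleNamespace(num_experts=64)))  # OLMoE-1B-7B: 64 experts
    print(convert_olmo_weights(None, SimpleNamespace(num_experts=0)))   # dense OLMo
```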
Thanks @jonasrohw. I opened jonasrohw#1. I still need to finish the one TODO and do testing, but I hope to finish this weekend.
Closing this in favor of #816:
* added and tested: OLMo-1B,OLMo-7B
* fixed: numpy do not do a major upgrade!
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* added and tested: OLMo-1B,OLMo-7B
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* Adjust error message to improve testing
* conflict resolution
* Updating lock
* Fixed formatting, update error messages to properly test
* more formatting
* fixing type error
* fix format error
* Fix type issues
* Fix type issues
* Fix format issues
* Fix format issues again
* Fix format issues for black
* another attempt at black formatting
* Fix format issues for black again
* Retyping the blocks in HookedTransformer and HookedEncoder
* undo modulelist typing
* Improve type checking in test_detect_head_with_invalid_head_name
* removing unused import
* Fixing Patchscopes_Generation_Demo.ipynb
* Fixing the rest of the notebooks
* Fixing the more notebooks
* run_line_magic
* BERT ipynb fix
* Trying to fix the BERT set_grad cell
* more set_grad cell fixes
* Updated after rebase to fix missing 3.x changes
* Updating OLMo PR to work with v3.x
* Format fix
* fix model ordering

---------

Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Joel Burget <[email protected]>
Co-authored-by: Jonas Rohw <[email protected]>
Co-authored-by: Bryce Meyer <[email protected]>
Co-authored-by: Jay Zhou <[email protected]>
Co-authored-by: jleechung <[email protected]>
Co-authored-by: Jonah Larson <[email protected]>
Add `allenai/OLMoE-1B-7B-0924`

This is a new MoE model which I'd like to use with TL. Notes:
* `transformers` hasn't released a version with OLMoE support yet. We can update `pyproject.toml` to point to it instead of GitHub once it's released. Will leave as a draft until then.
* `router_aux_loss_coef` / `router_z_loss_coef`: I don't plan on training OLMoE in TL, so there's no need for these coefficients.
* `norm_topk_prob` defaults to `False` in `transformers` and I don't plan to use it (see the sketch below).
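For context on the `norm_topk_prob` note: in Hugging Face's MoE routers this flag controls whether the selected top-k routing probabilities are renormalized to sum to 1 before mixing the expert outputs. A minimal standalone sketch of that behaviour (an illustration, not the actual OLMoE module):

```python
import torch
import torch.nn.functional as F


def route_tokens(router_logits: torch.Tensor, top_k: int, norm_topk_prob: bool):
    # Softmax over all experts, then keep the top-k per token.
    routing_weights = F.softmax(router_logits, dim=-1, dtype=torch.float)
    routing_weights, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    if norm_topk_prob:
        # Renormalize the kept probabilities so they sum to 1 per token.
        routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    return routing_weights, selected_experts


logits = torch.randn(4, 64)  # 4 tokens, 64 experts (OLMoE-1B-7B routes top-8 of 64)
w, _ = route_tokens(logits, top_k=8, norm_topk_prob=False)
print(w.sum(dim=-1))  # sums < 1: only part of the softmax mass is kept
w, _ = route_tokens(logits, top_k=8, norm_topk_prob=True)
print(w.sum(dim=-1))  # sums == 1 after renormalization
```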
Commenting out `add_bos_token=True`

This is a temporary fix. When running without either location commented out:
Commenting out the location mentioned in the stack trace (`HookedTransformer.py:146`):

I'd appreciate advice on what's going wrong here. I'm a bit confused because I didn't change anything related to bos tokens (and e.g. the call to `AutoTokenizer.from_pretrained` in `HookedTransformer` always specifies `add_bos_token=True` but never `bos_token`).
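For debugging, a standalone check along these lines could help (an illustrative sketch, not code from the PR): load the tokenizer the same way and inspect its BOS handling. Whether `add_bos_token` is honoured depends on the concrete tokenizer class, so treat it as a diagnostic rather than a statement about what OLMoE's tokenizer does.

```python
from transformers import AutoTokenizer

# Diagnostic sketch: load the OLMoE tokenizer the way HookedTransformer does
# (add_bos_token=True, no explicit bos_token) and inspect the result.
tok = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924", add_bos_token=True)

print("bos_token:", tok.bos_token)        # may be None if the tokenizer defines no BOS
print("bos_token_id:", tok.bos_token_id)

ids = tok("Hello world").input_ids
print("first token id:", ids[0])          # compare with bos_token_id to see whether
                                          # a BOS token was actually prepended
```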
Type of change

Checklist: