Added OLMo(E) v1 #816

jonasrohw · 2024-12-15T09:26:21Z

Description

Adds support for the OLMo v1 model family and OLMoE. Transformers>3.40 would otherwise let numpy do a major-version upgrade; pyproject.toml now prevents this.

OLMo v2 will require dropping Python 3.8 support, because the Transformers version it needs also drops it. It will be added in a separate PR based on TransformerLens 3.
This also completes PR #718.

Fixes (partially): [Proposal] Compatibility for OLMo and OLMo2? #804

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

Originally from TransformerLensOrg#718.

Add OLMoE

Fix to OLMo 2 normalization
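
For readers unfamiliar with the OLMo 2 normalization fix referenced here (the commits describe it as "Olmo2 uses normalization after the attention/mlp", using RMS norm), the difference can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this PR: `rms_norm`, `pre_norm_step`, `post_norm_step` and `toy_sublayer` are hypothetical names, the norm has no learned gain, and the sublayer is a toy stand-in for attention or the MLP.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """Rescale x so its root-mean-square is approximately 1 (no learned gain)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def pre_norm_step(resid: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Normalize the sublayer input: resid + f(norm(resid))."""
    out = sublayer(rms_norm(resid))
    return [r + o for r, o in zip(resid, out)]


def post_norm_step(resid: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Normalize the sublayer output: resid + norm(f(resid))."""
    out = rms_norm(sublayer(resid))
    return [r + o for r, o in zip(resid, out)]


def toy_sublayer(x: Vector) -> Vector:
    """Hypothetical stand-in for attention or the MLP: scales its input by 10."""
    return [10.0 * v for v in x]


resid = [3.0, 4.0]
pre = pre_norm_step(resid, toy_sublayer)    # update added to resid keeps the sublayer's scale
post = post_norm_step(resid, toy_sublayer)  # update added to resid has RMS of about 1
print("pre-norm :", pre)
print("post-norm:", post)
```

With normalization applied after the sublayer, the update written to the residual stream has a fixed scale regardless of how large the sublayer output is, which is why placing the norm on the wrong side changes the model's outputs.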

taziksh · 2025-09-29T04:50:42Z

Hey @jonasrohw, looks like you've got this feature pretty much ready to go - just seeing type check failures blocking it. I'd be interested in taking a stab at fixing those type issues if you're not actively working on it.

jonasrohw · 2025-10-02T10:19:36Z

@taziksh Yeah, I didn't have time to fix some of the type issues. Go ahead!

taziksh · 2025-10-12T03:00:13Z

@jonasrohw
I've added type checking fixes to complete the OLMo implementation in #1081. Happy to collaborate however works best!

* added and tested: OLMo-1B,OLMo-7B
* fixed: numpy do not do a major upgrade!
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* added and tested: OLMo-1B,OLMo-7B
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* Adjust error message to improve testing
* conflict resolution
* Updating lock
* Fixed formatting, update error messages to properly test
* more formatting
* fixing type error
* fix format error
* Fix type issues
* Fix type issues
* Fix format issues
* Fix format issues again
* Fix format issues for black
* another attempt at black formatting
* Fix format issues for black again
* Retyping the blocks in HookedTransformer and HookedEncoder
* undo modulelist typing
* Improve type checking in test_detect_head_with_invalid_head_name
* removing unused import
* Fixing Patchscopes_Generation_Demo.ipynb
* Fixing the rest of the notebooks
* Fixing the more notebooks
* run_line_magic
* BERT ipynb fix
* Trying to fix the BERT set_grad cell
* more set_grad cell fixes
* Updated after rebase to fix missing 3.x changes
* Updating OLMo PR to work with v3.x
* Format fix
* fix model ordering

---------

Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Joel Burget <[email protected]>
Co-authored-by: Jonas Rohw <[email protected]>
Co-authored-by: Bryce Meyer <[email protected]>
Co-authored-by: Jay Zhou <[email protected]>
Co-authored-by: jleechung <[email protected]>
Co-authored-by: Jonah Larson <[email protected]>
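
One item in the commit list above, "Implement TODO (norm_topk_prob)", concerns mixture-of-experts routing in OLMoE. The PR text only names the flag; the sketch below reflects my understanding of what such a flag usually controls (renormalizing the selected top-k router probabilities so they sum to 1) and is not the implementation from this PR. `softmax` and `route_top_k` are hypothetical helper names, and the logits are made up for illustration.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    router_logits: List[float], top_k: int, norm_topk_prob: bool
) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k experts of one token.

    Assumption: with norm_topk_prob=True the selected weights are rescaled
    to sum to 1; with False they keep their raw softmax values.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    weights = [probs[i] for i in ranked]
    if norm_topk_prob:
        total = sum(weights)
        weights = [w / total for w in weights]
    return list(zip(ranked, weights))


logits = [2.0, 1.0, 0.0, -1.0]  # made-up router logits for four experts
raw = route_top_k(logits, top_k=2, norm_topk_prob=False)
renormed = route_top_k(logits, top_k=2, norm_topk_prob=True)
print("raw      :", raw)
print("renormed :", renormed)
```

Both settings select the same experts and preserve the ratio between their weights; only the overall scale of the combined expert output differs, so a mismatch with the reference model's setting would show up as a systematic scale error in the MoE layer output.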

jonasrohw and others added 12 commits December 12, 2024 09:58

added and tested: OLMo-1B,OLMo-7B

1fe4d04

fixed: numpy do not do a major upgrade!

0f3e3b3

fixed: dimensions of 7b to be correct

3a101f4

tested: Loading checkpoints & model variations

1b34ccd

Reimplement OLMoE changes.

f0a0a68

Originally from TransformerLensOrg#718.

Implement TODO (norm_topk_prob)

8c094e5

Disable bos token for OLMoE.

7565c06

Add q and k norm.

04cd309

Correct normalization type for OLMoE.

68d6961

Merge pull request #1 from joelburget/olmoe

9afd032

Add OLMoE

Merge branch 'dev' into OLMo

96c1fbb

ran formatting

72fb903

joelburget mentioned this pull request Dec 17, 2024

Add allenai/OLMoE-1B-7B-0924. #718

Closed

7 tasks

bryce13950 and others added 8 commits February 5, 2025 00:27

Merge branch 'dev' into OLMo

9d3a85e

Merge branch 'dev' into OLMo

d4519b2

tmp update for olmo2

064310f

Fix: Olmo2 uses normalization after the attention/mlp

b1fd04b

Merge branch 'dev' into OLMo

871ba03

ran format

7939e8d

fixed some type issues

97fd1e7

Merge branch 'dev' into OLMo

9032fe7

bryce13950 added the pr-typing-issues label Jun 24, 2025

jleechung and others added 4 commits July 22, 2025 18:23

OLMo 2 RMS

39703c4

OLMo 2 RMS

1c283c1

Tested Instruct models

688a421

Merge pull request #3 from jleechung/OLMo

9febc5c

Fix to OLMo 2 normalization

taziksh mentioned this pull request Oct 12, 2025

Complete type checking for OLMo support (builds on #816) #1081

Merged

5 tasks
