@jonasrohw
Contributor

Description

Adds support for the OLMo v1 model family and OLMoE. Transformers>3.40 would let NumPy do a major version upgrade; pyproject.toml now prevents this.

OLMo v2 will require dropping Python 3.8 support, because the Transformers version it needs also drops it. It will be added in a separate PR based on TransformerLens 3.
This also completes PR #718.
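
For anyone who wants to confirm locally that the NumPy constraint held after installing, here is a minimal sketch of a runtime check. The helper names and the limit of 2 are assumptions for illustration only; this PR does not state which NumPy versions pyproject.toml allows.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def numpy_major_is_below(limit: int = 2) -> bool:
    """Report whether the installed NumPy major version is below `limit`.

    Hypothetical helper: the default limit of 2 is an assumption, not a
    value taken from the project's pyproject.toml.
    """
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        # NumPy is not installed, so no installed version exceeds the limit.
        return True
    return major_version(installed) < limit


if __name__ == "__main__":
    print("NumPy major version within assumed limit:", numpy_major_is_below())
```

Running the file after installing the project prints whether the resolved NumPy version stayed under the assumed limit.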

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • [] I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@joelburget mentioned this pull request on Dec 17, 2024
@taziksh

taziksh commented Sep 29, 2025

Hey @jonasrohw, looks like you've got this feature pretty much ready to go - just seeing type check failures blocking it. I'd be interested in taking a stab at fixing those type issues if you're not actively working on it.

@jonasrohw
Contributor Author

@taziksh Yeah, I didn't have time to fix some of the type issues. Go ahead!

@taziksh

taziksh commented Oct 12, 2025

@jonasrohw
I've added type checking fixes to complete the OLMo implementation in #1081. Happy to collaborate however works best!

jlarson4 added a commit that referenced this pull request Feb 13, 2026
* added and tested: OLMo-1B,OLMo-7B

* fixed: prevent numpy from doing a major upgrade

* fixed: dimensions of 7b to be correct

* tested: Loading checkpoints & model variations

* Reimplement OLMoE changes.

Originally from #718.

* Implement TODO (norm_topk_prob)

* Disable bos token for OLMoE.

* Add q and k norm.

* Correct normalization type for OLMoE.

* ran formatting

* tmp update for olmo2

* Fix: Olmo2 uses normalization after the attention/mlp

* ran format

* fixed some type issues

* OLMo 2 RMS

* OLMo 2 RMS

* Tested Instruct models

* fix: Olmo2DecoderLayer type issues

* fix type assertions for attention

* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility

* fix: sort imports in olmo2.py

* docs: update Colab notebook for OLMo models

* Adjust error message to improve testing

* conflict resolution

* Updating lock

* Fixed formatting, update error messages to properly test

* more formatting

* fixing type error

* fix format error

* Fix type issues

* Fix type issues

* Fix format issues

* Fix format issues again

* Fix format issues for black

* another attempt at black formatting

* Fix format issues for black again

* Retyping the blocks in HookedTransformer and HookedEncoder

* undo modulelist typing

* Improve type checking in test_detect_head_with_invalid_head_name

* removing unused import

* Fixing Patchscopes_Generation_Demo.ipynb

* Fixing the rest of the notebooks

* Fixing more of the notebooks

* run_line_magic

* BERT ipynb fix

* Trying to fix the BERT set_grad cell

* more set_grad cell fixes

* Updated after rebase to fix missing 3.x changes

* Updating OLMo PR to work with v3.x

* Format fix

* fix model ordering

---------

Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Jonas Rohweder <[email protected]>
Co-authored-by: Joel Burget <[email protected]>
Co-authored-by: Jonas Rohw <[email protected]>
Co-authored-by: Bryce Meyer <[email protected]>
Co-authored-by: Jay Zhou <[email protected]>
Co-authored-by: jleechung <[email protected]>
Co-authored-by: Jonah Larson <[email protected]>