
Move RL code from src/MaxText/rl/ to src/maxtext/trainers/post_train/rl/ #3180

Merged
copybara-service[bot] merged 1 commit into main from anisha-rl-refactor
Feb 20, 2026

Conversation

@A9isha
Collaborator

@A9isha A9isha commented Feb 18, 2026

Description

Migrate RL training code to the new package structure, following the same pattern as the SFT (PR #2988) and distillation moves. Files at the old location are replaced with backward-compatibility shims that delegate to the new modules and emit deprecation warnings.
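
A minimal sketch of what such a compatibility shim could look like (the module paths match this PR, but the shim body and warning text below are illustrative assumptions, not the actual code from the change):

```python
# src/MaxText/rl/train_rl.py -- hypothetical backward-compatibility shim.
"""Deprecated location: use maxtext.trainers.post_train.rl.train_rl instead."""
import warnings

warnings.warn(
    "MaxText.rl.train_rl has moved to maxtext.trainers.post_train.rl.train_rl; "
    "this shim will be removed in a future release.",
    DeprecationWarning,
    stacklevel=2,
)

# Delegate everything to the new module so existing imports keep working.
from maxtext.trainers.post_train.rl.train_rl import *  # noqa: F401,F403
```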

Also fixes the Jupyter notebook run test, while disabling sft_qwen3_demo.ipynb pending further investigation.

Tests

Locally ran RL using the following commands:

```
## new
python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4

## old
python3 -m src.MaxText.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4
```

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Collaborator

@bvandermoon bvandermoon left a comment

For the manual tests you ran, could you also try with the old commands? Not as critical as the train.py shims since RL is a newer feature, but still good to test them since they are added here

@A9isha
Collaborator Author

A9isha commented Feb 18, 2026

Done testing with the old command and updated the description - thanks @bvandermoon

@A9isha A9isha force-pushed the anisha-rl-refactor branch 2 times, most recently from 81e5072 to 8918852 Compare February 18, 2026 22:38
@bvandermoon
Collaborator

> Done testing with the old command and updated the description - thanks @bvandermoon

Thanks @A9isha. Just to double check, can you confirm you saw all logs as expected with the old command? For train.py, I needed to set `logging.set_verbosity(logging.INFO)` to see the standard completed step output logged.
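
For reference, a minimal way to do that with absl logging (assuming the step logs are emitted at INFO level through absl, as the comment above suggests):

```python
# Raise absl logging verbosity so INFO-level "completed step" messages are shown.
from absl import logging

logging.set_verbosity(logging.INFO)
```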

@A9isha A9isha force-pushed the anisha-rl-refactor branch from 8918852 to e99b7ec Compare February 19, 2026 20:38
@A9isha A9isha requested a review from parambole as a code owner February 19, 2026 20:38
A9isha added a commit that referenced this pull request Feb 19, 2026
…ost_train/rl/

Imported from GitHub PR #3180

# Description

Migrate RL training code to the new package structure, following the same pattern as the SFT (PR #2988) and distillation moves. Files at the old location are replaced with backward-compatibility shims that delegate to the new modules and emit deprecation warnings.

Also fixes the Jupyter notebook run test, while disabling `sft_qwen3_demo.ipynb` pending further investigation.
# Tests

Locally ran RL using the following commands:

```
## new
python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4

## old
python3 -m src.MaxText.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4

```

# Checklist

Before submitting this PR, please make sure (put X in square brackets):
- [X] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [X] I have necessary comments in my code, particularly in hard-to-understand areas.
- [X] I have run end-to-end tests and provided workload links above if applicable.
- [X] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).

Copybara import of the project:

--
e99b7ec by A9isha <mazumdera@google.com>:

Move RL code from src/MaxText/rl/ to src/maxtext/trainers/post_train/rl/

Merging this change closes #3180

FUTURE_COPYBARA_INTEGRATE_REVIEW=#3180 from AI-Hypercomputer:anisha-rl-refactor e99b7ec
PiperOrigin-RevId: 872533178
@copybara-service copybara-service bot merged commit 5f1717b into main Feb 20, 2026
26 checks passed
@copybara-service copybara-service bot deleted the anisha-rl-refactor branch February 20, 2026 00:03