
Move RL code from src/MaxText/rl/ to src/maxtext/trainers/post_train/rl/ #3180

Merged
copybara-service[bot] merged 1 commit into main from anisha-rl-refactor
Feb 20, 2026

Conversation

@A9isha
Collaborator

@A9isha A9isha commented Feb 18, 2026

Description

Migrate RL training code to the new package structure, following the same pattern as the SFT (PR #2988) and distillation moves. Files at the old location are replaced with backward-compatibility shims that delegate to the new modules and emit deprecation warnings.
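
A minimal sketch of what such a compatibility shim could look like (the module paths match this PR, but the shim body and warning text below are illustrative assumptions, not the actual code from the change):

```python
# src/MaxText/rl/train_rl.py -- hypothetical backward-compatibility shim.
"""Deprecated location: use maxtext.trainers.post_train.rl.train_rl instead."""
import warnings

warnings.warn(
    "MaxText.rl.train_rl has moved to maxtext.trainers.post_train.rl.train_rl; "
    "this shim will be removed in a future release.",
    DeprecationWarning,
    stacklevel=2,
)

# Delegate everything to the new module so existing imports keep working.
from maxtext.trainers.post_train.rl.train_rl import *  # noqa: F401,F403
```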

Also fixes the Jupyter notebook run test, while disabling sft_qwen3_demo.ipynb pending further investigation.

Tests

Locally ran RL using the following commands:

```
## new
python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4

## old
python3 -m src.MaxText.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4
```

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Collaborator

@bvandermoon bvandermoon left a comment

For the manual tests you ran, could you also try with the old commands? Not as critical as the train.py shims since RL is a newer feature, but still good to test them since they are added here

@A9isha
Collaborator Author

A9isha commented Feb 18, 2026

Done testing with the old command and updated the description - thanks @bvandermoon

@A9isha A9isha force-pushed the anisha-rl-refactor branch 2 times, most recently from 81e5072 to 8918852 Compare February 18, 2026 22:38
@bvandermoon
Collaborator

> Done testing with the old command and updated the description - thanks @bvandermoon

Thanks @A9isha. Just to double check, can you confirm you saw all logs as expected with the old command? For train.py, I needed to set `logging.set_verbosity(logging.INFO)` to see the standard completed step output logged.
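
For reference, a minimal way to do that with absl logging (assuming the step logs are emitted at INFO level through absl, as the comment above suggests):

```python
# Raise absl logging verbosity so INFO-level "completed step" messages are shown.
from absl import logging

logging.set_verbosity(logging.INFO)
```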

@A9isha A9isha force-pushed the anisha-rl-refactor branch from 8918852 to e99b7ec Compare February 19, 2026 20:38
@A9isha A9isha requested a review from parambole as a code owner February 19, 2026 20:38
A9isha added a commit that referenced this pull request Feb 19, 2026
…ost_train/rl/

Imported from GitHub PR #3180

# Description

Migrate RL training code to the new package structure, following the same pattern as the SFT (PR #2988) and distillation moves. Files at the old location are replaced with backward-compatibility shims that delegate to the new modules and emit deprecation warnings.

Also fixes the Jupyter notebook run test, while disabling `sft_qwen3_demo.ipynb` pending further investigation.
# Tests

Locally ran RL using the following commands:

```
## new
python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4

## old
python3 -m src.MaxText.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> \
  dataset_name=gsm8k \
  steps=4

```

# Checklist

Before submitting this PR, please make sure (put X in square brackets):
- [X] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [X] I have necessary comments in my code, particularly in hard-to-understand areas.
- [X] I have run end-to-end tests and provided workload links above if applicable.
- [X] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).

Copybara import of the project:

--
e99b7ec by A9isha <mazumdera@google.com>:

Move RL code from src/MaxText/rl/ to src/maxtext/trainers/post_train/rl/

Merging this change closes #3180

FUTURE_COPYBARA_INTEGRATE_REVIEW=#3180 from AI-Hypercomputer:anisha-rl-refactor e99b7ec
PiperOrigin-RevId: 872533178
@copybara-service copybara-service bot merged commit 5f1717b into main Feb 20, 2026
26 checks passed
@copybara-service copybara-service bot deleted the anisha-rl-refactor branch February 20, 2026 00:03