[LLM] Simplify IFEval reward aggregator #3543

vmoens wants to merge 2 commits into gh/vmoens/235/base

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3543

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 1 Unrelated Failure

As of commit 9e257f9 with merge base 4e2e787:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 79.4161μs | 78.3957μs | 12.7558 KOps/s | 12.4687 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1367ms | 0.1356ms | 7.3763 KOps/s | 7.2662 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 98.7174ms | 98.2088ms | 10.1824 Ops/s | 9.8596 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.4856μs | 2.4827μs | 402.7859 KOps/s | 401.7830 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 36.3587μs | 36.2400μs | 27.5938 KOps/s | 26.6103 KOps/s | |
| test_simple | 0.7740s | 0.7729s | 1.2939 Ops/s | 1.2522 Ops/s | |
| test_transformed | 1.3640s | 1.3617s | 0.7344 Ops/s | 0.7244 Ops/s | |
| test_serial | 2.2954s | 2.2708s | 0.4404 Ops/s | 0.4353 Ops/s | |
| test_parallel | 1.8988s | 1.8083s | 0.5530 Ops/s | 0.5640 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.4357ms | 40.0488μs | 24.9695 KOps/s | 25.1486 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 50.1110μs | 22.1486μs | 45.1496 KOps/s | 44.5849 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 50.2610μs | 22.6755μs | 44.1004 KOps/s | 44.0618 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 35.1210μs | 12.3015μs | 81.2909 KOps/s | 80.1024 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 80.9920μs | 42.8149μs | 23.3564 KOps/s | 22.7971 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 58.4710μs | 24.3549μs | 41.0595 KOps/s | 40.6015 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 90.5220μs | 25.1667μs | 39.7350 KOps/s | 37.6309 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 38.9910μs | 14.8114μs | 67.5154 KOps/s | 66.4719 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 70.9510μs | 44.9178μs | 22.2629 KOps/s | 22.3180 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 52.3710μs | 27.3528μs | 36.5594 KOps/s | 36.7846 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 56.5510μs | 24.9803μs | 40.0315 KOps/s | 39.0160 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 40.0710μs | 14.9966μs | 66.6817 KOps/s | 66.7574 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 0.1150ms | 47.5348μs | 21.0372 KOps/s | 20.4732 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 57.7210μs | 29.5447μs | 33.8470 KOps/s | 33.5018 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 56.9820μs | 28.6873μs | 34.8586 KOps/s | 34.5437 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 46.3120μs | 17.5721μs | 56.9083 KOps/s | 57.0869 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 92.4220μs | 45.6986μs | 21.8825 KOps/s | 21.6403 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 99.7420μs | 27.2553μs | 36.6902 KOps/s | 37.3508 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.5318ms | 29.4916μs | 33.9080 KOps/s | 34.6697 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 47.6320μs | 16.5035μs | 60.5930 KOps/s | 60.9044 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 0.1201ms | 47.7249μs | 20.9534 KOps/s | 20.8714 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 59.9110μs | 29.2546μs | 34.1827 KOps/s | 34.2042 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 62.1520μs | 30.7934μs | 32.4745 KOps/s | 31.9181 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 88.6620μs | 18.5744μs | 53.8375 KOps/s | 52.8615 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 78.7020μs | 49.6323μs | 20.1482 KOps/s | 19.6326 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 61.0820μs | 31.6975μs | 31.5482 KOps/s | 31.0240 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 62.9020μs | 30.7688μs | 32.5005 KOps/s | 32.5083 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 44.9410μs | 18.7394μs | 53.3635 KOps/s | 53.6137 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 84.6120μs | 51.9155μs | 19.2621 KOps/s | 18.4770 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 65.9520μs | 34.3502μs | 29.1119 KOps/s | 28.7049 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 63.3610μs | 32.9557μs | 30.3438 KOps/s | 30.5624 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 52.2810μs | 21.3311μs | 46.8798 KOps/s | 47.1476 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7085s | 0.7035s | 1.4214 Ops/s | 1.3637 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.6932s | 0.5932s | 1.6856 Ops/s | 1.6725 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.6937s | 1.6085s | 0.6217 Ops/s | 0.6156 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4715s | 1.3895s | 0.7197 Ops/s | 0.7152 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9383s | 1.8534s | 0.5396 Ops/s | 0.5366 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7160s | 1.6331s | 0.6123 Ops/s | 0.6092 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6561s | 4.5546s | 0.2196 Ops/s | 0.2216 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5280s | 4.4029s | 0.2271 Ops/s | 0.2307 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9092s | 1.8243s | 0.5481 Ops/s | 0.5410 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6965s | 1.5689s | 0.6374 Ops/s | 0.6461 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 21.2148ms | 20.2914ms | 49.2820 Ops/s | 50.2526 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1316s | 3.5510ms | 281.6099 Ops/s | 277.7337 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1087ms | 82.6262μs | 12.1027 KOps/s | 12.2500 KOps/s | |
| test_values[td1_return_estimate-False-False] | 48.3465ms | 47.9644ms | 20.8488 Ops/s | 21.2523 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3297ms | 1.0842ms | 922.3679 Ops/s | 926.8046 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 84.5313ms | 79.6488ms | 12.5551 Ops/s | 12.9728 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2843ms | 1.0865ms | 920.3838 Ops/s | 929.8435 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 21.6776ms | 20.3754ms | 49.0789 Ops/s | 50.4853 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0345ms | 0.7544ms | 1.3256 KOps/s | 1.3306 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7199ms | 0.6737ms | 1.4843 KOps/s | 1.5008 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.6154ms | 1.5157ms | 659.7434 Ops/s | 672.8061 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7413ms | 0.6861ms | 1.4575 KOps/s | 1.4635 KOps/s | |
| test_dqn_speed[False-None] | 1.8510ms | 1.5251ms | 655.6768 Ops/s | 675.7994 Ops/s | |
| test_dqn_speed[False-backward] | 2.3108ms | 2.1616ms | 462.6150 Ops/s | 471.0057 Ops/s | |
| test_dqn_speed[True-None] | 0.7391ms | 0.5800ms | 1.7243 KOps/s | 1.7304 KOps/s | |
| test_dqn_speed[True-backward] | 1.2893ms | 1.2396ms | 806.6809 Ops/s | 811.2419 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6698ms | 0.5990ms | 1.6695 KOps/s | 1.6416 KOps/s | |
| test_ddpg_speed[False-None] | 3.2004ms | 2.8311ms | 353.2175 Ops/s | 356.7675 Ops/s | |
| test_ddpg_speed[False-backward] | 4.7088ms | 4.2793ms | 233.6821 Ops/s | 237.7335 Ops/s | |
| test_ddpg_speed[True-None] | 1.5105ms | 1.3779ms | 725.7543 Ops/s | 730.0601 Ops/s | |
| test_ddpg_speed[True-backward] | 2.6084ms | 2.5639ms | 390.0251 Ops/s | 391.4092 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4780ms | 1.4026ms | 712.9500 Ops/s | 720.9210 Ops/s | |
| test_sac_speed[False-None] | 8.9594ms | 8.3185ms | 120.2137 Ops/s | 120.8201 Ops/s | |
| test_sac_speed[False-backward] | 11.9718ms | 11.5836ms | 86.3287 Ops/s | 86.9959 Ops/s | |
| test_sac_speed[True-None] | 2.2423ms | 1.9034ms | 525.3734 Ops/s | 534.7726 Ops/s | |
| test_sac_speed[True-backward] | 3.7403ms | 3.6919ms | 270.8619 Ops/s | 270.3296 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 16.1150ms | 9.8145ms | 101.8904 Ops/s | 100.6279 Ops/s | |
| test_redq_deprec_speed[False-None] | 10.2274ms | 9.3746ms | 106.6713 Ops/s | 107.6482 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.5442ms | 12.6806ms | 78.8604 Ops/s | 78.8616 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.7738ms | 2.6084ms | 383.3837 Ops/s | 385.1620 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.6043ms | 4.1382ms | 241.6522 Ops/s | 243.3308 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 14.5219ms | 9.5308ms | 104.9232 Ops/s | 104.7432 Ops/s | |
| test_td3_speed[False-None] | 8.4606ms | 8.1829ms | 122.2066 Ops/s | 122.2809 Ops/s | |
| test_td3_speed[False-backward] | 11.1052ms | 10.6061ms | 94.2858 Ops/s | 47.0589 Ops/s | |
| test_td3_speed[True-None] | 1.7716ms | 1.7315ms | 577.5494 Ops/s | 603.0313 Ops/s | |
| test_td3_speed[True-backward] | 3.1275ms | 3.0743ms | 325.2804 Ops/s | 324.7876 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 83.9646ms | 25.2956ms | 39.5326 Ops/s | 40.2036 Ops/s | |
| test_cql_speed[False-None] | 17.7078ms | 17.3144ms | 57.7554 Ops/s | 58.3759 Ops/s | |
| test_cql_speed[False-backward] | 23.0525ms | 22.5591ms | 44.3281 Ops/s | 44.7404 Ops/s | |
| test_cql_speed[True-None] | 3.5006ms | 3.3778ms | 296.0538 Ops/s | 295.9705 Ops/s | |
| test_cql_speed[True-backward] | 6.1164ms | 5.7125ms | 175.0561 Ops/s | 176.3982 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 0.8407s | 17.1205ms | 58.4096 Ops/s | 83.8052 Ops/s | |
| test_a2c_speed[False-None] | 3.5178ms | 3.2244ms | 310.1362 Ops/s | 312.3408 Ops/s | |
| test_a2c_speed[False-backward] | 6.8597ms | 6.3734ms | 156.9026 Ops/s | 164.6208 Ops/s | |
| test_a2c_speed[True-None] | 1.4652ms | 1.3665ms | 731.8232 Ops/s | 736.4737 Ops/s | |
| test_a2c_speed[True-backward] | 3.2587ms | 3.2073ms | 311.7891 Ops/s | 310.1845 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.0874ms | 1.0127ms | 987.4830 Ops/s | 981.2232 Ops/s | |
| test_ppo_speed[False-None] | 3.9359ms | 3.8250ms | 261.4349 Ops/s | 260.6932 Ops/s | |
| test_ppo_speed[False-backward] | 7.7762ms | 7.1812ms | 139.2523 Ops/s | 137.5615 Ops/s | |
| test_ppo_speed[True-None] | 1.6153ms | 1.4743ms | 678.2817 Ops/s | 674.7192 Ops/s | |
| test_ppo_speed[True-backward] | 3.4181ms | 3.1731ms | 315.1494 Ops/s | 296.6266 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.1843ms | 1.0556ms | 947.3296 Ops/s | 920.5002 Ops/s | |
| test_reinforce_speed[False-None] | 2.3781ms | 2.2552ms | 443.4160 Ops/s | 444.8851 Ops/s | |
| test_reinforce_speed[False-backward] | 3.5124ms | 3.4328ms | 291.3036 Ops/s | 295.1485 Ops/s | |
| test_reinforce_speed[True-None] | 1.4663ms | 1.3229ms | 755.9339 Ops/s | 749.6711 Ops/s | |
| test_reinforce_speed[True-backward] | 3.2249ms | 3.1387ms | 318.6060 Ops/s | 316.9226 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 15.5045ms | 8.8124ms | 113.4763 Ops/s | 112.5877 Ops/s | |
| test_iql_speed[False-None] | 10.4761ms | 9.4648ms | 105.6545 Ops/s | 106.5425 Ops/s | |
| test_iql_speed[False-backward] | 13.6199ms | 13.2270ms | 75.6028 Ops/s | 74.8526 Ops/s | |
| test_iql_speed[True-None] | 2.3947ms | 2.2598ms | 442.5235 Ops/s | 442.7527 Ops/s | |
| test_iql_speed[True-backward] | 5.0307ms | 4.9322ms | 202.7503 Ops/s | 199.9765 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 15.8369ms | 9.8739ms | 101.2767 Ops/s | 100.8535 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1729ms | 5.7848ms | 172.8676 Ops/s | 170.8776 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0111ms | 0.3247ms | 3.0799 KOps/s | 3.1161 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5928ms | 0.3329ms | 3.0035 KOps/s | 3.1993 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.8512ms | 5.6032ms | 178.4686 Ops/s | 177.0096 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7925ms | 0.3182ms | 3.1428 KOps/s | 2.6235 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6141ms | 0.3045ms | 3.2843 KOps/s | 2.7487 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.4818ms | 1.2594ms | 794.0514 Ops/s | 680.7101 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.3905ms | 1.1733ms | 852.2686 Ops/s | 735.5898 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 12.7205ms | 5.8718ms | 170.3061 Ops/s | 172.2861 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.9350ms | 0.4714ms | 2.1212 KOps/s | 1.8659 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8572ms | 0.4697ms | 2.1289 KOps/s | 2.0630 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.7032ms | 5.5686ms | 179.5790 Ops/s | 176.4551 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7087ms | 0.2866ms | 3.4894 KOps/s | 2.7059 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5490ms | 0.2702ms | 3.7012 KOps/s | 3.0478 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.7338ms | 5.5534ms | 180.0687 Ops/s | 179.0451 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.9388ms | 0.2969ms | 3.3681 KOps/s | 2.6214 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.4984ms | 0.2674ms | 3.7397 KOps/s | 2.7985 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1365ms | 5.7133ms | 175.0290 Ops/s | 170.7726 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.8500ms | 0.4450ms | 2.2471 KOps/s | 1.9590 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7174ms | 0.4529ms | 2.2081 KOps/s | 2.0388 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.9614s | 24.1059ms | 41.4836 Ops/s | 197.7809 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 10.0424ms | 1.9293ms | 518.3312 Ops/s | 561.9199 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 53.4649ms | 2.1944ms | 455.6991 Ops/s | 1.0199 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.8405ms | 4.9959ms | 200.1641 Ops/s | 194.6609 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9555ms | 1.7909ms | 558.3807 Ops/s | 538.9639 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.1359ms | 0.9355ms | 1.0689 KOps/s | 1.0345 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 7.5341ms | 5.1905ms | 192.6609 Ops/s | 45.4567 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 11.1464ms | 2.1214ms | 471.3947 Ops/s | 502.5052 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 3.7016ms | 1.1892ms | 840.8935 Ops/s | 873.0440 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 39.9043ms | 37.6180ms | 26.5830 Ops/s | 25.7439 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.4268ms | 17.7521ms | 56.3312 Ops/s | 54.0038 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 42.6200ms | 39.3359ms | 25.4221 Ops/s | 25.0367 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.0884ms | 18.2338ms | 54.8432 Ops/s | 53.9529 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 42.3557ms | 40.7805ms | 24.5215 Ops/s | 23.9928 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.9460ms | 19.6006ms | 51.0189 Ops/s | 49.3514 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.9106ms | 0.2147ms | 4.6586 KOps/s | 4.4323 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6520ms | 1.3948ms | 716.9575 Ops/s | 726.3995 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.6140ms | 2.3450ms | 426.4445 Ops/s | 426.0980 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1539ms | 2.9467ms | 339.3646 Ops/s | 341.8142 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2405ms | 0.1623ms | 6.1600 KOps/s | 6.0737 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3837ms | 0.2223ms | 4.4980 KOps/s | 4.5067 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.0592ms | 1.8273ms | 547.2517 Ops/s | 529.7240 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.7188ms | 1.4409ms | 693.9905 Ops/s | 718.6203 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.4259ms | 1.1246ms | 889.2136 Ops/s | 889.2022 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 7.5560ms | 3.5742ms | 279.7868 Ops/s | 268.7438 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.2727ms | 5.8084ms | 172.1641 Ops/s | 173.1760 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.7009ms | 7.2431ms | 138.0631 Ops/s | 142.7606 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4382ms | 0.2736ms | 3.6556 KOps/s | 3.7019 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.7430ms | 1.5095ms | 662.4526 Ops/s | 683.1131 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8390ms | 2.5075ms | 398.8016 Ops/s | 408.9205 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.5478ms | 3.1529ms | 317.1664 Ops/s | 319.8388 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 33.2977ms | 32.3662ms | 30.8964 Ops/s | 31.1529 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 63.2353ms | 62.8162ms | 15.9194 Ops/s | 15.7511 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 37.1945ms | 36.4602ms | 27.4272 Ops/s | 26.9839 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 94.7857ms | 75.6199ms | 13.2240 Ops/s | 13.9985 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 55.2475ms | 54.5147ms | 18.3437 Ops/s | 18.4519 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1087s | 0.1084s | 9.2263 Ops/s | 9.2742 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 56.5716ms | 56.2815ms | 17.7678 Ops/s | 17.8741 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1127s | 0.1123s | 8.9068 Ops/s | 8.9366 Ops/s | |

Replace the complex tiered multiplicative reward (structure multiplier, quality bonus thresholds, complexity scaling) with a simple weighted average of IFEval metrics plus a small additive format bonus.

The new reward is `weighted_avg(strict/loose metrics) + format_bonus`, where `format_bonus` is 0.1 for a single answer block and 0.05 for a single think block. Reward range: ~[0, 1.15].

Made-with: Cursor

ghstack-source-id: 559c504

Pull-Request: #3543

Pull Request resolved: #3569
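
The new reward can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in this PR: the metric names, the weights in `METRIC_WEIGHTS`, and the `<answer>`/`<think>` tag syntax are hypothetical. Only the additive structure and the bonus values (0.1 and 0.05) come from the description above.

```python
import re

# Hypothetical metric names and weights: the PR only states a "weighted
# average of strict/loose metrics", so these values are illustrative.
METRIC_WEIGHTS = {
    "prompt_level_strict_acc": 0.4,
    "inst_level_strict_acc": 0.3,
    "prompt_level_loose_acc": 0.2,
    "inst_level_loose_acc": 0.1,
}

ANSWER_BONUS = 0.1  # exactly one answer block (value from the PR description)
THINK_BONUS = 0.05  # exactly one think block (value from the PR description)

# Assumed tag syntax; DOTALL lets a block span multiple lines.
ANSWER_RE = re.compile(r"<answer>.*?</answer>", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def format_bonus(response: str) -> float:
    """Additive bonus: 0.1 for a single answer block, 0.05 for a single think block."""
    bonus = 0.0
    if len(ANSWER_RE.findall(response)) == 1:
        bonus += ANSWER_BONUS
    if len(THINK_RE.findall(response)) == 1:
        bonus += THINK_BONUS
    return bonus


def ifeval_reward(
    metrics: dict[str, float],
    response: str,
    weights: dict[str, float] = METRIC_WEIGHTS,
) -> float:
    """Weighted average of IFEval metrics (each in [0, 1]) plus the format bonus."""
    total_weight = sum(weights.values())
    weighted_avg = sum(w * float(metrics[name]) for name, w in weights.items()) / total_weight
    return weighted_avg + format_bonus(response)
```

Because each metric lies in [0, 1], the weighted average also lies in [0, 1], and the bonus adds at most 0.1 + 0.05, which gives the stated ~[0, 1.15] range. For example, all metrics at 1.0 with exactly one think block and one answer block yields about 1.15, while two answer blocks earn no answer bonus.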
Stack from ghstack (oldest at bottom):