[BugFix] Fix shape mismatch in _set_index_in_td with trailing dims of 1 #3517

Merged
vmoens merged 2 commits into main from fix-shape-pettingzoo
Feb 18, 2026

@vmoens
Collaborator

@vmoens vmoens commented Feb 18, 2026

Summary

  • Fix 1 (gh#3515): Fixes _set_index_in_td in TensorDictReplayBuffer, where the numel()-based loop matched the wrong number of batch dimensions when trailing dimensions had size 1 (e.g. single-agent PettingZoo environments). A size-1 trailing dimension leaves numel() unchanged, so the ascending loop stopped at the first, shorter match. The loop now iterates from the highest dim downward, preferring the most complete match.
  • Fix 2: Fixes _propagate_to_nested_keys in StepCounter, where expand_as failed when propagating root-level truncated/done signals to nested agent-level keys in MARL environments. The parent tensor has fewer dimensions than the nested tensor (it lacks the agent dims), so it is now unsqueezed before expanding.
  • Adds a regression test in TestPettingZoo for the replay buffer single-agent scenario.
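
To illustrate Fix 1, here is a minimal sketch of why the search direction matters. It is not the torchrl implementation: the helper name leading_dims_matching and the example shapes are hypothetical, and batch shapes are modelled as plain tuples so the numel() comparison can be shown without tensors.

```python
from math import prod


def leading_dims_matching(n_items, batch_shape, highest_first):
    """Return k such that prod(batch_shape[:k]) == n_items, or None.

    Hypothetical helper: mimics a numel()-based search for how many
    leading batch dims a flat index of length n_items should span.
    """
    candidates = range(1, len(batch_shape) + 1)
    if highest_first:
        candidates = reversed(candidates)
    for k in candidates:
        if prod(batch_shape[:k]) == n_items:
            return k
    return None


# Single-agent case: the trailing agent dim has size 1, so both
# batch_shape[:2] and batch_shape[:3] contain 40 elements.
single_agent_shape = (8, 5, 1)
ascending = leading_dims_matching(40, single_agent_shape, highest_first=False)
descending = leading_dims_matching(40, single_agent_shape, highest_first=True)

# Multi-agent case: only the full shape matches, so direction is irrelevant.
multi_agent_shape = (8, 5, 3)
multi_ascending = leading_dims_matching(120, multi_agent_shape, highest_first=False)
multi_descending = leading_dims_matching(120, multi_agent_shape, highest_first=True)

print(ascending, descending, multi_ascending, multi_descending)
```

In the single-agent case the ascending search settles on two dims and would drop the size-1 agent dim, while the descending search keeps all three.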

Fixes #3515
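
Fix 2 from the summary can be sketched in a similar way. This is an illustration rather than the code in StepCounter: the helper name expand_parent_to_nested is hypothetical, and inserting the missing dims just before the last dim is an assumption about where the agent dims sit.

```python
import torch


def expand_parent_to_nested(parent, nested):
    """Unsqueeze parent until it has as many dims as nested, then expand.

    Assumption: the missing (agent) dims belong just before the last dim.
    """
    out = parent
    while out.ndim < nested.ndim:
        out = out.unsqueeze(-2)
    return out.expand_as(nested)


batch, n_agents = 4, 3
parent = torch.zeros(batch, 1, dtype=torch.bool)  # root-level truncated
parent[2] = True
nested = torch.zeros(batch, n_agents, 1, dtype=torch.bool)  # agent-level key

# Expanding directly aligns the batch dim (4) with the agent dim (3) and fails.
try:
    parent.expand_as(nested)
    direct_ok = True
except RuntimeError:
    direct_ok = False

expanded = expand_parent_to_nested(parent, nested)
print(direct_ok, tuple(expanded.shape))
```

Each agent in batch entry 2 receives the root-level flag, and the other entries stay False.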

Test plan

  • pytest test/test_libs.py::TestPettingZoo::test_single_agent_group_replay_buffer -xvs passes locally
  • pytest test/test_libs.py::TestPettingZoo::test_reset_parallel_env -xvs passes locally (was failing on main)
  • Verified replay buffer fix works for single-agent (n=1), multi-agent (n=3), and prefix-dim scenarios

@pytorch-bot

pytorch-bot bot commented Feb 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3517

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Pending, 1 Unrelated Failure

As of commit 4234329 with merge base 83c2101:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Feb 18, 2026
@github-actions github-actions bot added the Environments, ReplayBuffers, and BugFix labels Feb 18, 2026
@github-actions
Contributor

github-actions bot commented Feb 18, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.1180μs 80.2640μs 12.4589 KOps/s 11.9250 KOps/s $\color{#35bf28}+4.48\%$
test_tensor_to_bytestream_speed[torch.save] 0.1379ms 0.1377ms 7.2643 KOps/s 7.0735 KOps/s $\color{#35bf28}+2.70\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1127s 0.1122s 8.9092 Ops/s 9.0313 Ops/s $\color{#d91a1a}-1.35\%$
test_tensor_to_bytestream_speed[numpy] 2.8102μs 2.7968μs 357.5554 KOps/s 381.1284 KOps/s $\textbf{\color{#d91a1a}-6.19\%}$
test_tensor_to_bytestream_speed[safetensors] 38.1903μs 36.8784μs 27.1162 KOps/s 25.8671 KOps/s $\color{#35bf28}+4.83\%$
test_simple 0.5469s 0.5463s 1.8306 Ops/s 1.7507 Ops/s $\color{#35bf28}+4.56\%$
test_transformed 1.0970s 1.0956s 0.9128 Ops/s 0.8984 Ops/s $\color{#35bf28}+1.60\%$
test_serial 1.6834s 1.6776s 0.5961 Ops/s 0.5919 Ops/s $\color{#35bf28}+0.70\%$
test_parallel 1.0343s 1.0259s 0.9748 Ops/s 0.9820 Ops/s $\color{#d91a1a}-0.74\%$
test_step_mdp_speed[True-True-True-True-True] 0.1655ms 41.6536μs 24.0075 KOps/s 24.3510 KOps/s $\color{#d91a1a}-1.41\%$
test_step_mdp_speed[True-True-True-True-False] 98.4320μs 23.6329μs 42.3140 KOps/s 42.6466 KOps/s $\color{#d91a1a}-0.78\%$
test_step_mdp_speed[True-True-True-False-True] 63.6010μs 23.7748μs 42.0613 KOps/s 42.2237 KOps/s $\color{#d91a1a}-0.38\%$
test_step_mdp_speed[True-True-True-False-False] 43.3910μs 13.0582μs 76.5802 KOps/s 76.1427 KOps/s $\color{#35bf28}+0.57\%$
test_step_mdp_speed[True-True-False-True-True] 90.9720μs 45.1923μs 22.1277 KOps/s 22.2969 KOps/s $\color{#d91a1a}-0.76\%$
test_step_mdp_speed[True-True-False-True-False] 57.4910μs 26.0768μs 38.3483 KOps/s 38.3545 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[True-True-False-False-True] 76.8920μs 26.1997μs 38.1684 KOps/s 38.1991 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[True-True-False-False-False] 41.4800μs 15.7778μs 63.3802 KOps/s 63.6512 KOps/s $\color{#d91a1a}-0.43\%$
test_step_mdp_speed[True-False-True-True-True] 89.0220μs 48.6397μs 20.5594 KOps/s 20.7882 KOps/s $\color{#d91a1a}-1.10\%$
test_step_mdp_speed[True-False-True-True-False] 65.7720μs 29.3540μs 34.0669 KOps/s 34.5894 KOps/s $\color{#d91a1a}-1.51\%$
test_step_mdp_speed[True-False-True-False-True] 58.4910μs 27.0169μs 37.0139 KOps/s 37.6218 KOps/s $\color{#d91a1a}-1.62\%$
test_step_mdp_speed[True-False-True-False-False] 83.9820μs 15.9880μs 62.5469 KOps/s 63.3687 KOps/s $\color{#d91a1a}-1.30\%$
test_step_mdp_speed[True-False-False-True-True] 94.3020μs 51.3370μs 19.4791 KOps/s 20.1624 KOps/s $\color{#d91a1a}-3.39\%$
test_step_mdp_speed[True-False-False-True-False] 62.6010μs 31.9105μs 31.3377 KOps/s 31.5421 KOps/s $\color{#d91a1a}-0.65\%$
test_step_mdp_speed[True-False-False-False-True] 62.5310μs 29.1538μs 34.3008 KOps/s 34.2747 KOps/s $\color{#35bf28}+0.08\%$
test_step_mdp_speed[True-False-False-False-False] 69.3210μs 18.4296μs 54.2605 KOps/s 53.5204 KOps/s $\color{#35bf28}+1.38\%$
test_step_mdp_speed[False-True-True-True-True] 89.4220μs 49.0259μs 20.3974 KOps/s 20.9971 KOps/s $\color{#d91a1a}-2.86\%$
test_step_mdp_speed[False-True-True-True-False] 88.8220μs 29.1374μs 34.3202 KOps/s 34.2957 KOps/s $\color{#35bf28}+0.07\%$
test_step_mdp_speed[False-True-True-False-True] 2.4298ms 30.6616μs 32.6141 KOps/s 33.0163 KOps/s $\color{#d91a1a}-1.22\%$
test_step_mdp_speed[False-True-True-False-False] 52.7010μs 17.6754μs 56.5760 KOps/s 57.1903 KOps/s $\color{#d91a1a}-1.07\%$
test_step_mdp_speed[False-True-False-True-True] 0.1448ms 50.8863μs 19.6516 KOps/s 19.8709 KOps/s $\color{#d91a1a}-1.10\%$
test_step_mdp_speed[False-True-False-True-False] 68.5410μs 31.8194μs 31.4274 KOps/s 31.7016 KOps/s $\color{#d91a1a}-0.87\%$
test_step_mdp_speed[False-True-False-False-True] 85.3810μs 32.3285μs 30.9325 KOps/s 30.2377 KOps/s $\color{#35bf28}+2.30\%$
test_step_mdp_speed[False-True-False-False-False] 47.1410μs 19.9601μs 50.1000 KOps/s 49.7318 KOps/s $\color{#35bf28}+0.74\%$
test_step_mdp_speed[False-False-True-True-True] 87.7910μs 53.2898μs 18.7653 KOps/s 19.0308 KOps/s $\color{#d91a1a}-1.40\%$
test_step_mdp_speed[False-False-True-True-False] 77.4420μs 34.3738μs 29.0920 KOps/s 29.7030 KOps/s $\color{#d91a1a}-2.06\%$
test_step_mdp_speed[False-False-True-False-True] 62.3110μs 33.5885μs 29.7721 KOps/s 31.0152 KOps/s $\color{#d91a1a}-4.01\%$
test_step_mdp_speed[False-False-True-False-False] 81.6010μs 20.2958μs 49.2714 KOps/s 50.2441 KOps/s $\color{#d91a1a}-1.94\%$
test_step_mdp_speed[False-False-False-True-True] 0.1041ms 55.6000μs 17.9856 KOps/s 18.1766 KOps/s $\color{#d91a1a}-1.05\%$
test_step_mdp_speed[False-False-False-True-False] 77.0010μs 36.7685μs 27.1972 KOps/s 27.6036 KOps/s $\color{#d91a1a}-1.47\%$
test_step_mdp_speed[False-False-False-False-True] 80.5410μs 35.1695μs 28.4338 KOps/s 29.3176 KOps/s $\color{#d91a1a}-3.01\%$
test_step_mdp_speed[False-False-False-False-False] 0.1608ms 21.6980μs 46.0872 KOps/s 44.7000 KOps/s $\color{#35bf28}+3.10\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8475s 0.7502s 1.3330 Ops/s 1.3454 Ops/s $\color{#d91a1a}-0.92\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7121s 0.6155s 1.6247 Ops/s 1.6461 Ops/s $\color{#d91a1a}-1.30\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7331s 1.6540s 0.6046 Ops/s 0.6112 Ops/s $\color{#d91a1a}-1.09\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5109s 1.4320s 0.6983 Ops/s 0.7065 Ops/s $\color{#d91a1a}-1.16\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9858s 1.9048s 0.5250 Ops/s 0.5332 Ops/s $\color{#d91a1a}-1.54\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7571s 1.6805s 0.5951 Ops/s 0.6051 Ops/s $\color{#d91a1a}-1.66\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7585s 4.6211s 0.2164 Ops/s 0.2184 Ops/s $\color{#d91a1a}-0.89\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5258s 4.4704s 0.2237 Ops/s 0.2249 Ops/s $\color{#d91a1a}-0.53\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9665s 1.8875s 0.5298 Ops/s 0.5297 Ops/s $\color{#35bf28}+0.02\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6630s 1.5908s 0.6286 Ops/s 0.6284 Ops/s $\color{#35bf28}+0.03\%$
test_values[generalized_advantage_estimate-True-True] 11.3247ms 10.7040ms 93.4227 Ops/s 95.9991 Ops/s $\color{#d91a1a}-2.68\%$
test_values[vec_generalized_advantage_estimate-True-True] 20.4411ms 17.6553ms 56.6403 Ops/s 57.1406 Ops/s $\color{#d91a1a}-0.88\%$
test_values[td0_return_estimate-False-False] 3.3801ms 0.1903ms 5.2562 KOps/s 7.8591 KOps/s $\textbf{\color{#d91a1a}-33.12\%}$
test_values[td1_return_estimate-False-False] 29.9587ms 29.3588ms 34.0614 Ops/s 34.9508 Ops/s $\color{#d91a1a}-2.54\%$
test_values[vec_td1_return_estimate-False-False] 19.0664ms 17.7173ms 56.4422 Ops/s 56.9355 Ops/s $\color{#d91a1a}-0.87\%$
test_values[td_lambda_return_estimate-True-False] 47.6820ms 43.4460ms 23.0171 Ops/s 23.7172 Ops/s $\color{#d91a1a}-2.95\%$
test_values[vec_td_lambda_return_estimate-True-False] 21.4802ms 17.8945ms 55.8830 Ops/s 56.8738 Ops/s $\color{#d91a1a}-1.74\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.0846ms 9.4857ms 105.4222 Ops/s 108.1905 Ops/s $\color{#d91a1a}-2.56\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.6640ms 1.5175ms 658.9802 Ops/s 639.3856 Ops/s $\color{#35bf28}+3.06\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6143ms 0.4384ms 2.2812 KOps/s 2.3232 KOps/s $\color{#d91a1a}-1.81\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 39.4908ms 34.9502ms 28.6121 Ops/s 28.6136 Ops/s $-0.01\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8550ms 1.7198ms 581.4578 Ops/s 574.4973 Ops/s $\color{#35bf28}+1.21\%$
test_dqn_speed[False-None] 1.5072ms 1.4111ms 708.6522 Ops/s 706.3748 Ops/s $\color{#35bf28}+0.32\%$
test_dqn_speed[False-backward] 1.9770ms 1.9065ms 524.5205 Ops/s 517.5314 Ops/s $\color{#35bf28}+1.35\%$
test_dqn_speed[True-None] 0.8677ms 0.5513ms 1.8139 KOps/s 1.7394 KOps/s $\color{#35bf28}+4.28\%$
test_dqn_speed[True-backward] 1.0419ms 1.0116ms 988.4899 Ops/s 985.0106 Ops/s $\color{#35bf28}+0.35\%$
test_dqn_speed[reduce-overhead-None] 0.8981ms 0.5391ms 1.8548 KOps/s 1.7773 KOps/s $\color{#35bf28}+4.36\%$
test_ddpg_speed[False-None] 3.2175ms 2.8513ms 350.7223 Ops/s 350.6133 Ops/s $\color{#35bf28}+0.03\%$
test_ddpg_speed[False-backward] 4.2128ms 4.0633ms 246.1063 Ops/s 246.4281 Ops/s $\color{#d91a1a}-0.13\%$
test_ddpg_speed[True-None] 1.8505ms 1.4234ms 702.5607 Ops/s 700.6950 Ops/s $\color{#35bf28}+0.27\%$
test_ddpg_speed[True-backward] 2.5500ms 2.4273ms 411.9826 Ops/s 346.0980 Ops/s $\textbf{\color{#35bf28}+19.04\%}$
test_ddpg_speed[reduce-overhead-None] 1.8521ms 1.4224ms 703.0313 Ops/s 698.7293 Ops/s $\color{#35bf28}+0.62\%$
test_sac_speed[False-None] 8.7045ms 8.0917ms 123.5831 Ops/s 125.9258 Ops/s $\color{#d91a1a}-1.86\%$
test_sac_speed[False-backward] 11.7381ms 11.2736ms 88.7027 Ops/s 89.3204 Ops/s $\color{#d91a1a}-0.69\%$
test_sac_speed[True-None] 2.4469ms 2.1977ms 455.0241 Ops/s 451.3992 Ops/s $\color{#35bf28}+0.80\%$
test_sac_speed[True-backward] 4.5041ms 4.1231ms 242.5340 Ops/s 240.4572 Ops/s $\color{#35bf28}+0.86\%$
test_sac_speed[reduce-overhead-None] 2.5826ms 2.1720ms 460.4154 Ops/s 463.3017 Ops/s $\color{#d91a1a}-0.62\%$
test_redq_speed[False-None] 15.3740ms 10.6368ms 94.0129 Ops/s 92.3627 Ops/s $\color{#35bf28}+1.79\%$
test_redq_speed[False-backward] 18.7449ms 17.8764ms 55.9397 Ops/s 56.7937 Ops/s $\color{#d91a1a}-1.50\%$
test_redq_speed[True-None] 4.7856ms 4.3907ms 227.7563 Ops/s 228.4766 Ops/s $\color{#d91a1a}-0.32\%$
test_redq_speed[True-backward] 10.0374ms 9.6769ms 103.3391 Ops/s 101.2055 Ops/s $\color{#35bf28}+2.11\%$
test_redq_speed[reduce-overhead-None] 4.8897ms 4.4767ms 223.3778 Ops/s 231.1975 Ops/s $\color{#d91a1a}-3.38\%$
test_redq_deprec_speed[False-None] 11.6646ms 11.0227ms 90.7216 Ops/s 90.8189 Ops/s $\color{#d91a1a}-0.11\%$
test_redq_deprec_speed[False-backward] 16.2384ms 15.9170ms 62.8258 Ops/s 62.9004 Ops/s $\color{#d91a1a}-0.12\%$
test_redq_deprec_speed[True-None] 4.0146ms 3.7087ms 269.6330 Ops/s 274.0328 Ops/s $\color{#d91a1a}-1.61\%$
test_redq_deprec_speed[True-backward] 7.7963ms 7.5911ms 131.7336 Ops/s 128.0161 Ops/s $\color{#35bf28}+2.90\%$
test_redq_deprec_speed[reduce-overhead-None] 4.1643ms 3.6029ms 277.5561 Ops/s 272.9849 Ops/s $\color{#35bf28}+1.67\%$
test_td3_speed[False-None] 8.2052ms 8.0569ms 124.1166 Ops/s 125.0301 Ops/s $\color{#d91a1a}-0.73\%$
test_td3_speed[False-backward] 11.6066ms 10.9510ms 91.3160 Ops/s 91.5393 Ops/s $\color{#d91a1a}-0.24\%$
test_td3_speed[True-None] 1.9153ms 1.8657ms 535.9960 Ops/s 533.6555 Ops/s $\color{#35bf28}+0.44\%$
test_td3_speed[True-backward] 3.7692ms 3.6056ms 277.3439 Ops/s 229.8275 Ops/s $\textbf{\color{#35bf28}+20.67\%}$
test_td3_speed[reduce-overhead-None] 1.8593ms 1.8131ms 551.5484 Ops/s 542.7861 Ops/s $\color{#35bf28}+1.61\%$
test_cql_speed[False-None] 30.2229ms 26.1206ms 38.2840 Ops/s 39.3171 Ops/s $\color{#d91a1a}-2.63\%$
test_cql_speed[False-backward] 41.4287ms 35.6303ms 28.0660 Ops/s 28.6518 Ops/s $\color{#d91a1a}-2.04\%$
test_cql_speed[True-None] 12.6011ms 12.2281ms 81.7788 Ops/s 81.0431 Ops/s $\color{#35bf28}+0.91\%$
test_cql_speed[True-backward] 18.9038ms 18.3399ms 54.5260 Ops/s 57.6101 Ops/s $\textbf{\color{#d91a1a}-5.35\%}$
test_cql_speed[reduce-overhead-None] 12.7921ms 12.4866ms 80.0858 Ops/s 81.5676 Ops/s $\color{#d91a1a}-1.82\%$
test_a2c_speed[False-None] 6.4327ms 5.3727ms 186.1256 Ops/s 190.4640 Ops/s $\color{#d91a1a}-2.28\%$
test_a2c_speed[False-backward] 12.3973ms 11.7642ms 85.0033 Ops/s 86.2559 Ops/s $\color{#d91a1a}-1.45\%$
test_a2c_speed[True-None] 4.0664ms 3.6956ms 270.5896 Ops/s 260.3530 Ops/s $\color{#35bf28}+3.93\%$
test_a2c_speed[True-backward] 8.7953ms 8.4659ms 118.1204 Ops/s 106.1943 Ops/s $\textbf{\color{#35bf28}+11.23\%}$
test_a2c_speed[reduce-overhead-None] 4.1368ms 3.7231ms 268.5929 Ops/s 268.9129 Ops/s $\color{#d91a1a}-0.12\%$
test_ppo_speed[False-None] 6.3748ms 5.9859ms 167.0592 Ops/s 172.4115 Ops/s $\color{#d91a1a}-3.10\%$
test_ppo_speed[False-backward] 13.0881ms 12.4471ms 80.3401 Ops/s 80.8761 Ops/s $\color{#d91a1a}-0.66\%$
test_ppo_speed[True-None] 4.0086ms 3.6713ms 272.3863 Ops/s 270.3802 Ops/s $\color{#35bf28}+0.74\%$
test_ppo_speed[True-backward] 8.7707ms 8.5101ms 117.5071 Ops/s 116.2149 Ops/s $\color{#35bf28}+1.11\%$
test_ppo_speed[reduce-overhead-None] 3.7635ms 3.5832ms 279.0832 Ops/s 277.2310 Ops/s $\color{#35bf28}+0.67\%$
test_reinforce_speed[False-None] 4.8454ms 4.5276ms 220.8666 Ops/s 221.8172 Ops/s $\color{#d91a1a}-0.43\%$
test_reinforce_speed[False-backward] 7.5129ms 7.2879ms 137.2133 Ops/s 136.6270 Ops/s $\color{#35bf28}+0.43\%$
test_reinforce_speed[True-None] 3.0830ms 2.8973ms 345.1468 Ops/s 340.7662 Ops/s $\color{#35bf28}+1.29\%$
test_reinforce_speed[True-backward] 8.0882ms 7.7473ms 129.0773 Ops/s 128.2242 Ops/s $\color{#35bf28}+0.67\%$
test_reinforce_speed[reduce-overhead-None] 3.0315ms 2.8586ms 349.8231 Ops/s 351.6871 Ops/s $\color{#d91a1a}-0.53\%$
test_iql_speed[False-None] 25.4086ms 20.3832ms 49.0601 Ops/s 49.4623 Ops/s $\color{#d91a1a}-0.81\%$
test_iql_speed[False-backward] 35.8558ms 30.3627ms 32.9352 Ops/s 33.2444 Ops/s $\color{#d91a1a}-0.93\%$
test_iql_speed[True-None] 8.8415ms 8.5740ms 116.6323 Ops/s 113.5378 Ops/s $\color{#35bf28}+2.73\%$
test_iql_speed[True-backward] 17.1183ms 16.7372ms 59.7472 Ops/s 60.4830 Ops/s $\color{#d91a1a}-1.22\%$
test_iql_speed[reduce-overhead-None] 8.7489ms 8.5807ms 116.5402 Ops/s 111.7463 Ops/s $\color{#35bf28}+4.29\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1649ms 6.0271ms 165.9171 Ops/s 163.9184 Ops/s $\color{#35bf28}+1.22\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.5110ms 0.2849ms 3.5105 KOps/s 2.9370 KOps/s $\textbf{\color{#35bf28}+19.53\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4729ms 0.2655ms 3.7664 KOps/s 2.9532 KOps/s $\textbf{\color{#35bf28}+27.53\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0117ms 5.8019ms 172.3587 Ops/s 172.2496 Ops/s $\color{#35bf28}+0.06\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.6783ms 0.2793ms 3.5801 KOps/s 2.8720 KOps/s $\textbf{\color{#35bf28}+24.66\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4861ms 0.2610ms 3.8317 KOps/s 3.0278 KOps/s $\textbf{\color{#35bf28}+26.55\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5039ms 1.2876ms 776.6189 Ops/s 715.0499 Ops/s $\textbf{\color{#35bf28}+8.61\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.3596ms 1.2056ms 829.4615 Ops/s 765.6713 Ops/s $\textbf{\color{#35bf28}+8.33\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.9499ms 6.1085ms 163.7061 Ops/s 167.3298 Ops/s $\color{#d91a1a}-2.17\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9271ms 0.4306ms 2.3223 KOps/s 2.0414 KOps/s $\textbf{\color{#35bf28}+13.76\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6630ms 0.4160ms 2.4036 KOps/s 2.0207 KOps/s $\textbf{\color{#35bf28}+18.95\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0535ms 5.8592ms 170.6721 Ops/s 169.4940 Ops/s $\color{#35bf28}+0.70\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7246ms 0.3320ms 3.0124 KOps/s 2.6754 KOps/s $\textbf{\color{#35bf28}+12.60\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7022ms 0.3926ms 2.5469 KOps/s 2.8616 KOps/s $\textbf{\color{#d91a1a}-11.00\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0997ms 5.8605ms 170.6339 Ops/s 170.7647 Ops/s $\color{#d91a1a}-0.08\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6578ms 0.3515ms 2.8450 KOps/s 2.7588 KOps/s $\color{#35bf28}+3.12\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4982ms 0.3356ms 2.9801 KOps/s 2.8906 KOps/s $\color{#35bf28}+3.10\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4717ms 6.0087ms 166.4242 Ops/s 165.9599 Ops/s $\color{#35bf28}+0.28\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0686ms 0.4774ms 2.0945 KOps/s 1.8802 KOps/s $\textbf{\color{#35bf28}+11.39\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6380ms 0.4443ms 2.2507 KOps/s 2.3613 KOps/s $\color{#d91a1a}-4.68\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.6099s 17.1751ms 58.2238 Ops/s 197.1506 Ops/s $\textbf{\color{#d91a1a}-70.47\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.9589ms 1.9128ms 522.7909 Ops/s 455.2679 Ops/s $\textbf{\color{#35bf28}+14.83\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 9.9886ms 1.2470ms 801.8953 Ops/s 1.1041 KOps/s $\textbf{\color{#d91a1a}-27.37\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.9921ms 5.0605ms 197.6103 Ops/s 57.1658 Ops/s $\textbf{\color{#35bf28}+245.68\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9781ms 1.7512ms 571.0292 Ops/s 502.9362 Ops/s $\textbf{\color{#35bf28}+13.54\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.7608ms 1.2176ms 821.3022 Ops/s 852.3062 Ops/s $\color{#d91a1a}-3.64\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5112s 15.4151ms 64.8715 Ops/s 189.0762 Ops/s $\textbf{\color{#d91a1a}-65.69\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 11.4704ms 2.1304ms 469.4028 Ops/s 483.2352 Ops/s $\color{#d91a1a}-2.86\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.2430ms 1.0573ms 945.7843 Ops/s 990.4231 Ops/s $\color{#d91a1a}-4.51\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.3953ms 36.4336ms 27.4472 Ops/s 26.9100 Ops/s $\color{#35bf28}+2.00\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.2331ms 18.4784ms 54.1173 Ops/s 53.2385 Ops/s $\color{#35bf28}+1.65\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.2040ms 37.9033ms 26.3829 Ops/s 26.7503 Ops/s $\color{#d91a1a}-1.37\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.3875ms 18.7412ms 53.3585 Ops/s 54.0565 Ops/s $\color{#d91a1a}-1.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.9972ms 39.2897ms 25.4520 Ops/s 25.5798 Ops/s $\color{#d91a1a}-0.50\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4672ms 20.1584ms 49.6072 Ops/s 50.1896 Ops/s $\color{#d91a1a}-1.16\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8505ms 0.2260ms 4.4250 KOps/s 4.5150 KOps/s $\color{#d91a1a}-1.99\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7141ms 1.4133ms 707.5549 Ops/s 714.6718 Ops/s $\color{#d91a1a}-1.00\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5871ms 2.3174ms 431.5131 Ops/s 434.1032 Ops/s $\color{#d91a1a}-0.60\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0886ms 2.9342ms 340.8129 Ops/s 340.7377 Ops/s $\color{#35bf28}+0.02\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2385ms 0.1351ms 7.3995 KOps/s 7.5290 KOps/s $\color{#d91a1a}-1.72\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3489ms 0.1827ms 5.4736 KOps/s 5.3277 KOps/s $\color{#35bf28}+2.74\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.0253ms 1.7837ms 560.6265 Ops/s 573.1510 Ops/s $\color{#d91a1a}-2.19\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4615ms 1.3156ms 760.1013 Ops/s 762.7007 Ops/s $\color{#d91a1a}-0.34\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2543ms 1.1239ms 889.7240 Ops/s 885.3191 Ops/s $\color{#35bf28}+0.50\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7022ms 3.5194ms 284.1386 Ops/s 280.7325 Ops/s $\color{#35bf28}+1.21\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.9124ms 5.7781ms 173.0666 Ops/s 174.8889 Ops/s $\color{#d91a1a}-1.04\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.5808ms 7.3275ms 136.4730 Ops/s 142.5413 Ops/s $\color{#d91a1a}-4.26\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4297ms 0.2768ms 3.6130 KOps/s 3.5949 KOps/s $\color{#35bf28}+0.50\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7095ms 1.5298ms 653.6689 Ops/s 661.6765 Ops/s $\color{#d91a1a}-1.21\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6217ms 2.4550ms 407.3242 Ops/s 411.1214 Ops/s $\color{#d91a1a}-0.92\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3195ms 3.1555ms 316.9059 Ops/s 317.0019 Ops/s $\color{#d91a1a}-0.03\%$
test_collector_without_rb[100-img_shape0-atari] 34.1846ms 33.1190ms 30.1941 Ops/s 30.1849 Ops/s $\color{#35bf28}+0.03\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.3079ms 65.3708ms 15.2973 Ops/s 15.3880 Ops/s $\color{#d91a1a}-0.59\%$
test_collector_with_rb[100-img_shape0-atari] 39.0321ms 37.9230ms 26.3692 Ops/s 26.4736 Ops/s $\color{#d91a1a}-0.39\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.9522ms 74.4011ms 13.4407 Ops/s 13.5920 Ops/s $\color{#d91a1a}-1.11\%$
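
The Change column in the table above appears to be the relative difference between Ops and Ops on Repo HEAD, after converting both to the same unit. The sketch below recomputes two rows under that assumption; the helper names and the unit table are hypothetical and are not part of the benchmark tooling.

```python
# Scale factors for the two throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_second(value, unit):
    """Convert a table entry such as (12.4589, "KOps/s") to plain Ops/s."""
    return value * UNIT_SCALE[unit]


def relative_change(ops, ops_head):
    """Percentage change of the PR throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


# test_tensor_to_bytestream_speed[pickle]: both columns in KOps/s.
pickle_change = relative_change(
    to_ops_per_second(12.4589, "KOps/s"),
    to_ops_per_second(11.9250, "KOps/s"),
)

# test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]:
# the columns use different units (Ops/s and KOps/s).
populate_change = relative_change(
    to_ops_per_second(801.8953, "Ops/s"),
    to_ops_per_second(1.1041, "KOps/s"),
)

print(f"{pickle_change:+.2f}% {populate_change:+.2f}%")
```

Both values agree with the reported +4.48% and -27.37% to two decimal places, which is consistent with this reading of the column.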

@github-actions
Contributor

github-actions bot commented Feb 18, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}15$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.0041μs 80.0640μs 12.4900 KOps/s 12.4492 KOps/s $\color{#35bf28}+0.33\%$
test_tensor_to_bytestream_speed[torch.save] 0.1410ms 0.1406ms 7.1102 KOps/s 7.0809 KOps/s $\color{#35bf28}+0.41\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1139s 0.1136s 8.8020 Ops/s 8.8003 Ops/s $\color{#35bf28}+0.02\%$
test_tensor_to_bytestream_speed[numpy] 2.6525μs 2.6477μs 377.6874 KOps/s 363.4272 KOps/s $\color{#35bf28}+3.92\%$
test_tensor_to_bytestream_speed[safetensors] 36.4083μs 36.2612μs 27.5777 KOps/s 25.3355 KOps/s $\textbf{\color{#35bf28}+8.85\%}$
test_simple 0.8001s 0.7922s 1.2623 Ops/s 1.2314 Ops/s $\color{#35bf28}+2.51\%$
test_transformed 1.3899s 1.3860s 0.7215 Ops/s 0.7241 Ops/s $\color{#d91a1a}-0.36\%$
test_serial 2.2979s 2.2931s 0.4361 Ops/s 0.4384 Ops/s $\color{#d91a1a}-0.52\%$
test_parallel 1.9085s 1.8336s 0.5454 Ops/s 0.5603 Ops/s $\color{#d91a1a}-2.66\%$
test_step_mdp_speed[True-True-True-True-True] 0.4601ms 41.6855μs 23.9891 KOps/s 23.9071 KOps/s $\color{#35bf28}+0.34\%$
test_step_mdp_speed[True-True-True-True-False] 60.6510μs 23.3500μs 42.8266 KOps/s 42.3275 KOps/s $\color{#35bf28}+1.18\%$
test_step_mdp_speed[True-True-True-False-True] 49.4210μs 23.2214μs 43.0638 KOps/s 41.9621 KOps/s $\color{#35bf28}+2.63\%$
test_step_mdp_speed[True-True-True-False-False] 39.7710μs 12.8866μs 77.5998 KOps/s 77.7951 KOps/s $\color{#d91a1a}-0.25\%$
test_step_mdp_speed[True-True-False-True-True] 0.1014ms 45.2564μs 22.0963 KOps/s 21.8894 KOps/s $\color{#35bf28}+0.95\%$
test_step_mdp_speed[True-True-False-True-False] 63.4310μs 26.2150μs 38.1460 KOps/s 37.7711 KOps/s $\color{#35bf28}+0.99\%$
test_step_mdp_speed[True-True-False-False-True] 57.0810μs 26.0271μs 38.4215 KOps/s 37.8472 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[True-True-False-False-False] 54.6310μs 15.5344μs 64.3733 KOps/s 64.2350 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[True-False-True-True-True] 77.6420μs 47.6130μs 21.0027 KOps/s 20.6460 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[True-False-True-True-False] 59.7510μs 28.8525μs 34.6591 KOps/s 34.1602 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[True-False-True-False-True] 67.1710μs 26.5384μs 37.6813 KOps/s 37.7444 KOps/s $\color{#d91a1a}-0.17\%$
test_step_mdp_speed[True-False-True-False-False] 43.1710μs 15.6243μs 64.0029 KOps/s 63.5868 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[True-False-False-True-True] 82.0220μs 49.8298μs 20.0683 KOps/s 19.8008 KOps/s $\color{#35bf28}+1.35\%$
test_step_mdp_speed[True-False-False-True-False] 63.2920μs 31.1384μs 32.1147 KOps/s 31.2993 KOps/s $\color{#35bf28}+2.61\%$
test_step_mdp_speed[True-False-False-False-True] 68.4620μs 28.8396μs 34.6745 KOps/s 34.9448 KOps/s $\color{#d91a1a}-0.77\%$
test_step_mdp_speed[True-False-False-False-False] 47.7210μs 18.2464μs 54.8054 KOps/s 54.7155 KOps/s $\color{#35bf28}+0.16\%$
test_step_mdp_speed[False-True-True-True-True] 88.1410μs 47.5642μs 21.0242 KOps/s 20.5744 KOps/s $\color{#35bf28}+2.19\%$
test_step_mdp_speed[False-True-True-True-False] 56.1910μs 28.8470μs 34.6657 KOps/s 34.3424 KOps/s $\color{#35bf28}+0.94\%$
test_step_mdp_speed[False-True-True-False-True] 2.4571ms 30.4554μs 32.8349 KOps/s 32.4603 KOps/s $\color{#35bf28}+1.15\%$
test_step_mdp_speed[False-True-True-False-False] 40.5900μs 17.4759μs 57.2216 KOps/s 57.2009 KOps/s $\color{#35bf28}+0.04\%$
test_step_mdp_speed[False-True-False-True-True] 94.8810μs 50.3704μs 19.8529 KOps/s 19.6364 KOps/s $\color{#35bf28}+1.10\%$
test_step_mdp_speed[False-True-False-True-False] 70.3120μs 31.6797μs 31.5660 KOps/s 31.1764 KOps/s $\color{#35bf28}+1.25\%$
test_step_mdp_speed[False-True-False-False-True] 65.7410μs 32.2415μs 31.0159 KOps/s 30.2391 KOps/s $\color{#35bf28}+2.57\%$
test_step_mdp_speed[False-True-False-False-False] 53.0210μs 19.8119μs 50.4747 KOps/s 50.2136 KOps/s $\color{#35bf28}+0.52\%$
test_step_mdp_speed[False-False-True-True-True] 98.8730μs 53.5388μs 18.6781 KOps/s 18.6197 KOps/s $\color{#35bf28}+0.31\%$
test_step_mdp_speed[False-False-True-True-False] 71.7920μs 33.7478μs 29.6316 KOps/s 29.0047 KOps/s $\color{#35bf28}+2.16\%$
test_step_mdp_speed[False-False-True-False-True] 65.9010μs 32.6292μs 30.6474 KOps/s 30.1994 KOps/s $\color{#35bf28}+1.48\%$
test_step_mdp_speed[False-False-True-False-False] 52.0910μs 19.5098μs 51.2563 KOps/s 49.8661 KOps/s $\color{#35bf28}+2.79\%$
test_step_mdp_speed[False-False-False-True-True] 0.1061ms 54.6945μs 18.2834 KOps/s 18.1176 KOps/s $\color{#35bf28}+0.92\%$
test_step_mdp_speed[False-False-False-True-False] 87.7820μs 36.5206μs 27.3818 KOps/s 27.3829 KOps/s $-0.00\%$
test_step_mdp_speed[False-False-False-False-True] 71.7310μs 34.2303μs 29.2138 KOps/s 29.0595 KOps/s $\color{#35bf28}+0.53\%$
test_step_mdp_speed[False-False-False-False-False] 48.1910μs 22.2205μs 45.0034 KOps/s 45.0158 KOps/s $\color{#d91a1a}-0.03\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8369s 0.7419s 1.3478 Ops/s 1.3412 Ops/s $\color{#35bf28}+0.49\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7033s 0.6086s 1.6432 Ops/s 1.6397 Ops/s $\color{#35bf28}+0.22\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7067s 1.6303s 0.6134 Ops/s 0.6093 Ops/s $\color{#35bf28}+0.68\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5118s 1.4291s 0.6998 Ops/s 0.7039 Ops/s $\color{#d91a1a}-0.59\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9691s 1.8919s 0.5286 Ops/s 0.5309 Ops/s $\color{#d91a1a}-0.43\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7389s 1.6613s 0.6019 Ops/s 0.6006 Ops/s $\color{#35bf28}+0.23\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7261s 4.6731s 0.2140 Ops/s 0.2126 Ops/s $\color{#35bf28}+0.67\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5581s 4.4685s 0.2238 Ops/s 0.2259 Ops/s $\color{#d91a1a}-0.93\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9450s 1.8862s 0.5302 Ops/s 0.5352 Ops/s $\color{#d91a1a}-0.94\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6905s 1.6058s 0.6227 Ops/s 0.6331 Ops/s $\color{#d91a1a}-1.63\%$
test_values[generalized_advantage_estimate-True-True] 22.4043ms 20.9016ms 47.8433 Ops/s 48.7156 Ops/s $\color{#d91a1a}-1.79\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1473s 3.8745ms 258.0983 Ops/s 261.1365 Ops/s $\color{#d91a1a}-1.16\%$
test_values[td0_return_estimate-False-False] 0.1085ms 84.5741μs 11.8240 KOps/s 11.9200 KOps/s $\color{#d91a1a}-0.81\%$
test_values[td1_return_estimate-False-False] 49.4866ms 49.2641ms 20.2988 Ops/s 20.4535 Ops/s $\color{#d91a1a}-0.76\%$
test_values[vec_td1_return_estimate-False-False] 1.3912ms 1.0997ms 909.3788 Ops/s 911.5814 Ops/s $\color{#d91a1a}-0.24\%$
test_values[td_lambda_return_estimate-True-False] 81.5002ms 80.3994ms 12.4379 Ops/s 12.3287 Ops/s $\color{#35bf28}+0.89\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3379ms 1.0963ms 912.1655 Ops/s 914.8322 Ops/s $\color{#d91a1a}-0.29\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.1262ms 20.8310ms 48.0054 Ops/s 48.1412 Ops/s $\color{#d91a1a}-0.28\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0425ms 0.7669ms 1.3040 KOps/s 1.3031 KOps/s $\color{#35bf28}+0.06\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7276ms 0.6851ms 1.4597 KOps/s 1.4502 KOps/s $\color{#35bf28}+0.66\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5622ms 1.5048ms 664.5418 Ops/s 667.3871 Ops/s $\color{#d91a1a}-0.43\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7620ms 0.7016ms 1.4252 KOps/s 1.4186 KOps/s $\color{#35bf28}+0.46\%$
test_dqn_speed[False-None] 1.7051ms 1.6019ms 624.2601 Ops/s 651.0712 Ops/s $\color{#d91a1a}-4.12\%$
test_dqn_speed[False-backward] 2.4240ms 2.1942ms 455.7531 Ops/s 458.2016 Ops/s $\color{#d91a1a}-0.53\%$
test_dqn_speed[True-None] 1.2120ms 0.5623ms 1.7783 KOps/s 1.7063 KOps/s $\color{#35bf28}+4.22\%$
test_dqn_speed[True-backward] 1.1803ms 1.1061ms 904.0920 Ops/s 822.4186 Ops/s $\textbf{\color{#35bf28}+9.93\%}$
test_dqn_speed[reduce-overhead-None] 0.7020ms 0.6152ms 1.6255 KOps/s 1.5987 KOps/s $\color{#35bf28}+1.68\%$
test_ddpg_speed[False-None] 3.4761ms 2.9115ms 343.4667 Ops/s 338.9731 Ops/s $\color{#35bf28}+1.33\%$
test_ddpg_speed[False-backward] 4.8099ms 4.2705ms 234.1632 Ops/s 230.3317 Ops/s $\color{#35bf28}+1.66\%$
test_ddpg_speed[True-None] 1.4561ms 1.3681ms 730.9399 Ops/s 735.1327 Ops/s $\color{#d91a1a}-0.57\%$
test_ddpg_speed[True-backward] 2.4322ms 2.3731ms 421.3830 Ops/s 396.4983 Ops/s $\textbf{\color{#35bf28}+6.28\%}$
test_ddpg_speed[reduce-overhead-None] 1.4607ms 1.3461ms 742.8933 Ops/s 731.8726 Ops/s $\color{#35bf28}+1.51\%$
test_sac_speed[False-None] 9.0100ms 8.5321ms 117.2048 Ops/s 118.1773 Ops/s $\color{#d91a1a}-0.82\%$
test_sac_speed[False-backward] 12.1680ms 11.5546ms 86.5456 Ops/s 84.9226 Ops/s $\color{#35bf28}+1.91\%$
test_sac_speed[True-None] 1.9207ms 1.8279ms 547.0774 Ops/s 535.6515 Ops/s $\color{#35bf28}+2.13\%$
test_sac_speed[True-backward] 3.6577ms 3.4607ms 288.9606 Ops/s 280.5728 Ops/s $\color{#35bf28}+2.99\%$
test_sac_speed[reduce-overhead-None] 19.2731ms 10.9539ms 91.2917 Ops/s 94.2397 Ops/s $\color{#d91a1a}-3.13\%$
test_redq_deprec_speed[False-None] 9.8867ms 9.2885ms 107.6600 Ops/s 74.6852 Ops/s $\textbf{\color{#35bf28}+44.15\%}$
test_redq_deprec_speed[False-backward] 13.0634ms 12.4989ms 80.0068 Ops/s 79.1754 Ops/s $\color{#35bf28}+1.05\%$
test_redq_deprec_speed[True-None] 2.6627ms 2.5391ms 393.8391 Ops/s 376.7248 Ops/s $\color{#35bf28}+4.54\%$
test_redq_deprec_speed[True-backward] 4.5115ms 4.1349ms 241.8411 Ops/s 224.9157 Ops/s $\textbf{\color{#35bf28}+7.53\%}$
test_redq_deprec_speed[reduce-overhead-None] 15.9062ms 9.8636ms 101.3824 Ops/s 102.1906 Ops/s $\color{#d91a1a}-0.79\%$
test_td3_speed[False-None] 8.3409ms 8.1635ms 122.4960 Ops/s 121.9737 Ops/s $\color{#35bf28}+0.43\%$
test_td3_speed[False-backward] 11.2828ms 10.5668ms 94.6356 Ops/s 91.9393 Ops/s $\color{#35bf28}+2.93\%$
test_td3_speed[True-None] 1.6775ms 1.6443ms 608.1446 Ops/s 602.1905 Ops/s $\color{#35bf28}+0.99\%$
test_td3_speed[True-backward] 3.1667ms 3.1061ms 321.9445 Ops/s 303.2681 Ops/s $\textbf{\color{#35bf28}+6.16\%}$
test_td3_speed[reduce-overhead-None] 47.3321ms 24.1713ms 41.3714 Ops/s 40.2578 Ops/s $\color{#35bf28}+2.77\%$
test_cql_speed[False-None] 17.5295ms 17.1994ms 58.1414 Ops/s 57.8398 Ops/s $\color{#35bf28}+0.52\%$
test_cql_speed[False-backward] 22.9810ms 22.5183ms 44.4083 Ops/s 43.4879 Ops/s $\color{#35bf28}+2.12\%$
test_cql_speed[True-None] 3.5108ms 3.2455ms 308.1171 Ops/s 302.5602 Ops/s $\color{#35bf28}+1.84\%$
test_cql_speed[True-backward] 6.3104ms 5.5597ms 179.8653 Ops/s 181.8923 Ops/s $\color{#d91a1a}-1.11\%$
test_cql_speed[reduce-overhead-None] 18.9396ms 11.8889ms 84.1118 Ops/s 83.4229 Ops/s $\color{#35bf28}+0.83\%$
test_a2c_speed[False-None] 3.9741ms 3.2510ms 307.5939 Ops/s 305.9974 Ops/s $\color{#35bf28}+0.52\%$
test_a2c_speed[False-backward] 6.9314ms 6.4591ms 154.8192 Ops/s 159.6786 Ops/s $\color{#d91a1a}-3.04\%$
test_a2c_speed[True-None] 1.4181ms 1.3475ms 742.1108 Ops/s 739.7497 Ops/s $\color{#35bf28}+0.32\%$
test_a2c_speed[True-backward] 3.5264ms 3.1056ms 322.0028 Ops/s 336.7436 Ops/s $\color{#d91a1a}-4.38\%$
test_a2c_speed[reduce-overhead-None] 1.0487ms 0.9733ms 1.0274 KOps/s 1.0290 KOps/s $\color{#d91a1a}-0.15\%$
test_ppo_speed[False-None] 4.1605ms 3.9019ms 256.2868 Ops/s 257.2384 Ops/s $\color{#d91a1a}-0.37\%$
test_ppo_speed[False-backward] 7.7258ms 7.2546ms 137.8429 Ops/s 140.3255 Ops/s $\color{#d91a1a}-1.77\%$
test_ppo_speed[True-None] 1.4973ms 1.3978ms 715.4354 Ops/s 703.1773 Ops/s $\color{#35bf28}+1.74\%$
test_ppo_speed[True-backward] 3.3178ms 3.2697ms 305.8416 Ops/s 316.6787 Ops/s $\color{#d91a1a}-3.42\%$
test_ppo_speed[reduce-overhead-None] 1.1526ms 1.0544ms 948.4372 Ops/s 917.5349 Ops/s $\color{#35bf28}+3.37\%$
test_reinforce_speed[False-None] 2.5074ms 2.3630ms 423.1925 Ops/s 431.7146 Ops/s $\color{#d91a1a}-1.97\%$
test_reinforce_speed[False-backward] 3.9830ms 3.5176ms 284.2830 Ops/s 290.3994 Ops/s $\color{#d91a1a}-2.11\%$
test_reinforce_speed[True-None] 1.3970ms 1.2959ms 771.6504 Ops/s 778.0584 Ops/s $\color{#d91a1a}-0.82\%$
test_reinforce_speed[True-backward] 3.1719ms 3.0682ms 325.9271 Ops/s 323.8683 Ops/s $\color{#35bf28}+0.64\%$
test_reinforce_speed[reduce-overhead-None] 0.5628s 10.6725ms 93.6991 Ops/s 105.8408 Ops/s $\textbf{\color{#d91a1a}-11.47\%}$
test_iql_speed[False-None] 9.9933ms 9.4222ms 106.1322 Ops/s 104.5581 Ops/s $\color{#35bf28}+1.51\%$
test_iql_speed[False-backward] 13.9246ms 13.4589ms 74.3005 Ops/s 73.9805 Ops/s $\color{#35bf28}+0.43\%$
test_iql_speed[True-None] 2.3025ms 2.1829ms 458.1112 Ops/s 455.1403 Ops/s $\color{#35bf28}+0.65\%$
test_iql_speed[True-backward] 5.0877ms 4.8472ms 206.3039 Ops/s 204.1917 Ops/s $\color{#35bf28}+1.03\%$
test_iql_speed[reduce-overhead-None] 17.5639ms 10.4671ms 95.5375 Ops/s 94.8414 Ops/s $\color{#35bf28}+0.73\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4482ms 5.9941ms 166.8300 Ops/s 165.9094 Ops/s $\color{#35bf28}+0.55\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8826ms 0.2949ms 3.3913 KOps/s 2.7829 KOps/s $\textbf{\color{#35bf28}+21.86\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4957ms 0.2741ms 3.6487 KOps/s 2.8868 KOps/s $\textbf{\color{#35bf28}+26.39\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0153ms 5.7501ms 173.9114 Ops/s 166.6017 Ops/s $\color{#35bf28}+4.39\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6233ms 0.3474ms 2.8788 KOps/s 2.8797 KOps/s $\color{#d91a1a}-0.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5229ms 0.2651ms 3.7715 KOps/s 3.1827 KOps/s $\textbf{\color{#35bf28}+18.50\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6982ms 1.4201ms 704.1677 Ops/s 683.4771 Ops/s $\color{#35bf28}+3.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6124ms 1.3859ms 721.5439 Ops/s 775.0667 Ops/s $\textbf{\color{#d91a1a}-6.91\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0441ms 5.8906ms 169.7606 Ops/s 162.6687 Ops/s $\color{#35bf28}+4.36\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7552ms 0.5248ms 1.9053 KOps/s 2.1701 KOps/s $\textbf{\color{#d91a1a}-12.20\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7766ms 0.5018ms 1.9928 KOps/s 2.2455 KOps/s $\textbf{\color{#d91a1a}-11.26\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9783ms 5.8194ms 171.8394 Ops/s 165.2579 Ops/s $\color{#35bf28}+3.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6753ms 0.2842ms 3.5186 KOps/s 2.6888 KOps/s $\textbf{\color{#35bf28}+30.86\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4531ms 0.2705ms 3.6969 KOps/s 3.2878 KOps/s $\textbf{\color{#35bf28}+12.44\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1617ms 5.8948ms 169.6413 Ops/s 168.0545 Ops/s $\color{#35bf28}+0.94\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8839ms 0.3658ms 2.7335 KOps/s 3.0681 KOps/s $\textbf{\color{#d91a1a}-10.91\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5725ms 0.3046ms 3.2835 KOps/s 3.7895 KOps/s $\textbf{\color{#d91a1a}-13.35\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1658ms 6.0160ms 166.2227 Ops/s 162.4292 Ops/s $\color{#35bf28}+2.34\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1117ms 0.5108ms 1.9579 KOps/s 2.1810 KOps/s $\textbf{\color{#d91a1a}-10.23\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6557ms 0.4909ms 2.0371 KOps/s 2.2528 KOps/s $\textbf{\color{#d91a1a}-9.57\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.5349ms 5.0282ms 198.8794 Ops/s 48.7453 Ops/s $\textbf{\color{#35bf28}+308.00\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.5041ms 2.4263ms 412.1519 Ops/s 512.0733 Ops/s $\textbf{\color{#d91a1a}-19.51\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.3016ms 1.2029ms 831.3548 Ops/s 863.3580 Ops/s $\color{#d91a1a}-3.71\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5996s 17.0855ms 58.5292 Ops/s 189.3880 Ops/s $\textbf{\color{#d91a1a}-69.10\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.0392ms 1.9639ms 509.1935 Ops/s 542.7942 Ops/s $\textbf{\color{#d91a1a}-6.19\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 10.4667ms 1.3241ms 755.2349 Ops/s 702.1826 Ops/s $\textbf{\color{#35bf28}+7.56\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.8089ms 5.3417ms 187.2053 Ops/s 185.4430 Ops/s $\color{#35bf28}+0.95\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1212ms 1.9516ms 512.3918 Ops/s 464.6395 Ops/s $\textbf{\color{#35bf28}+10.28\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.3720ms 1.1350ms 881.0869 Ops/s 895.6209 Ops/s $\color{#d91a1a}-1.62\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.7457ms 36.1064ms 27.6959 Ops/s 27.4707 Ops/s $\color{#35bf28}+0.82\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9366ms 18.4826ms 54.1048 Ops/s 53.9253 Ops/s $\color{#35bf28}+0.33\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.5167ms 37.4560ms 26.6980 Ops/s 26.3202 Ops/s $\color{#35bf28}+1.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2911ms 18.7219ms 53.4134 Ops/s 51.9545 Ops/s $\color{#35bf28}+2.81\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.4032ms 39.3421ms 25.4181 Ops/s 25.1647 Ops/s $\color{#35bf28}+1.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 0.5454s 30.9460ms 32.3144 Ops/s 48.9895 Ops/s $\textbf{\color{#d91a1a}-34.04\%}$
test_storage_write_lazystack[50-img_shape0-small] 0.9094ms 0.2303ms 4.3429 KOps/s 4.5114 KOps/s $\color{#d91a1a}-3.73\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7624ms 1.4149ms 706.7846 Ops/s 729.5795 Ops/s $\color{#d91a1a}-3.12\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5056ms 2.2888ms 436.9125 Ops/s 447.8181 Ops/s $\color{#d91a1a}-2.44\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0850ms 2.9343ms 340.8025 Ops/s 342.0025 Ops/s $\color{#d91a1a}-0.35\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2414ms 0.1635ms 6.1154 KOps/s 6.1220 KOps/s $\color{#d91a1a}-0.11\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3761ms 0.2131ms 4.6929 KOps/s 3.9365 KOps/s $\textbf{\color{#35bf28}+19.21\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8448ms 1.7368ms 575.7699 Ops/s 552.9617 Ops/s $\color{#35bf28}+4.12\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5252ms 1.3691ms 730.4222 Ops/s 763.6501 Ops/s $\color{#d91a1a}-4.35\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2527ms 1.1582ms 863.4235 Ops/s 865.1912 Ops/s $\color{#d91a1a}-0.20\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8796ms 3.6328ms 275.2690 Ops/s 276.2811 Ops/s $\color{#d91a1a}-0.37\%$
test_collector_stack_then_write[100-img_shape2-large_img] 10.5118ms 5.8136ms 172.0110 Ops/s 173.3116 Ops/s $\color{#d91a1a}-0.75\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.0701ms 7.1856ms 139.1667 Ops/s 135.6664 Ops/s $\color{#35bf28}+2.58\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4594ms 0.2825ms 3.5395 KOps/s 3.5724 KOps/s $\color{#d91a1a}-0.92\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7265ms 1.5418ms 648.6101 Ops/s 676.3328 Ops/s $\color{#d91a1a}-4.10\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8235ms 2.4310ms 411.3477 Ops/s 415.9836 Ops/s $\color{#d91a1a}-1.11\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.2916ms 3.1507ms 317.3924 Ops/s 318.7558 Ops/s $\color{#d91a1a}-0.43\%$
test_collector_without_rb[100-img_shape0-atari] 33.7671ms 33.2291ms 30.0941 Ops/s 29.7304 Ops/s $\color{#35bf28}+1.22\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.3497ms 65.0195ms 15.3800 Ops/s 15.1062 Ops/s $\color{#35bf28}+1.81\%$
test_collector_with_rb[100-img_shape0-atari] 38.2242ms 37.5126ms 26.6577 Ops/s 25.9798 Ops/s $\color{#35bf28}+2.61\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.9732ms 73.8251ms 13.5455 Ops/s 13.3520 Ops/s $\color{#35bf28}+1.45\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 56.2693ms 55.4863ms 18.0225 Ops/s 17.9800 Ops/s $\color{#35bf28}+0.24\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1131s 0.1112s 8.9924 Ops/s 8.9991 Ops/s $\color{#d91a1a}-0.07\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 58.9781ms 57.8127ms 17.2972 Ops/s 17.0898 Ops/s $\color{#35bf28}+1.21\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1155s 0.1146s 8.7255 Ops/s 8.5448 Ops/s $\color{#35bf28}+2.12\%$

Reverse the loop direction in _set_index_in_td to iterate from the
highest dim downward, so that dimensions of size 1 don't cause
premature numel() matches when reshaping the index tensor.
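
As a rough illustration of why the loop direction matters, the sketch below models the matching step on plain shape tuples, so it runs without PyTorch. The helper `count_batch_dims` and its `prefer_longest` flag are hypothetical names invented for this example and are not the library's code; the only thing taken from the commit message is the idea of comparing an element count against the product of leading batch dims.

```python
from math import prod


def count_batch_dims(index_len, batch_shape, prefer_longest):
    """Return how many leading dims of batch_shape multiply to index_len.

    Hypothetical helper: prefer_longest=False scans upward (first match wins),
    prefer_longest=True scans from the highest dim downward.
    """
    candidates = range(1, len(batch_shape) + 1)
    if prefer_longest:
        candidates = reversed(candidates)
    for dim in candidates:
        if prod(batch_shape[:dim]) == index_len:
            return dim
    raise RuntimeError(f"no prefix of {batch_shape} has {index_len} elements")


# Assumed example: batch of 4 x 5 with a single agent as a trailing dim of 1.
batch_shape = (4, 5, 1)
index_len = 20

old_match = count_batch_dims(index_len, batch_shape, prefer_longest=False)
new_match = count_batch_dims(index_len, batch_shape, prefer_longest=True)

print(batch_shape[:old_match])  # (4, 5): stops early, drops the trailing 1
print(batch_shape[:new_match])  # (4, 5, 1): the most complete match
```

When no dimension has size 1, both directions pick the same prefix, which is consistent with the bug only surfacing in the single-agent case.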

Fixes #3515

Co-authored-by: Cursor <cursoragent@cursor.com>
@vmoens force-pushed the fix-shape-pettingzoo branch from 2257f22 to 1f6772f on February 18, 2026 13:37
In _propagate_to_nested_keys, the parent (root-level) truncated/done
tensor has fewer dimensions than the nested (agent-level) tensor.
expand_as fails because it aligns from the right, mismatching batch
dims with agent dims. Fix by unsqueezing extra dims before expanding.
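
The right-alignment problem can be sketched on shape tuples alone. The functions below are hypothetical stand-ins that mimic the shape rules of `expand_as` and `unsqueeze`; they are not the code in `StepCounter`. The example shapes, and the choice to insert the size-1 dims just before the last dim, are assumptions made for illustration.

```python
def expand_shape(src, target):
    """Mimic right-aligned expand rules: each dim must equal the target or be 1."""
    if len(src) > len(target):
        raise ValueError(f"cannot expand {src} to {target}")
    padded = (1,) * (len(target) - len(src)) + tuple(src)
    for s, t in zip(padded, target):
        if s != 1 and s != t:
            raise ValueError(f"cannot expand {src} to {target}")
    return tuple(target)


def unsqueeze_before_last(src, target):
    """Insert size-1 dims before the last dim of src until the ranks match."""
    src = tuple(src)
    missing = len(target) - len(src)
    return src[:-1] + (1,) * missing + src[-1:]


parent = (4, 1)     # assumed root-level done/truncated shape: batch of 4
nested = (4, 3, 1)  # assumed agent-level shape: batch of 4, 3 agents

try:
    expand_shape(parent, nested)
    direct_ok = True
except ValueError:
    # Right alignment compares the batch dim (4) with the agent dim (3).
    direct_ok = False

fixed = unsqueeze_before_last(parent, nested)  # (4, 1, 1)
expanded = expand_shape(fixed, nested)         # (4, 3, 1)
```

With real tensors the same steps would correspond to an `unsqueeze` on the parent followed by `expand_as` on the nested tensor.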

Co-authored-by: Cursor <cursoragent@cursor.com>
@vmoens merged commit e38347a into main on Feb 18, 2026
116 of 121 checks passed
@vmoens deleted the fix-shape-pettingzoo branch on February 18, 2026 16:19

Labels

BugFix, CLA Signed, Environments, ReplayBuffers, Transforms

Development

Successfully merging this pull request may close these issues.

[BUG] Shape mismatch in Transform with single-agent environments (Knights Archers Zombies)

1 participant