
[Feature] Auto-batching inference server: threading transport #3493

Open
vmoens wants to merge 1 commit into gh/vmoens/235/base from gh/vmoens/235/head

Conversation

[ghstack-poisoned]

pytorch-bot bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3493

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e96a9f9 with merge base 266e4aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 80.4039μs 79.2161μs 12.6237 KOps/s 12.5804 KOps/s $\color{#35bf28}+0.34\%$
test_tensor_to_bytestream_speed[torch.save] 0.1364ms 0.1360ms 7.3508 KOps/s 7.3116 KOps/s $\color{#35bf28}+0.54\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1123s 0.1122s 8.9107 Ops/s 8.9498 Ops/s $\color{#d91a1a}-0.44\%$
test_tensor_to_bytestream_speed[numpy] 2.4244μs 2.4200μs 413.2309 KOps/s 410.6580 KOps/s $\color{#35bf28}+0.63\%$
test_tensor_to_bytestream_speed[safetensors] 37.1421μs 36.5211μs 27.3814 KOps/s 27.3743 KOps/s $\color{#35bf28}+0.03\%$
test_simple 0.5369s 0.5361s 1.8652 Ops/s 1.7798 Ops/s $\color{#35bf28}+4.80\%$
test_transformed 1.2233s 1.1273s 0.8871 Ops/s 0.8830 Ops/s $\color{#35bf28}+0.46\%$
test_serial 1.7449s 1.6548s 0.6043 Ops/s 0.6029 Ops/s $\color{#35bf28}+0.23\%$
test_parallel 1.1174s 1.0218s 0.9786 Ops/s 0.9658 Ops/s $\color{#35bf28}+1.33\%$
test_step_mdp_speed[True-True-True-True-True] 0.3456ms 43.6050μs 22.9331 KOps/s 23.4085 KOps/s $\color{#d91a1a}-2.03\%$
test_step_mdp_speed[True-True-True-True-False] 71.4810μs 24.3307μs 41.1003 KOps/s 41.1093 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[True-True-True-False-True] 69.7010μs 24.5631μs 40.7114 KOps/s 41.7508 KOps/s $\color{#d91a1a}-2.49\%$
test_step_mdp_speed[True-True-True-False-False] 45.2210μs 13.4986μs 74.0817 KOps/s 74.4027 KOps/s $\color{#d91a1a}-0.43\%$
test_step_mdp_speed[True-True-False-True-True] 85.4920μs 46.3868μs 21.5578 KOps/s 21.4323 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[True-True-False-True-False] 59.4910μs 27.3202μs 36.6029 KOps/s 36.7052 KOps/s $\color{#d91a1a}-0.28\%$
test_step_mdp_speed[True-True-False-False-True] 64.7920μs 27.1219μs 36.8706 KOps/s 36.5574 KOps/s $\color{#35bf28}+0.86\%$
test_step_mdp_speed[True-True-False-False-False] 44.5410μs 16.1486μs 61.9248 KOps/s 60.7460 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[True-False-True-True-True] 90.4920μs 49.1339μs 20.3526 KOps/s 20.1145 KOps/s $\color{#35bf28}+1.18\%$
test_step_mdp_speed[True-False-True-True-False] 65.7920μs 29.5185μs 33.8770 KOps/s 33.1505 KOps/s $\color{#35bf28}+2.19\%$
test_step_mdp_speed[True-False-True-False-True] 54.8710μs 27.0831μs 36.9234 KOps/s 36.6768 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[True-False-True-False-False] 46.6710μs 16.1435μs 61.9445 KOps/s 62.3634 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[True-False-False-True-True] 84.3920μs 51.2106μs 19.5272 KOps/s 19.4976 KOps/s $\color{#35bf28}+0.15\%$
test_step_mdp_speed[True-False-False-True-False] 65.9620μs 32.3566μs 30.9056 KOps/s 30.7112 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[True-False-False-False-True] 64.7920μs 29.1824μs 34.2672 KOps/s 34.5724 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[True-False-False-False-False] 48.3010μs 18.6878μs 53.5108 KOps/s 52.9729 KOps/s $\color{#35bf28}+1.02\%$
test_step_mdp_speed[False-True-True-True-True] 80.8210μs 48.3004μs 20.7038 KOps/s 20.8472 KOps/s $\color{#d91a1a}-0.69\%$
test_step_mdp_speed[False-True-True-True-False] 69.7720μs 29.8246μs 33.5294 KOps/s 33.6228 KOps/s $\color{#d91a1a}-0.28\%$
test_step_mdp_speed[False-True-True-False-True] 2.4417ms 31.2101μs 32.0409 KOps/s 32.4139 KOps/s $\color{#d91a1a}-1.15\%$
test_step_mdp_speed[False-True-True-False-False] 46.9910μs 17.8484μs 56.0273 KOps/s 54.5690 KOps/s $\color{#35bf28}+2.67\%$
test_step_mdp_speed[False-True-False-True-True] 89.7420μs 51.3665μs 19.4679 KOps/s 19.1879 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[False-True-False-True-False] 64.1020μs 31.9952μs 31.2547 KOps/s 30.9814 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[False-True-False-False-True] 68.2410μs 32.9234μs 30.3735 KOps/s 30.2632 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[False-True-False-False-False] 55.5110μs 20.3397μs 49.1650 KOps/s 49.9512 KOps/s $\color{#d91a1a}-1.57\%$
test_step_mdp_speed[False-False-True-True-True] 87.7420μs 54.0991μs 18.4846 KOps/s 18.1898 KOps/s $\color{#35bf28}+1.62\%$
test_step_mdp_speed[False-False-True-True-False] 71.6020μs 35.3497μs 28.2887 KOps/s 28.3119 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[False-False-True-False-True] 75.6210μs 33.2882μs 30.0407 KOps/s 30.8206 KOps/s $\color{#d91a1a}-2.53\%$
test_step_mdp_speed[False-False-True-False-False] 53.8510μs 20.0895μs 49.7773 KOps/s 49.6875 KOps/s $\color{#35bf28}+0.18\%$
test_step_mdp_speed[False-False-False-True-True] 94.5720μs 55.2333μs 18.1050 KOps/s 17.8710 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[False-False-False-True-False] 74.6420μs 36.9386μs 27.0719 KOps/s 26.8390 KOps/s $\color{#35bf28}+0.87\%$
test_step_mdp_speed[False-False-False-False-True] 76.4920μs 34.7835μs 28.7492 KOps/s 28.4171 KOps/s $\color{#35bf28}+1.17\%$
test_step_mdp_speed[False-False-False-False-False] 55.4110μs 22.3739μs 44.6950 KOps/s 43.5916 KOps/s $\color{#35bf28}+2.53\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8545s 0.7555s 1.3236 Ops/s 1.3276 Ops/s $\color{#d91a1a}-0.31\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7202s 0.6221s 1.6075 Ops/s 1.6181 Ops/s $\color{#d91a1a}-0.66\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7339s 1.6543s 0.6045 Ops/s 0.6101 Ops/s $\color{#d91a1a}-0.93\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5104s 1.4267s 0.7009 Ops/s 0.7031 Ops/s $\color{#d91a1a}-0.32\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9670s 1.8895s 0.5292 Ops/s 0.5311 Ops/s $\color{#d91a1a}-0.34\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7540s 1.6709s 0.5985 Ops/s 0.6031 Ops/s $\color{#d91a1a}-0.76\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7286s 4.5934s 0.2177 Ops/s 0.2174 Ops/s $\color{#35bf28}+0.15\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4827s 4.4070s 0.2269 Ops/s 0.2273 Ops/s $\color{#d91a1a}-0.19\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9650s 1.8944s 0.5279 Ops/s 0.5326 Ops/s $\color{#d91a1a}-0.89\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6626s 1.5846s 0.6311 Ops/s 0.6270 Ops/s $\color{#35bf28}+0.65\%$
test_values[generalized_advantage_estimate-True-True] 10.9503ms 10.2518ms 97.5442 Ops/s 96.2445 Ops/s $\color{#35bf28}+1.35\%$
test_values[vec_generalized_advantage_estimate-True-True] 22.6757ms 18.2115ms 54.9103 Ops/s 56.5580 Ops/s $\color{#d91a1a}-2.91\%$
test_values[td0_return_estimate-False-False] 0.2206ms 0.1281ms 7.8042 KOps/s 7.5676 KOps/s $\color{#35bf28}+3.13\%$
test_values[td1_return_estimate-False-False] 29.3791ms 28.1609ms 35.5102 Ops/s 34.9076 Ops/s $\color{#35bf28}+1.73\%$
test_values[vec_td1_return_estimate-False-False] 22.2692ms 18.1225ms 55.1799 Ops/s 56.2361 Ops/s $\color{#d91a1a}-1.88\%$
test_values[td_lambda_return_estimate-True-False] 43.4570ms 41.5525ms 24.0660 Ops/s 23.9022 Ops/s $\color{#35bf28}+0.69\%$
test_values[vec_td_lambda_return_estimate-True-False] 22.0082ms 18.1237ms 55.1763 Ops/s 56.0348 Ops/s $\color{#d91a1a}-1.53\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.1789ms 9.1070ms 109.8054 Ops/s 107.8048 Ops/s $\color{#35bf28}+1.86\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7974ms 1.5373ms 650.4857 Ops/s 629.0976 Ops/s $\color{#35bf28}+3.40\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5627ms 0.4335ms 2.3070 KOps/s 2.3844 KOps/s $\color{#d91a1a}-3.24\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 38.9253ms 35.2230ms 28.3905 Ops/s 28.4155 Ops/s $\color{#d91a1a}-0.09\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 5.3931ms 1.7524ms 570.6566 Ops/s 574.2768 Ops/s $\color{#d91a1a}-0.63\%$
test_dqn_speed[False-None] 1.5300ms 1.3888ms 720.0295 Ops/s 712.4052 Ops/s $\color{#35bf28}+1.07\%$
test_dqn_speed[False-backward] 2.0169ms 1.9254ms 519.3688 Ops/s 519.1654 Ops/s $\color{#35bf28}+0.04\%$
test_dqn_speed[True-None] 0.8796ms 0.5621ms 1.7790 KOps/s 1.7990 KOps/s $\color{#d91a1a}-1.11\%$
test_dqn_speed[True-backward] 1.0667ms 1.0236ms 976.9538 Ops/s 844.0477 Ops/s $\textbf{\color{#35bf28}+15.75\%}$
test_dqn_speed[reduce-overhead-None] 0.9362ms 0.5448ms 1.8355 KOps/s 1.7706 KOps/s $\color{#35bf28}+3.67\%$
test_ddpg_speed[False-None] 3.2175ms 2.8594ms 349.7177 Ops/s 349.4073 Ops/s $\color{#35bf28}+0.09\%$
test_ddpg_speed[False-backward] 4.2312ms 4.1199ms 242.7262 Ops/s 242.7527 Ops/s $\color{#d91a1a}-0.01\%$
test_ddpg_speed[True-None] 1.5384ms 1.4377ms 695.5490 Ops/s 687.5457 Ops/s $\color{#35bf28}+1.16\%$
test_ddpg_speed[True-backward] 2.5023ms 2.4566ms 407.0687 Ops/s 368.6518 Ops/s $\textbf{\color{#35bf28}+10.42\%}$
test_ddpg_speed[reduce-overhead-None] 1.8002ms 1.4141ms 707.1558 Ops/s 671.9478 Ops/s $\textbf{\color{#35bf28}+5.24\%}$
test_sac_speed[False-None] 8.5991ms 7.9865ms 125.2107 Ops/s 120.3638 Ops/s $\color{#35bf28}+4.03\%$
test_sac_speed[False-backward] 11.8783ms 11.3437ms 88.1548 Ops/s 86.7994 Ops/s $\color{#35bf28}+1.56\%$
test_sac_speed[True-None] 2.3845ms 2.2135ms 451.7826 Ops/s 454.8584 Ops/s $\color{#d91a1a}-0.68\%$
test_sac_speed[True-backward] 4.2708ms 4.1410ms 241.4891 Ops/s 216.9261 Ops/s $\textbf{\color{#35bf28}+11.32\%}$
test_sac_speed[reduce-overhead-None] 2.5173ms 2.2090ms 452.6890 Ops/s 447.3090 Ops/s $\color{#35bf28}+1.20\%$
test_redq_speed[False-None] 15.7166ms 10.8355ms 92.2889 Ops/s 94.9543 Ops/s $\color{#d91a1a}-2.81\%$
test_redq_speed[False-backward] 19.1800ms 18.1887ms 54.9792 Ops/s 54.6538 Ops/s $\color{#35bf28}+0.60\%$
test_redq_speed[True-None] 4.8812ms 4.6306ms 215.9534 Ops/s 212.9787 Ops/s $\color{#35bf28}+1.40\%$
test_redq_speed[True-backward] 10.3967ms 10.1043ms 98.9677 Ops/s 98.1965 Ops/s $\color{#35bf28}+0.79\%$
test_redq_speed[reduce-overhead-None] 4.8714ms 4.5419ms 220.1708 Ops/s 209.2098 Ops/s $\textbf{\color{#35bf28}+5.24\%}$
test_redq_deprec_speed[False-None] 11.9789ms 11.1904ms 89.3623 Ops/s 88.6953 Ops/s $\color{#35bf28}+0.75\%$
test_redq_deprec_speed[False-backward] 16.7233ms 16.2475ms 61.5480 Ops/s 61.1931 Ops/s $\color{#35bf28}+0.58\%$
test_redq_deprec_speed[True-None] 4.1861ms 3.8258ms 261.3843 Ops/s 261.6258 Ops/s $\color{#d91a1a}-0.09\%$
test_redq_deprec_speed[True-backward] 8.2456ms 7.9208ms 126.2504 Ops/s 122.5717 Ops/s $\color{#35bf28}+3.00\%$
test_redq_deprec_speed[reduce-overhead-None] 4.0837ms 3.7270ms 268.3101 Ops/s 269.2176 Ops/s $\color{#d91a1a}-0.34\%$
test_td3_speed[False-None] 48.5928ms 8.3878ms 119.2212 Ops/s 122.7589 Ops/s $\color{#d91a1a}-2.88\%$
test_td3_speed[False-backward] 11.3568ms 10.9801ms 91.0738 Ops/s 90.0763 Ops/s $\color{#35bf28}+1.11\%$
test_td3_speed[True-None] 1.9758ms 1.9250ms 519.4728 Ops/s 527.2891 Ops/s $\color{#d91a1a}-1.48\%$
test_td3_speed[True-backward] 4.1178ms 3.7794ms 264.5906 Ops/s 238.5573 Ops/s $\textbf{\color{#35bf28}+10.91\%}$
test_td3_speed[reduce-overhead-None] 1.9035ms 1.8641ms 536.4508 Ops/s 534.0580 Ops/s $\color{#35bf28}+0.45\%$
test_cql_speed[False-None] 29.8971ms 26.4313ms 37.8339 Ops/s 37.9669 Ops/s $\color{#d91a1a}-0.35\%$
test_cql_speed[False-backward] 39.3804ms 36.1667ms 27.6498 Ops/s 28.3213 Ops/s $\color{#d91a1a}-2.37\%$
test_cql_speed[True-None] 13.0940ms 12.6621ms 78.9757 Ops/s 80.0821 Ops/s $\color{#d91a1a}-1.38\%$
test_cql_speed[True-backward] 19.7151ms 19.0792ms 52.4132 Ops/s 52.8681 Ops/s $\color{#d91a1a}-0.86\%$
test_cql_speed[reduce-overhead-None] 12.9506ms 12.6526ms 79.0350 Ops/s 79.4030 Ops/s $\color{#d91a1a}-0.46\%$
test_a2c_speed[False-None] 6.0936ms 5.5481ms 180.2434 Ops/s 183.2182 Ops/s $\color{#d91a1a}-1.62\%$
test_a2c_speed[False-backward] 13.2114ms 12.2465ms 81.6559 Ops/s 82.2235 Ops/s $\color{#d91a1a}-0.69\%$
test_a2c_speed[True-None] 4.0580ms 3.8403ms 260.3968 Ops/s 259.9806 Ops/s $\color{#35bf28}+0.16\%$
test_a2c_speed[True-backward] 9.2491ms 8.8066ms 113.5514 Ops/s 112.7924 Ops/s $\color{#35bf28}+0.67\%$
test_a2c_speed[reduce-overhead-None] 3.9800ms 3.7947ms 263.5270 Ops/s 264.7112 Ops/s $\color{#d91a1a}-0.45\%$
test_ppo_speed[False-None] 6.3271ms 5.9876ms 167.0116 Ops/s 166.0523 Ops/s $\color{#35bf28}+0.58\%$
test_ppo_speed[False-backward] 13.2724ms 12.8751ms 77.6692 Ops/s 77.8461 Ops/s $\color{#d91a1a}-0.23\%$
test_ppo_speed[True-None] 4.0530ms 3.7454ms 266.9970 Ops/s 265.2422 Ops/s $\color{#35bf28}+0.66\%$
test_ppo_speed[True-backward] 8.8980ms 8.6540ms 115.5540 Ops/s 115.5894 Ops/s $\color{#d91a1a}-0.03\%$
test_ppo_speed[reduce-overhead-None] 3.8199ms 3.6956ms 270.5885 Ops/s 271.6026 Ops/s $\color{#d91a1a}-0.37\%$
test_reinforce_speed[False-None] 4.7918ms 4.6157ms 216.6507 Ops/s 213.5772 Ops/s $\color{#35bf28}+1.44\%$
test_reinforce_speed[False-backward] 7.7857ms 7.4754ms 133.7716 Ops/s 131.1130 Ops/s $\color{#35bf28}+2.03\%$
test_reinforce_speed[True-None] 3.3663ms 2.9444ms 339.6307 Ops/s 336.4119 Ops/s $\color{#35bf28}+0.96\%$
test_reinforce_speed[True-backward] 8.2057ms 7.8834ms 126.8491 Ops/s 126.4736 Ops/s $\color{#35bf28}+0.30\%$
test_reinforce_speed[reduce-overhead-None] 3.3445ms 2.9211ms 342.3411 Ops/s 337.4919 Ops/s $\color{#35bf28}+1.44\%$
test_iql_speed[False-None] 25.5152ms 20.6421ms 48.4446 Ops/s 48.7469 Ops/s $\color{#d91a1a}-0.62\%$
test_iql_speed[False-backward] 37.3224ms 31.3758ms 31.8717 Ops/s 31.9951 Ops/s $\color{#d91a1a}-0.39\%$
test_iql_speed[True-None] 10.5815ms 8.8826ms 112.5796 Ops/s 113.0580 Ops/s $\color{#d91a1a}-0.42\%$
test_iql_speed[True-backward] 18.1791ms 17.2561ms 57.9505 Ops/s 57.5762 Ops/s $\color{#35bf28}+0.65\%$
test_iql_speed[reduce-overhead-None] 9.1676ms 8.8286ms 113.2677 Ops/s 113.0453 Ops/s $\color{#35bf28}+0.20\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0525ms 5.8967ms 169.5866 Ops/s 172.6352 Ops/s $\color{#d91a1a}-1.77\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.0641ms 0.3577ms 2.7958 KOps/s 3.4929 KOps/s $\textbf{\color{#d91a1a}-19.96\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6647ms 0.3287ms 3.0422 KOps/s 3.2535 KOps/s $\textbf{\color{#d91a1a}-6.49\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9071ms 5.6509ms 176.9624 Ops/s 177.5601 Ops/s $\color{#d91a1a}-0.34\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7087ms 0.3382ms 2.9572 KOps/s 3.5548 KOps/s $\textbf{\color{#d91a1a}-16.81\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5176ms 0.2833ms 3.5298 KOps/s 3.7616 KOps/s $\textbf{\color{#d91a1a}-6.16\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5519ms 1.3171ms 759.2586 Ops/s 778.1310 Ops/s $\color{#d91a1a}-2.43\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4973ms 1.2386ms 807.3825 Ops/s 829.1898 Ops/s $\color{#d91a1a}-2.63\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8973ms 5.7698ms 173.3148 Ops/s 174.1041 Ops/s $\color{#d91a1a}-0.45\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9903ms 0.5100ms 1.9608 KOps/s 2.1185 KOps/s $\textbf{\color{#d91a1a}-7.44\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7284ms 0.4937ms 2.0256 KOps/s 2.2542 KOps/s $\textbf{\color{#d91a1a}-10.14\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.7776ms 5.6502ms 176.9860 Ops/s 176.2599 Ops/s $\color{#35bf28}+0.41\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0156ms 0.3810ms 2.6246 KOps/s 3.0462 KOps/s $\textbf{\color{#d91a1a}-13.84\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5391ms 0.3649ms 2.7405 KOps/s 3.1439 KOps/s $\textbf{\color{#d91a1a}-12.83\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8122ms 5.5695ms 179.5492 Ops/s 178.8047 Ops/s $\color{#35bf28}+0.42\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7052ms 0.3734ms 2.6778 KOps/s 2.9201 KOps/s $\textbf{\color{#d91a1a}-8.30\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5342ms 0.3612ms 2.7686 KOps/s 2.6460 KOps/s $\color{#35bf28}+4.63\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8681ms 5.7488ms 173.9508 Ops/s 172.8528 Ops/s $\color{#35bf28}+0.64\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1576ms 0.4449ms 2.2477 KOps/s 2.0931 KOps/s $\textbf{\color{#35bf28}+7.39\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6390ms 0.4226ms 2.3665 KOps/s 2.1648 KOps/s $\textbf{\color{#35bf28}+9.31\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5620s 16.1038ms 62.0972 Ops/s 57.5068 Ops/s $\textbf{\color{#35bf28}+7.98\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.8947ms 1.7273ms 578.9442 Ops/s 514.0624 Ops/s $\textbf{\color{#35bf28}+12.62\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 10.0157ms 1.2724ms 785.9359 Ops/s 830.9744 Ops/s $\textbf{\color{#d91a1a}-5.42\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.8910ms 5.0471ms 198.1353 Ops/s 202.3763 Ops/s $\color{#d91a1a}-2.10\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 10.5533ms 1.8931ms 528.2242 Ops/s 492.0217 Ops/s $\textbf{\color{#35bf28}+7.36\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.7892ms 1.2339ms 810.4336 Ops/s 913.1372 Ops/s $\textbf{\color{#d91a1a}-11.25\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.6822ms 5.1607ms 193.7724 Ops/s 55.0761 Ops/s $\textbf{\color{#35bf28}+251.83\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 11.1426ms 2.0735ms 482.2677 Ops/s 490.1054 Ops/s $\color{#d91a1a}-1.60\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.7598ms 1.0924ms 915.3877 Ops/s 954.1083 Ops/s $\color{#d91a1a}-4.06\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.5907ms 35.5978ms 28.0917 Ops/s 28.2165 Ops/s $\color{#d91a1a}-0.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9824ms 18.3257ms 54.5681 Ops/s 55.8700 Ops/s $\color{#d91a1a}-2.33\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.8260ms 36.8444ms 27.1411 Ops/s 27.2005 Ops/s $\color{#d91a1a}-0.22\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.7651ms 18.4063ms 54.3292 Ops/s 53.9452 Ops/s $\color{#35bf28}+0.71\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.0198ms 38.3421ms 26.0810 Ops/s 25.8948 Ops/s $\color{#35bf28}+0.72\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.2711ms 19.7942ms 50.5200 Ops/s 49.6808 Ops/s $\color{#35bf28}+1.69\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8276ms 0.2127ms 4.7013 KOps/s 4.5999 KOps/s $\color{#35bf28}+2.20\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7407ms 1.4160ms 706.2287 Ops/s 695.3110 Ops/s $\color{#35bf28}+1.57\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7724ms 2.3726ms 421.4765 Ops/s 406.8679 Ops/s $\color{#35bf28}+3.59\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.2285ms 2.9796ms 335.6192 Ops/s 335.8528 Ops/s $\color{#d91a1a}-0.07\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2060ms 0.1333ms 7.5010 KOps/s 7.3493 KOps/s $\color{#35bf28}+2.06\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3234ms 0.1916ms 5.2187 KOps/s 5.2159 KOps/s $\color{#35bf28}+0.05\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9878ms 1.7998ms 555.6059 Ops/s 559.4689 Ops/s $\color{#d91a1a}-0.69\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4714ms 1.3106ms 762.9817 Ops/s 768.7060 Ops/s $\color{#d91a1a}-0.74\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2456ms 1.0904ms 917.0872 Ops/s 920.4643 Ops/s $\color{#d91a1a}-0.37\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.6636ms 3.5509ms 281.6177 Ops/s 284.6723 Ops/s $\color{#d91a1a}-1.07\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.0928ms 5.8434ms 171.1321 Ops/s 170.0840 Ops/s $\color{#35bf28}+0.62\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.0977ms 7.0837ms 141.1682 Ops/s 138.7681 Ops/s $\color{#35bf28}+1.73\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4050ms 0.2669ms 3.7474 KOps/s 3.7244 KOps/s $\color{#35bf28}+0.62\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6972ms 1.5371ms 650.5613 Ops/s 652.0438 Ops/s $\color{#d91a1a}-0.23\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 3.1770ms 2.5021ms 399.6715 Ops/s 388.7750 Ops/s $\color{#35bf28}+2.80\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4448ms 3.1701ms 315.4520 Ops/s 315.4649 Ops/s $-0.00\%$
test_collector_without_rb[100-img_shape0-atari] 34.5205ms 33.4913ms 29.8585 Ops/s 29.7566 Ops/s $\color{#35bf28}+0.34\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.3383ms 65.9840ms 15.1552 Ops/s 15.1013 Ops/s $\color{#35bf28}+0.36\%$
test_collector_with_rb[100-img_shape0-atari] 38.7193ms 37.7465ms 26.4925 Ops/s 26.2627 Ops/s $\color{#35bf28}+0.88\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.3236ms 74.6899ms 13.3887 Ops/s 13.3777 Ops/s $\color{#35bf28}+0.08\%$
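
For readers checking the table, the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. That formula is inferred from the reported numbers and is not stated in the bot comment, so the sketch below is an assumption. The helper name `relative_change` is hypothetical. Because the table values are rounded, a recomputed figure may differ in the last digit.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD.

    Assumed formula (inferred from the table, not confirmed by the source):
    (Ops - Ops on Repo HEAD) / Ops on Repo HEAD * 100.
    """
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from rows of the CPU table above.
rows = {
    "test_simple": (1.8652, 1.7798),
    "test_dqn_speed[True-backward]": (976.9538, 844.0477),
    "test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]": (
        193.7724,
        55.0761,
    ),
}

for name, (ops_pr, ops_head) in rows.items():
    # A positive value means the PR is faster than HEAD on that benchmark.
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

Under this reading, a positive value means higher throughput on the PR branch. Large swings on short-running benchmarks (for example the replay-buffer rows) may reflect run-to-run noise rather than an effect of the change.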

@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}18$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 81.4859μs 80.5194μs 12.4194 KOps/s 11.7335 KOps/s $\textbf{\color{#35bf28}+5.85\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1391ms 0.1387ms 7.2085 KOps/s 6.8030 KOps/s $\textbf{\color{#35bf28}+5.96\%}$
test_tensor_to_bytestream_speed[untyped_storage] 0.1095s 0.1090s 9.1780 Ops/s 9.8100 Ops/s $\textbf{\color{#d91a1a}-6.44\%}$
test_tensor_to_bytestream_speed[numpy] 2.5216μs 2.5125μs 398.0072 KOps/s 410.2145 KOps/s $\color{#d91a1a}-2.98\%$
test_tensor_to_bytestream_speed[safetensors] 36.5355μs 36.2310μs 27.6007 KOps/s 26.1909 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_simple 0.7876s 0.7868s 1.2710 Ops/s 1.1981 Ops/s $\textbf{\color{#35bf28}+6.08\%}$
test_transformed 1.5381s 1.4452s 0.6920 Ops/s 0.7007 Ops/s $\color{#d91a1a}-1.25\%$
test_serial 2.4069s 2.3095s 0.4330 Ops/s 0.4375 Ops/s $\color{#d91a1a}-1.03\%$
test_parallel 1.9052s 1.8317s 0.5459 Ops/s 0.5575 Ops/s $\color{#d91a1a}-2.06\%$
test_step_mdp_speed[True-True-True-True-True] 0.1849ms 44.9933μs 22.2255 KOps/s 21.8750 KOps/s $\color{#35bf28}+1.60\%$
test_step_mdp_speed[True-True-True-True-False] 62.1110μs 25.4788μs 39.2483 KOps/s 38.6536 KOps/s $\color{#35bf28}+1.54\%$
test_step_mdp_speed[True-True-True-False-True] 61.1710μs 25.4201μs 39.3390 KOps/s 39.4558 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[True-True-True-False-False] 41.0500μs 13.9714μs 71.5747 KOps/s 71.6405 KOps/s $\color{#d91a1a}-0.09\%$
test_step_mdp_speed[True-True-False-True-True] 81.5610μs 47.9937μs 20.8361 KOps/s 20.7515 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[True-True-False-True-False] 65.9710μs 28.2526μs 35.3950 KOps/s 35.6646 KOps/s $\color{#d91a1a}-0.76\%$
test_step_mdp_speed[True-True-False-False-True] 57.7410μs 28.1308μs 35.5482 KOps/s 35.6626 KOps/s $\color{#d91a1a}-0.32\%$
test_step_mdp_speed[True-True-False-False-False] 50.5010μs 16.7355μs 59.7532 KOps/s 59.5300 KOps/s $\color{#35bf28}+0.37\%$
test_step_mdp_speed[True-False-True-True-True] 0.1090ms 51.1787μs 19.5394 KOps/s 19.2589 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[True-False-True-True-False] 69.5310μs 31.2102μs 32.0408 KOps/s 31.9525 KOps/s $\color{#35bf28}+0.28\%$
test_step_mdp_speed[True-False-True-False-True] 69.3110μs 28.4271μs 35.1777 KOps/s 34.8029 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[True-False-True-False-False] 53.6710μs 16.9442μs 59.0173 KOps/s 59.3134 KOps/s $\color{#d91a1a}-0.50\%$
test_step_mdp_speed[True-False-False-True-True] 96.2020μs 53.3471μs 18.7452 KOps/s 18.4707 KOps/s $\color{#35bf28}+1.49\%$
test_step_mdp_speed[True-False-False-True-False] 65.4810μs 33.7794μs 29.6038 KOps/s 29.7525 KOps/s $\color{#d91a1a}-0.50\%$
test_step_mdp_speed[True-False-False-False-True] 70.5910μs 30.2288μs 33.0810 KOps/s 32.4864 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[True-False-False-False-False] 46.2310μs 19.6483μs 50.8950 KOps/s 51.5541 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[False-True-True-True-True] 85.1720μs 51.4423μs 19.4393 KOps/s 19.4803 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-True-True-True-False] 71.2410μs 31.3027μs 31.9461 KOps/s 32.4436 KOps/s $\color{#d91a1a}-1.53\%$
test_step_mdp_speed[False-True-True-False-True] 2.2869ms 32.2064μs 31.0497 KOps/s 31.5691 KOps/s $\color{#d91a1a}-1.65\%$
test_step_mdp_speed[False-True-True-False-False] 46.5110μs 18.6890μs 53.5074 KOps/s 54.0056 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-True-False-True-True] 91.7810μs 54.0444μs 18.5033 KOps/s 18.4129 KOps/s $\color{#35bf28}+0.49\%$
test_step_mdp_speed[False-True-False-True-False] 75.8210μs 33.5668μs 29.7914 KOps/s 29.4743 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[False-True-False-False-True] 78.1310μs 34.4238μs 29.0496 KOps/s 28.6808 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[False-True-False-False-False] 62.8610μs 21.3348μs 46.8719 KOps/s 46.8798 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-False-True-True-True] 98.1710μs 55.8397μs 17.9084 KOps/s 17.4948 KOps/s $\color{#35bf28}+2.36\%$
test_step_mdp_speed[False-False-True-True-False] 74.9210μs 36.8033μs 27.1715 KOps/s 27.3534 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[False-False-True-False-True] 72.0920μs 34.9152μs 28.6408 KOps/s 29.0273 KOps/s $\color{#d91a1a}-1.33\%$
test_step_mdp_speed[False-False-True-False-False] 50.4910μs 21.2510μs 47.0565 KOps/s 46.4620 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[False-False-False-True-True] 0.1246ms 58.4819μs 17.0993 KOps/s 16.7724 KOps/s $\color{#35bf28}+1.95\%$
test_step_mdp_speed[False-False-False-True-False] 80.8110μs 39.0133μs 25.6323 KOps/s 25.3904 KOps/s $\color{#35bf28}+0.95\%$
test_step_mdp_speed[False-False-False-False-True] 72.5420μs 36.5000μs 27.3972 KOps/s 26.8912 KOps/s $\color{#35bf28}+1.88\%$
test_step_mdp_speed[False-False-False-False-False] 61.2910μs 23.5425μs 42.4763 KOps/s 41.6722 KOps/s $\color{#35bf28}+1.93\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8760s 0.7762s 1.2883 Ops/s 1.2927 Ops/s $\color{#d91a1a}-0.34\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7306s 0.6363s 1.5715 Ops/s 1.5660 Ops/s $\color{#35bf28}+0.35\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7701s 1.6918s 0.5911 Ops/s 0.5904 Ops/s $\color{#35bf28}+0.12\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5341s 1.4564s 0.6866 Ops/s 0.6804 Ops/s $\color{#35bf28}+0.91\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0052s 1.9276s 0.5188 Ops/s 0.5158 Ops/s $\color{#35bf28}+0.59\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7881s 1.7067s 0.5859 Ops/s 0.5825 Ops/s $\color{#35bf28}+0.58\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7264s 4.6390s 0.2156 Ops/s 0.2192 Ops/s $\color{#d91a1a}-1.67\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6475s 4.4768s 0.2234 Ops/s 0.2232 Ops/s $\color{#35bf28}+0.09\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9856s 1.9200s 0.5208 Ops/s 0.5155 Ops/s $\color{#35bf28}+1.03\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7526s 1.6369s 0.6109 Ops/s 0.6135 Ops/s $\color{#d91a1a}-0.42\%$
test_values[generalized_advantage_estimate-True-True] 20.4305ms 19.8174ms 50.4608 Ops/s 51.0788 Ops/s $\color{#d91a1a}-1.21\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1255s 3.4215ms 292.2726 Ops/s 264.7132 Ops/s $\textbf{\color{#35bf28}+10.41\%}$
test_values[td0_return_estimate-False-False] 0.1059ms 80.8227μs 12.3728 KOps/s 12.4900 KOps/s $\color{#d91a1a}-0.94\%$
test_values[td1_return_estimate-False-False] 48.1818ms 47.3766ms 21.1074 Ops/s 21.2203 Ops/s $\color{#d91a1a}-0.53\%$
test_values[vec_td1_return_estimate-False-False] 1.3631ms 1.0789ms 926.8750 Ops/s 919.3998 Ops/s $\color{#35bf28}+0.81\%$
test_values[td_lambda_return_estimate-True-False] 80.5118ms 78.1213ms 12.8006 Ops/s 12.7164 Ops/s $\color{#35bf28}+0.66\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2993ms 1.0735ms 931.5295 Ops/s 946.0283 Ops/s $\color{#d91a1a}-1.53\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.5644ms 21.0591ms 47.4855 Ops/s 50.9868 Ops/s $\textbf{\color{#d91a1a}-6.87\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0255ms 0.7435ms 1.3451 KOps/s 1.3682 KOps/s $\color{#d91a1a}-1.69\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7315ms 0.6821ms 1.4660 KOps/s 1.5300 KOps/s $\color{#d91a1a}-4.18\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5435ms 1.4819ms 674.8284 Ops/s 682.4351 Ops/s $\color{#d91a1a}-1.11\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7460ms 0.6852ms 1.4594 KOps/s 1.4907 KOps/s $\color{#d91a1a}-2.10\%$
test_dqn_speed[False-None] 1.6420ms 1.5447ms 647.3892 Ops/s 651.1558 Ops/s $\color{#d91a1a}-0.58\%$
test_dqn_speed[False-backward] 2.3540ms 2.1536ms 464.3484 Ops/s 466.5544 Ops/s $\color{#d91a1a}-0.47\%$
test_dqn_speed[True-None] 0.6575ms 0.5578ms 1.7929 KOps/s 1.7769 KOps/s $\color{#35bf28}+0.90\%$
test_dqn_speed[True-backward] 1.2485ms 1.1891ms 840.9631 Ops/s 935.9862 Ops/s $\textbf{\color{#d91a1a}-10.15\%}$
test_dqn_speed[reduce-overhead-None] 0.6608ms 0.5972ms 1.6745 KOps/s 1.6906 KOps/s $\color{#d91a1a}-0.95\%$
test_ddpg_speed[False-None] 3.2677ms 2.9001ms 344.8210 Ops/s 347.9230 Ops/s $\color{#d91a1a}-0.89\%$
test_ddpg_speed[False-backward] 4.5853ms 4.2498ms 235.3032 Ops/s 242.4137 Ops/s $\color{#d91a1a}-2.93\%$
test_ddpg_speed[True-None] 1.4254ms 1.3017ms 768.2379 Ops/s 767.9517 Ops/s $\color{#35bf28}+0.04\%$
test_ddpg_speed[True-backward] 2.5877ms 2.4976ms 400.3896 Ops/s 427.2228 Ops/s $\textbf{\color{#d91a1a}-6.28\%}$
test_ddpg_speed[reduce-overhead-None] 1.4365ms 1.3541ms 738.4784 Ops/s 751.0846 Ops/s $\color{#d91a1a}-1.68\%$
test_sac_speed[False-None] 8.8649ms 8.3050ms 120.4096 Ops/s 121.0333 Ops/s $\color{#d91a1a}-0.52\%$
test_sac_speed[False-backward] 11.8938ms 11.4736ms 87.1566 Ops/s 89.8323 Ops/s $\color{#d91a1a}-2.98\%$
test_sac_speed[True-None] 1.8830ms 1.7896ms 558.7700 Ops/s 550.5266 Ops/s $\color{#35bf28}+1.50\%$
test_sac_speed[True-backward] 3.6736ms 3.5772ms 279.5447 Ops/s 282.0163 Ops/s $\color{#d91a1a}-0.88\%$
test_sac_speed[reduce-overhead-None] 0.3616s 12.0549ms 82.9538 Ops/s 92.0069 Ops/s $\textbf{\color{#d91a1a}-9.84\%}$
test_redq_deprec_speed[False-None] 9.7966ms 9.2797ms 107.7620 Ops/s 108.8472 Ops/s $\color{#d91a1a}-1.00\%$
test_redq_deprec_speed[False-backward] 13.2480ms 12.6246ms 79.2101 Ops/s 80.2824 Ops/s $\color{#d91a1a}-1.34\%$
test_redq_deprec_speed[True-None] 2.7411ms 2.5118ms 398.1208 Ops/s 402.1107 Ops/s $\color{#d91a1a}-0.99\%$
test_redq_deprec_speed[True-backward] 4.5349ms 4.2479ms 235.4112 Ops/s 245.3385 Ops/s $\color{#d91a1a}-4.05\%$
test_redq_deprec_speed[reduce-overhead-None] 17.1235ms 9.9330ms 100.6746 Ops/s 100.5759 Ops/s $\color{#35bf28}+0.10\%$
test_td3_speed[False-None] 8.1910ms 8.1398ms 122.8529 Ops/s 124.4871 Ops/s $\color{#d91a1a}-1.31\%$
test_td3_speed[False-backward] 11.2005ms 10.6854ms 93.5860 Ops/s 95.9545 Ops/s $\color{#d91a1a}-2.47\%$
test_td3_speed[True-None] 1.7083ms 1.6663ms 600.1158 Ops/s 586.6834 Ops/s $\color{#35bf28}+2.29\%$
test_td3_speed[True-backward] 3.3173ms 3.2099ms 311.5400 Ops/s 325.4802 Ops/s $\color{#d91a1a}-4.28\%$
test_td3_speed[reduce-overhead-None] 46.2386ms 23.9808ms 41.7001 Ops/s 39.9124 Ops/s $\color{#35bf28}+4.48\%$
test_cql_speed[False-None] 17.3145ms 17.0212ms 58.7503 Ops/s 58.7420 Ops/s $\color{#35bf28}+0.01\%$
test_cql_speed[False-backward] 23.2414ms 22.6443ms 44.1612 Ops/s 45.2251 Ops/s $\color{#d91a1a}-2.35\%$
test_cql_speed[True-None] 3.3261ms 3.2099ms 311.5402 Ops/s 309.3186 Ops/s $\color{#35bf28}+0.72\%$
test_cql_speed[True-backward] 5.5587ms 5.4305ms 184.1435 Ops/s 178.3180 Ops/s $\color{#35bf28}+3.27\%$
test_cql_speed[reduce-overhead-None] 0.6868s 15.3980ms 64.9435 Ops/s 84.0050 Ops/s $\textbf{\color{#d91a1a}-22.69\%}$
test_a2c_speed[False-None] 3.8962ms 3.2035ms 312.1582 Ops/s 311.1865 Ops/s $\color{#35bf28}+0.31\%$
test_a2c_speed[False-backward] 6.7073ms 6.2977ms 158.7884 Ops/s 160.8034 Ops/s $\color{#d91a1a}-1.25\%$
test_a2c_speed[True-None] 1.4263ms 1.3259ms 754.2053 Ops/s 742.0280 Ops/s $\color{#35bf28}+1.64\%$
test_a2c_speed[True-backward] 3.1240ms 3.0699ms 325.7453 Ops/s 323.4720 Ops/s $\color{#35bf28}+0.70\%$
test_a2c_speed[reduce-overhead-None] 1.0500ms 0.9840ms 1.0162 KOps/s 1.0154 KOps/s $\color{#35bf28}+0.09\%$
test_ppo_speed[False-None] 3.9240ms 3.7934ms 263.6166 Ops/s 266.0730 Ops/s $\color{#d91a1a}-0.92\%$
test_ppo_speed[False-backward] 7.4125ms 7.0335ms 142.1771 Ops/s 145.8131 Ops/s $\color{#d91a1a}-2.49\%$
test_ppo_speed[True-None] 1.5389ms 1.4062ms 711.1268 Ops/s 704.2060 Ops/s $\color{#35bf28}+0.98\%$
test_ppo_speed[True-backward] 3.2736ms 3.2164ms 310.9054 Ops/s 311.1672 Ops/s $\color{#d91a1a}-0.08\%$
test_ppo_speed[reduce-overhead-None] 1.1344ms 1.0454ms 956.6064 Ops/s 925.5935 Ops/s $\color{#35bf28}+3.35\%$
test_reinforce_speed[False-None] 2.4053ms 2.2717ms 440.1904 Ops/s 443.0735 Ops/s $\color{#d91a1a}-0.65\%$
test_reinforce_speed[False-backward] 3.4040ms 3.3685ms 296.8722 Ops/s 299.3020 Ops/s $\color{#d91a1a}-0.81\%$
test_reinforce_speed[True-None] 1.4051ms 1.2847ms 778.3854 Ops/s 787.7119 Ops/s $\color{#d91a1a}-1.18\%$
test_reinforce_speed[True-backward] 3.0711ms 3.0127ms 331.9245 Ops/s 327.2665 Ops/s $\color{#35bf28}+1.42\%$
test_reinforce_speed[reduce-overhead-None] 17.7812ms 9.7048ms 103.0414 Ops/s 104.6867 Ops/s $\color{#d91a1a}-1.57\%$
test_iql_speed[False-None] 10.0341ms 9.4133ms 106.2330 Ops/s 107.2817 Ops/s $\color{#d91a1a}-0.98\%$
test_iql_speed[False-backward] 13.9276ms 13.3175ms 75.0890 Ops/s 75.6606 Ops/s $\color{#d91a1a}-0.76\%$
test_iql_speed[True-None] 2.2210ms 2.1646ms 461.9809 Ops/s 460.3144 Ops/s $\color{#35bf28}+0.36\%$
test_iql_speed[True-backward] 5.1917ms 4.7972ms 208.4541 Ops/s 206.0354 Ops/s $\color{#35bf28}+1.17\%$
test_iql_speed[reduce-overhead-None] 17.9759ms 10.5860ms 94.4644 Ops/s 74.2755 Ops/s $\textbf{\color{#35bf28}+27.18\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4163ms 5.9875ms 167.0147 Ops/s 165.7407 Ops/s $\color{#35bf28}+0.77\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0181ms 0.3511ms 2.8480 KOps/s 3.5090 KOps/s $\textbf{\color{#d91a1a}-18.84\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7005ms 0.3541ms 2.8244 KOps/s 3.7506 KOps/s $\textbf{\color{#d91a1a}-24.69\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0573ms 5.7201ms 174.8220 Ops/s 170.1002 Ops/s $\color{#35bf28}+2.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.4766ms 0.3788ms 2.6401 KOps/s 3.1932 KOps/s $\textbf{\color{#d91a1a}-17.32\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 1.0484ms 0.3833ms 2.6091 KOps/s 3.0007 KOps/s $\textbf{\color{#d91a1a}-13.05\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6343ms 1.3826ms 723.2580 Ops/s 740.5349 Ops/s $\color{#d91a1a}-2.33\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4153ms 1.1849ms 843.9763 Ops/s 784.3088 Ops/s $\textbf{\color{#35bf28}+7.61\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0944ms 5.9415ms 168.3072 Ops/s 165.6268 Ops/s $\color{#35bf28}+1.62\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2572ms 0.4342ms 2.3032 KOps/s 2.0681 KOps/s $\textbf{\color{#35bf28}+11.37\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6335ms 0.4176ms 2.3948 KOps/s 2.4039 KOps/s $\color{#d91a1a}-0.38\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0059ms 5.8280ms 171.5865 Ops/s 169.6449 Ops/s $\color{#35bf28}+1.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0428ms 0.3728ms 2.6821 KOps/s 3.4932 KOps/s $\textbf{\color{#d91a1a}-23.22\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6061ms 0.3550ms 2.8167 KOps/s 3.7210 KOps/s $\textbf{\color{#d91a1a}-24.30\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9784ms 5.7548ms 173.7687 Ops/s 172.7217 Ops/s $\color{#35bf28}+0.61\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8079ms 0.3745ms 2.6704 KOps/s 3.1181 KOps/s $\textbf{\color{#d91a1a}-14.36\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6829ms 0.3619ms 2.7635 KOps/s 3.2699 KOps/s $\textbf{\color{#d91a1a}-15.49\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0086ms 5.9247ms 168.7857 Ops/s 166.6001 Ops/s $\color{#35bf28}+1.31\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7702ms 0.4944ms 2.0228 KOps/s 2.0817 KOps/s $\color{#d91a1a}-2.83\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7211ms 0.4937ms 2.0255 KOps/s 2.1457 KOps/s $\textbf{\color{#d91a1a}-5.60\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5823s 16.5682ms 60.3566 Ops/s 51.4755 Ops/s $\textbf{\color{#35bf28}+17.25\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.9177ms 2.0540ms 486.8621 Ops/s 526.1743 Ops/s $\textbf{\color{#d91a1a}-7.47\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.1439ms 0.9509ms 1.0517 KOps/s 766.2724 Ops/s $\textbf{\color{#35bf28}+37.24\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5664ms 5.0600ms 197.6271 Ops/s 196.3292 Ops/s $\color{#35bf28}+0.66\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 12.9734ms 2.0546ms 486.7181 Ops/s 539.4349 Ops/s $\textbf{\color{#d91a1a}-9.77\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.0226ms 1.1676ms 856.4770 Ops/s 791.2159 Ops/s $\textbf{\color{#35bf28}+8.25\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.1855ms 5.1974ms 192.4054 Ops/s 189.2367 Ops/s $\color{#35bf28}+1.67\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 6.7326ms 2.0113ms 497.1792 Ops/s 62.7371 Ops/s $\textbf{\color{#35bf28}+692.48\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 13.6702ms 1.5323ms 652.6305 Ops/s 912.4520 Ops/s $\textbf{\color{#d91a1a}-28.48\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.6173ms 35.3074ms 28.3227 Ops/s 27.8968 Ops/s $\color{#35bf28}+1.53\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.8764ms 18.0242ms 55.4811 Ops/s 56.1255 Ops/s $\color{#d91a1a}-1.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.7887ms 36.7886ms 27.1823 Ops/s 26.7737 Ops/s $\color{#35bf28}+1.53\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2494ms 18.2900ms 54.6747 Ops/s 53.8822 Ops/s $\color{#35bf28}+1.47\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.1569ms 38.4699ms 25.9943 Ops/s 25.3663 Ops/s $\color{#35bf28}+2.48\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.3543ms 19.5896ms 51.0474 Ops/s 50.2016 Ops/s $\color{#35bf28}+1.68\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9079ms 0.2227ms 4.4908 KOps/s 4.4414 KOps/s $\color{#35bf28}+1.11\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7217ms 1.3965ms 716.0649 Ops/s 700.1252 Ops/s $\color{#35bf28}+2.28\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5868ms 2.3597ms 423.7912 Ops/s 424.8060 Ops/s $\color{#d91a1a}-0.24\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0997ms 2.9385ms 340.3125 Ops/s 339.2042 Ops/s $\color{#35bf28}+0.33\%$
test_storage_write_contiguous[50-img_shape0-small] 0.5700ms 0.1686ms 5.9308 KOps/s 6.0916 KOps/s $\color{#d91a1a}-2.64\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.4014ms 0.2349ms 4.2580 KOps/s 4.3134 KOps/s $\color{#d91a1a}-1.29\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1351ms 1.8583ms 538.1373 Ops/s 547.1128 Ops/s $\color{#d91a1a}-1.64\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5215ms 1.3650ms 732.6133 Ops/s 740.4249 Ops/s $\color{#d91a1a}-1.06\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2934ms 1.1509ms 868.8569 Ops/s 869.8784 Ops/s $\color{#d91a1a}-0.12\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7978ms 3.6060ms 277.3169 Ops/s 274.5306 Ops/s $\color{#35bf28}+1.01\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.1878ms 5.7911ms 172.6773 Ops/s 170.8681 Ops/s $\color{#35bf28}+1.06\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.1886ms 7.2115ms 138.6683 Ops/s 135.0828 Ops/s $\color{#35bf28}+2.65\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4384ms 0.2719ms 3.6776 KOps/s 3.6272 KOps/s $\color{#35bf28}+1.39\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6851ms 1.5090ms 662.6702 Ops/s 652.6415 Ops/s $\color{#35bf28}+1.54\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8214ms 2.4600ms 406.5076 Ops/s 402.3386 Ops/s $\color{#35bf28}+1.04\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3017ms 3.1574ms 316.7173 Ops/s 315.2451 Ops/s $\color{#35bf28}+0.47\%$
test_collector_without_rb[100-img_shape0-atari] 34.0399ms 33.6278ms 29.7373 Ops/s 29.5171 Ops/s $\color{#35bf28}+0.75\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.6592ms 66.1943ms 15.1070 Ops/s 15.0745 Ops/s $\color{#35bf28}+0.22\%$
test_collector_with_rb[100-img_shape0-atari] 39.1168ms 38.2141ms 26.1684 Ops/s 25.9780 Ops/s $\color{#35bf28}+0.73\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.4612ms 74.6226ms 13.4008 Ops/s 13.1601 Ops/s $\color{#35bf28}+1.83\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.0469ms 56.3712ms 17.7395 Ops/s 17.4614 Ops/s $\color{#35bf28}+1.59\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1114s 0.1113s 8.9873 Ops/s 8.8206 Ops/s $\color{#35bf28}+1.89\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 60.1740ms 58.1879ms 17.1857 Ops/s 17.1112 Ops/s $\color{#35bf28}+0.44\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1194s 0.1158s 8.6375 Ops/s 8.6566 Ops/s $\color{#d91a1a}-0.22\%$
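
For readers checking the numbers: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below reproduces a few rows from the table under that assumption. The function names and the 5% bolding threshold are illustrative guesses inferred from the rows above, not taken from the benchmark tooling.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative change in throughput of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """Assumed rule: rows whose absolute change exceeds the threshold are bolded.

    The 5% default is inferred from the table (e.g. -5.60% is bold,
    -4.28% is not); the real threshold may differ.
    """
    return abs(change_pct) > threshold_pct


# (name, Ops, Ops on Repo HEAD) copied from rows in the table above.
rows = [
    ("test_gae_speed[generalized_advantage_estimate-False-1-512]", 47.4855, 50.9868),
    ("test_iql_speed[reduce-overhead-None]", 94.4644, 74.2755),
    ("test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]", 497.1792, 62.7371),
]

for name, ops_pr, ops_head in rows:
    change = percent_change(ops_pr, ops_head)
    marker = " (flagged)" if is_flagged(change) else ""
    print(f"{name}: {change:+.2f}%{marker}")
```

Large swings on rows whose Max is far above the Mean (for example the `reduce-overhead` and `LazyMemmapStorage` populate cases) may reflect runner noise rather than a real change, so they are worth re-running before drawing conclusions.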


Labels: CLA Signed, Feature, Modules
