
[Feature] AsyncBatchedCollector: backend params and performance optimizations #3511

Open
vmoens wants to merge 3 commits into gh/vmoens/242/base from gh/vmoens/242/head


@vmoens
Collaborator

@vmoens vmoens commented Feb 16, 2026

Stack from ghstack (oldest at bottom):


  • Three-tier backend system: `backend` (global default), `env_backend`
    (env pool override), and `policy_backend` (transport override), mirroring
    the device parameter pattern.
  • Lock-free `SlotTransport`: per-env slots with no shared lock. It replaces
    `ThreadingTransport` as the default for in-process threading.
  • New `min_batch_size` parameter lets `InferenceServer` accumulate requests
    before running a batch.
  • Drain the result queue in batches: one blocking `get()`, then
    `get_nowait()` for any further results that are already available.
  • Remove a redundant `.copy()` in `ProcessorAsyncEnvPool._env_exec`.
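
The three-tier `backend` / `env_backend` / `policy_backend` scheme behaves like a fallback chain: a component-specific override wins when it is given, otherwise the global default applies. A minimal sketch of that resolution logic follows. The backend names `"threading"` and `"multiprocessing"`, the default value, and the `resolve_backends` / `ResolvedBackends` helpers are all hypothetical and are not the PR's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical backend names used only for this illustration.
VALID_BACKENDS = ("threading", "multiprocessing")


@dataclass(frozen=True)
class ResolvedBackends:
    env_backend: str     # backend used by the env pool
    policy_backend: str  # backend used by the policy transport


def resolve_backends(
    backend: Optional[str] = "threading",
    env_backend: Optional[str] = None,
    policy_backend: Optional[str] = None,
) -> ResolvedBackends:
    """Resolve a global default plus two optional per-component overrides.

    A specific override wins when given; otherwise the global ``backend``
    is used, in the same spirit as a per-component device argument
    overriding a global ``device``.
    """
    env = env_backend if env_backend is not None else backend
    policy = policy_backend if policy_backend is not None else backend
    for name, value in (("env_backend", env), ("policy_backend", policy)):
        if value not in VALID_BACKENDS:
            raise ValueError(
                f"{name} resolved to {value!r}; expected one of {VALID_BACKENDS}"
            )
    return ResolvedBackends(env_backend=env, policy_backend=policy)


if __name__ == "__main__":
    # Processes for the env pool, but keep the policy transport in-process.
    print(resolve_backends("multiprocessing", policy_backend="threading"))
    # ResolvedBackends(env_backend='multiprocessing', policy_backend='threading')
```

Keeping the global default and the overrides as separate arguments means the common case needs one argument, while mixed setups only name the component that differs.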

Co-authored-by: Cursor <cursoragent@cursor.com>

[ghstack-poisoned]
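
The "per-env slots with no shared lock" idea behind `SlotTransport` can be illustrated with one mailbox per env, each owning its own pair of `threading.Event` objects, so envs never contend on a common lock or queue. This is a sketch under stated assumptions only: `SlotTransportSketch` and its `submit` / `wait_response` / `collect_requests` / `respond` methods are hypothetical names, not TorchRL's `SlotTransport` API. Note that in CPython each `Event` still has a private internal lock, so the sketch demonstrates "no shared lock" rather than strictly lock-free code.

```python
import threading
from typing import Any, List, Optional, Tuple


class _Slot:
    """One env's mailbox: a request/response pair and events owned by that env only."""

    __slots__ = ("request", "response", "has_request", "has_response")

    def __init__(self) -> None:
        self.request: Any = None
        self.response: Any = None
        self.has_request = threading.Event()
        self.has_response = threading.Event()


class SlotTransportSketch:
    """Hypothetical per-env slot transport; no lock is shared between envs."""

    def __init__(self, num_envs: int) -> None:
        self._slots = [_Slot() for _ in range(num_envs)]

    # Env side: each env thread only touches its own slot.
    def submit(self, env_id: int, request: Any) -> None:
        slot = self._slots[env_id]
        slot.request = request    # write the payload first...
        slot.has_request.set()    # ...then publish it

    def wait_response(self, env_id: int, timeout: Optional[float] = None) -> Any:
        slot = self._slots[env_id]
        if not slot.has_response.wait(timeout):
            raise TimeoutError(f"no response for env {env_id}")
        slot.has_response.clear()
        return slot.response

    # Server side: scan slots for pending requests, answer each one in place.
    def collect_requests(self) -> List[Tuple[int, Any]]:
        ready = []
        for env_id, slot in enumerate(self._slots):
            if slot.has_request.is_set():
                slot.has_request.clear()
                ready.append((env_id, slot.request))
        return ready

    def respond(self, env_id: int, response: Any) -> None:
        slot = self._slots[env_id]
        slot.response = response
        slot.has_response.set()
```

The trade-off in this sketch is that the server scans all slots (cost proportional to the number of envs) instead of popping from one shared queue; the PR's implementation may handle request discovery differently.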
@pytorch-bot

pytorch-bot bot commented Feb 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3511

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 8c2309c with merge base 266e4aa:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Feb 16, 2026
…izations

- Three-tier backend system: `backend` (global default), `env_backend`
  (env pool override), `policy_backend` (transport override), mirroring
  the device parameter pattern.
- Lock-free SlotTransport: per-env slots with no shared lock, replacing
  ThreadingTransport as the default for in-process threading.
- min_batch_size parameter for InferenceServer to accumulate requests.
- Batch drain from result queue (get_nowait after first blocking get).
- Remove redundant .copy() in ProcessorAsyncEnvPool._env_exec.

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 58cc17b
Pull-Request: #3511
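
The batch-drain change and the `min_batch_size` parameter both amount to collecting several queue items per wake-up instead of one. A hedged sketch using the standard-library `queue` module follows; `drain_batch` and its `max_batch_size` / `timeout` arguments are hypothetical and only approximate what `InferenceServer` and the result-queue reader do in this PR.

```python
import queue
import time
from typing import Any, List, Optional


def drain_batch(
    q: "queue.Queue[Any]",
    min_batch_size: int = 1,
    max_batch_size: Optional[int] = None,
    timeout: Optional[float] = None,
) -> List[Any]:
    """Collect a batch of items from ``q`` (illustrative helper).

    1. Block for the first item (raises ``queue.Empty`` if ``timeout`` expires).
    2. Keep blocking until ``min_batch_size`` items arrived or the overall
       ``timeout`` ran out (a hypothetical analogue of ``min_batch_size``).
    3. Drain whatever is already queued with ``get_nowait()``, up to
       ``max_batch_size`` items in total.
    """
    if min_batch_size < 1:
        raise ValueError("min_batch_size must be >= 1")
    if max_batch_size is not None and max_batch_size < min_batch_size:
        raise ValueError("max_batch_size must be >= min_batch_size")
    deadline = None if timeout is None else time.monotonic() + timeout

    batch = [q.get(timeout=timeout)]  # the only unconditional blocking call

    while len(batch) < min_batch_size:
        remaining = None if deadline is None else deadline - time.monotonic()
        if remaining is not None and remaining <= 0:
            break  # stop waiting and run with a smaller batch
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break

    while max_batch_size is None or len(batch) < max_batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch
```

Blocking only for the first item keeps latency low when traffic is sparse, while `get_nowait()` avoids one blocking call (and wake-up) per item once results pile up.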
@meta-cla meta-cla bot added the CLA Signed label Feb 16, 2026
@github-actions github-actions bot added the Feature label Feb 16, 2026
@github-actions
Contributor

github-actions bot commented Feb 16, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: 14. Worsened: 11.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_tensor_to_bytestream_speed[pickle] | 79.2115μs | 77.9842μs | 12.8231 KOps/s | 12.6970 KOps/s | +0.99% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1417ms | 0.1397ms | 7.1597 KOps/s | 7.3139 KOps/s | -2.11% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1058s | 0.1052s | 9.5066 Ops/s | 9.3113 Ops/s | +2.10% |
| test_tensor_to_bytestream_speed[numpy] | 2.5454μs | 2.5362μs | 394.2973 KOps/s | 407.6992 KOps/s | -3.29% |
| test_tensor_to_bytestream_speed[safetensors] | 38.9635μs | 36.5686μs | 27.3459 KOps/s | 28.0423 KOps/s | -2.48% |
| test_simple | 0.5350s | 0.5332s | 1.8754 Ops/s | 1.7823 Ops/s | **+5.22%** |
| test_transformed | 1.0622s | 1.0598s | 0.9436 Ops/s | 0.9137 Ops/s | +3.26% |
| test_serial | 1.6421s | 1.6239s | 0.6158 Ops/s | 0.6057 Ops/s | +1.67% |
| test_parallel | 1.0119s | 0.9965s | 1.0036 Ops/s | 0.9681 Ops/s | +3.66% |
| test_step_mdp_speed[True-True-True-True-True] | 0.2386ms | 40.8660μs | 24.4702 KOps/s | 23.6145 KOps/s | +3.62% |
| test_step_mdp_speed[True-True-True-True-False] | 52.2700μs | 22.8716μs | 43.7223 KOps/s | 43.4124 KOps/s | +0.71% |
| test_step_mdp_speed[True-True-True-False-True] | 71.6910μs | 23.0151μs | 43.4497 KOps/s | 43.3024 KOps/s | +0.34% |
| test_step_mdp_speed[True-True-True-False-False] | 44.0310μs | 12.7381μs | 78.5045 KOps/s | 79.2769 KOps/s | -0.97% |
| test_step_mdp_speed[True-True-False-True-True] | 0.1316ms | 43.8525μs | 22.8037 KOps/s | 22.6741 KOps/s | +0.57% |
| test_step_mdp_speed[True-True-False-True-False] | 56.2210μs | 25.7860μs | 38.7807 KOps/s | 39.8193 KOps/s | -2.61% |
| test_step_mdp_speed[True-True-False-False-True] | 56.1910μs | 25.3932μs | 39.3806 KOps/s | 38.9824 KOps/s | +1.02% |
| test_step_mdp_speed[True-True-False-False-False] | 45.6210μs | 15.1624μs | 65.9526 KOps/s | 65.1300 KOps/s | +1.26% |
| test_step_mdp_speed[True-False-True-True-True] | 81.6410μs | 46.0547μs | 21.7133 KOps/s | 21.5023 KOps/s | +0.98% |
| test_step_mdp_speed[True-False-True-True-False] | 86.7920μs | 27.8061μs | 35.9633 KOps/s | 35.4423 KOps/s | +1.47% |
| test_step_mdp_speed[True-False-True-False-True] | 52.6510μs | 25.5630μs | 39.1190 KOps/s | 39.0787 KOps/s | +0.10% |
| test_step_mdp_speed[True-False-True-False-False] | 46.7900μs | 15.5207μs | 64.4299 KOps/s | 65.1665 KOps/s | -1.13% |
| test_step_mdp_speed[True-False-False-True-True] | 87.0010μs | 49.2978μs | 20.2849 KOps/s | 20.4517 KOps/s | -0.82% |
| test_step_mdp_speed[True-False-False-True-False] | 55.9810μs | 30.5996μs | 32.6801 KOps/s | 32.1767 KOps/s | +1.56% |
| test_step_mdp_speed[True-False-False-False-True] | 59.9310μs | 27.8750μs | 35.8744 KOps/s | 35.8139 KOps/s | +0.17% |
| test_step_mdp_speed[True-False-False-False-False] | 50.8510μs | 17.2330μs | 58.0283 KOps/s | 56.5989 KOps/s | +2.53% |
| test_step_mdp_speed[False-True-True-True-True] | 93.9320μs | 47.1912μs | 21.1904 KOps/s | 21.1856 KOps/s | +0.02% |
| test_step_mdp_speed[False-True-True-True-False] | 66.9710μs | 28.0472μs | 35.6542 KOps/s | 35.3780 KOps/s | +0.78% |
| test_step_mdp_speed[False-True-True-False-True] | 2.6123ms | 30.0248μs | 33.3058 KOps/s | 33.7948 KOps/s | -1.45% |
| test_step_mdp_speed[False-True-True-False-False] | 48.5010μs | 17.2549μs | 57.9546 KOps/s | 59.0429 KOps/s | -1.84% |
| test_step_mdp_speed[False-True-False-True-True] | 86.6920μs | 48.9473μs | 20.4301 KOps/s | 20.1016 KOps/s | +1.63% |
| test_step_mdp_speed[False-True-False-True-False] | 65.0410μs | 30.2472μs | 33.0610 KOps/s | 32.8396 KOps/s | +0.67% |
| test_step_mdp_speed[False-True-False-False-True] | 59.7410μs | 31.1280μs | 32.1254 KOps/s | 31.6438 KOps/s | +1.52% |
| test_step_mdp_speed[False-True-False-False-False] | 56.6410μs | 19.3735μs | 51.6169 KOps/s | 51.9633 KOps/s | -0.67% |
| test_step_mdp_speed[False-False-True-True-True] | 88.1310μs | 52.3796μs | 19.0914 KOps/s | 19.1589 KOps/s | -0.35% |
| test_step_mdp_speed[False-False-True-True-False] | 63.0410μs | 33.6985μs | 29.6749 KOps/s | 30.4206 KOps/s | -2.45% |
| test_step_mdp_speed[False-False-True-False-True] | 73.1120μs | 30.6136μs | 32.6652 KOps/s | 31.3877 KOps/s | +4.07% |
| test_step_mdp_speed[False-False-True-False-False] | 46.7510μs | 18.7581μs | 53.3104 KOps/s | 52.0909 KOps/s | +2.34% |
| test_step_mdp_speed[False-False-False-True-True] | 92.6520μs | 52.6655μs | 18.9878 KOps/s | 18.7532 KOps/s | +1.25% |
| test_step_mdp_speed[False-False-False-True-False] | 66.5010μs | 35.4081μs | 28.2421 KOps/s | 28.3480 KOps/s | -0.37% |
| test_step_mdp_speed[False-False-False-False-True] | 64.4220μs | 33.1148μs | 30.1980 KOps/s | 29.6989 KOps/s | +1.68% |
| test_step_mdp_speed[False-False-False-False-False] | 56.7210μs | 21.3462μs | 46.8467 KOps/s | 45.8121 KOps/s | +2.26% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8150s | 0.7150s | 1.3985 Ops/s | 1.3765 Ops/s | +1.60% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.6816s | 0.5872s | 1.7029 Ops/s | 1.6759 Ops/s | +1.62% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.6684s | 1.5912s | 0.6285 Ops/s | 0.6251 Ops/s | +0.53% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4537s | 1.3731s | 0.7283 Ops/s | 0.7243 Ops/s | +0.55% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9133s | 1.8269s | 0.5474 Ops/s | 0.5393 Ops/s | +1.50% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.6917s | 1.6106s | 0.6209 Ops/s | 0.6117 Ops/s | +1.49% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6539s | 4.5567s | 0.2195 Ops/s | 0.2183 Ops/s | +0.54% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5173s | 4.3675s | 0.2290 Ops/s | 0.2269 Ops/s | +0.90% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9734s | 1.8490s | 0.5408 Ops/s | 0.5517 Ops/s | -1.97% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6682s | 1.5619s | 0.6402 Ops/s | 0.6376 Ops/s | +0.42% |
| test_values[generalized_advantage_estimate-True-True] | 10.5528ms | 10.3080ms | 97.0124 Ops/s | 95.0322 Ops/s | +2.08% |
| test_values[vec_generalized_advantage_estimate-True-True] | 20.0319ms | 17.6528ms | 56.6482 Ops/s | 56.0074 Ops/s | +1.14% |
| test_values[td0_return_estimate-False-False] | 0.2113ms | 0.1269ms | 7.8810 KOps/s | 7.5935 KOps/s | +3.79% |
| test_values[td1_return_estimate-False-False] | 29.3529ms | 28.2736ms | 35.3687 Ops/s | 35.4054 Ops/s | -0.10% |
| test_values[vec_td1_return_estimate-False-False] | 18.5514ms | 17.5995ms | 56.8197 Ops/s | 56.4711 Ops/s | +0.62% |
| test_values[td_lambda_return_estimate-True-False] | 43.9023ms | 41.6660ms | 24.0004 Ops/s | 23.6950 Ops/s | +1.29% |
| test_values[vec_td_lambda_return_estimate-True-False] | 18.0395ms | 17.6234ms | 56.7426 Ops/s | 56.1170 Ops/s | +1.11% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.2100ms | 9.0736ms | 110.2093 Ops/s | 109.3946 Ops/s | +0.74% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7756ms | 1.5356ms | 651.2064 Ops/s | 642.4394 Ops/s | +1.36% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5951ms | 0.4380ms | 2.2830 KOps/s | 2.3727 KOps/s | -3.78% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 34.9711ms | 34.5446ms | 28.9481 Ops/s | 28.9266 Ops/s | +0.07% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.1365ms | 1.7329ms | 577.0600 Ops/s | 561.4934 Ops/s | +2.77% |
| test_dqn_speed[False-None] | 1.4976ms | 1.3878ms | 720.5796 Ops/s | 709.1614 Ops/s | +1.61% |
| test_dqn_speed[False-backward] | 2.2536ms | 1.9320ms | 517.5889 Ops/s | 525.6402 Ops/s | -1.53% |
| test_dqn_speed[True-None] | 0.8083ms | 0.5421ms | 1.8448 KOps/s | 1.7947 KOps/s | +2.79% |
| test_dqn_speed[True-backward] | 1.1132ms | 1.0193ms | 981.0974 Ops/s | 861.4567 Ops/s | **+13.89%** |
| test_dqn_speed[reduce-overhead-None] | 0.9566ms | 0.5375ms | 1.8605 KOps/s | 1.8266 KOps/s | +1.86% |
| test_ddpg_speed[False-None] | 3.2024ms | 2.8472ms | 351.2276 Ops/s | 347.1918 Ops/s | +1.16% |
| test_ddpg_speed[False-backward] | 4.1309ms | 4.0313ms | 248.0600 Ops/s | 243.0394 Ops/s | +2.07% |
| test_ddpg_speed[True-None] | 1.8496ms | 1.4045ms | 712.0172 Ops/s | 711.3732 Ops/s | +0.09% |
| test_ddpg_speed[True-backward] | 2.4387ms | 2.3912ms | 418.2045 Ops/s | 377.8507 Ops/s | **+10.68%** |
| test_ddpg_speed[reduce-overhead-None] | 1.8233ms | 1.4041ms | 712.1968 Ops/s | 718.6938 Ops/s | -0.90% |
| test_sac_speed[False-None] | 8.4713ms | 7.9113ms | 126.4020 Ops/s | 127.8629 Ops/s | -1.14% |
| test_sac_speed[False-backward] | 11.6910ms | 11.2047ms | 89.2481 Ops/s | 90.3094 Ops/s | -1.18% |
| test_sac_speed[True-None] | 2.5556ms | 2.1620ms | 462.5416 Ops/s | 455.9840 Ops/s | +1.44% |
| test_sac_speed[True-backward] | 4.1895ms | 4.0512ms | 246.8405 Ops/s | 221.9256 Ops/s | **+11.23%** |
| test_sac_speed[reduce-overhead-None] | 2.3358ms | 2.1446ms | 466.2837 Ops/s | 461.0807 Ops/s | +1.13% |
| test_redq_speed[False-None] | 15.1476ms | 10.4778ms | 95.4397 Ops/s | 94.3723 Ops/s | +1.13% |
| test_redq_speed[False-backward] | 21.0729ms | 17.9142ms | 55.8217 Ops/s | 56.4207 Ops/s | -1.06% |
| test_redq_speed[True-None] | 4.9637ms | 4.4154ms | 226.4807 Ops/s | 221.3040 Ops/s | +2.34% |
| test_redq_speed[True-backward] | 10.3449ms | 9.8422ms | 101.6029 Ops/s | 100.4487 Ops/s | +1.15% |
| test_redq_speed[reduce-overhead-None] | 5.0072ms | 4.4036ms | 227.0884 Ops/s | 222.0870 Ops/s | +2.25% |
| test_redq_deprec_speed[False-None] | 11.3217ms | 10.9335ms | 91.4619 Ops/s | 91.3156 Ops/s | +0.16% |
| test_redq_deprec_speed[False-backward] | 16.1288ms | 15.8352ms | 63.1504 Ops/s | 63.2218 Ops/s | -0.11% |
| test_redq_deprec_speed[True-None] | 3.8720ms | 3.6684ms | 272.5957 Ops/s | 265.1238 Ops/s | +2.82% |
| test_redq_deprec_speed[True-backward] | 7.7662ms | 7.5250ms | 132.8911 Ops/s | 124.9672 Ops/s | **+6.34%** |
| test_redq_deprec_speed[reduce-overhead-None] | 3.8687ms | 3.6183ms | 276.3732 Ops/s | 279.4834 Ops/s | -1.11% |
| test_td3_speed[False-None] | 8.2300ms | 7.8917ms | 126.7158 Ops/s | 126.1692 Ops/s | +0.43% |
| test_td3_speed[False-backward] | 11.1024ms | 10.7600ms | 92.9366 Ops/s | 92.7507 Ops/s | +0.20% |
| test_td3_speed[True-None] | 1.9113ms | 1.8660ms | 535.9102 Ops/s | 539.8611 Ops/s | -0.73% |
| test_td3_speed[True-backward] | 4.1885ms | 3.7073ms | 269.7396 Ops/s | 272.0306 Ops/s | -0.84% |
| test_td3_speed[reduce-overhead-None] | 1.8716ms | 1.8095ms | 552.6360 Ops/s | 555.7492 Ops/s | -0.56% |
| test_cql_speed[False-None] | 31.0065ms | 26.3166ms | 37.9988 Ops/s | 39.7532 Ops/s | -4.41% |
| test_cql_speed[False-backward] | 39.0827ms | 35.5040ms | 28.1658 Ops/s | 28.8957 Ops/s | -2.53% |
| test_cql_speed[True-None] | 15.3783ms | 12.4995ms | 80.0030 Ops/s | 85.5663 Ops/s | **-6.50%** |
| test_cql_speed[True-backward] | 19.1658ms | 18.4686ms | 54.1460 Ops/s | 55.8333 Ops/s | -3.02% |
| test_cql_speed[reduce-overhead-None] | 15.4089ms | 12.6739ms | 78.9026 Ops/s | 81.7164 Ops/s | -3.44% |
| test_a2c_speed[False-None] | 5.6665ms | 5.4073ms | 184.9355 Ops/s | 194.9802 Ops/s | **-5.15%** |
| test_a2c_speed[False-backward] | 12.6022ms | 11.8898ms | 84.1054 Ops/s | 85.0148 Ops/s | -1.07% |
| test_a2c_speed[True-None] | 3.9387ms | 3.7063ms | 269.8082 Ops/s | 274.4100 Ops/s | -1.68% |
| test_a2c_speed[True-backward] | 8.8021ms | 8.5701ms | 116.6846 Ops/s | 120.4066 Ops/s | -3.09% |
| test_a2c_speed[reduce-overhead-None] | 3.8838ms | 3.7023ms | 270.1051 Ops/s | 288.9657 Ops/s | **-6.53%** |
| test_ppo_speed[False-None] | 6.1152ms | 5.9470ms | 168.1532 Ops/s | 177.0032 Ops/s | -5.00% |
| test_ppo_speed[False-backward] | 12.9161ms | 12.5942ms | 79.4014 Ops/s | 80.6236 Ops/s | -1.52% |
| test_ppo_speed[True-None] | 3.7681ms | 3.6244ms | 275.9091 Ops/s | 296.4822 Ops/s | **-6.94%** |
| test_ppo_speed[True-backward] | 8.7813ms | 8.4597ms | 118.2074 Ops/s | 122.2512 Ops/s | -3.31% |
| test_ppo_speed[reduce-overhead-None] | 3.8114ms | 3.6005ms | 277.7405 Ops/s | 296.1050 Ops/s | **-6.20%** |
| test_reinforce_speed[False-None] | 4.8005ms | 4.5501ms | 219.7737 Ops/s | 234.8828 Ops/s | **-6.43%** |
| test_reinforce_speed[False-backward] | 7.5830ms | 7.3459ms | 136.1308 Ops/s | 139.7494 Ops/s | -2.59% |
| test_reinforce_speed[True-None] | 3.1354ms | 2.8782ms | 347.4382 Ops/s | 342.9707 Ops/s | +1.30% |
| test_reinforce_speed[True-backward] | 8.1254ms | 7.7760ms | 128.6009 Ops/s | 122.4248 Ops/s | **+5.04%** |
| test_reinforce_speed[reduce-overhead-None] | 3.1484ms | 2.8325ms | 353.0394 Ops/s | 349.9170 Ops/s | +0.89% |
| test_iql_speed[False-None] | 25.0005ms | 19.9892ms | 50.0270 Ops/s | 49.4987 Ops/s | +1.07% |
| test_iql_speed[False-backward] | 34.0581ms | 30.1861ms | 33.1279 Ops/s | 33.2723 Ops/s | -0.43% |
| test_iql_speed[True-None] | 8.7556ms | 8.4950ms | 117.7157 Ops/s | 117.1216 Ops/s | +0.51% |
| test_iql_speed[True-backward] | 17.1770ms | 16.7382ms | 59.7435 Ops/s | 60.0208 Ops/s | -0.46% |
| test_iql_speed[reduce-overhead-None] | 8.8852ms | 8.5616ms | 116.8001 Ops/s | 116.0667 Ops/s | +0.63% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0859ms | 5.9348ms | 168.4982 Ops/s | 167.6082 Ops/s | +0.53% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.8784ms | 0.2804ms | 3.5664 KOps/s | 3.1916 KOps/s | **+11.74%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5435ms | 0.2617ms | 3.8210 KOps/s | 3.4048 KOps/s | **+12.23%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9824ms | 5.6520ms | 176.9297 Ops/s | 176.1030 Ops/s | +0.47% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.6171ms | 0.3420ms | 2.9242 KOps/s | 2.7795 KOps/s | **+5.21%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5989ms | 0.3054ms | 3.2746 KOps/s | 2.9292 KOps/s | **+11.79%** |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6427ms | 1.3810ms | 724.1134 Ops/s | 700.6756 Ops/s | +3.35% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4832ms | 1.2707ms | 786.9951 Ops/s | 737.0010 Ops/s | **+6.78%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 12.3340ms | 5.9457ms | 168.1885 Ops/s | 171.2893 Ops/s | -1.81% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8759ms | 0.4836ms | 2.0677 KOps/s | 2.0699 KOps/s | -0.11% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7864ms | 0.4673ms | 2.1401 KOps/s | 2.1303 KOps/s | +0.46% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.8902ms | 5.6357ms | 177.4405 Ops/s | 176.1011 Ops/s | +0.76% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7006ms | 0.3770ms | 2.6529 KOps/s | 2.7532 KOps/s | -3.64% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6304ms | 0.3690ms | 2.7102 KOps/s | 2.8651 KOps/s | **-5.41%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9219ms | 5.6817ms | 176.0034 Ops/s | 175.3644 Ops/s | +0.36% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.2956ms | 0.3772ms | 2.6513 KOps/s | 2.7451 KOps/s | -3.42% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5507ms | 0.3640ms | 2.7474 KOps/s | 2.8922 KOps/s | **-5.01%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.0110ms | 5.8575ms | 170.7222 Ops/s | 168.5743 Ops/s | +1.27% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.0415ms | 0.5244ms | 1.9069 KOps/s | 1.9641 KOps/s | -2.91% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7411ms | 0.5098ms | 1.9617 KOps/s | 2.0251 KOps/s | -3.13% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.3268ms | 4.9186ms | 203.3114 Ops/s | 199.7045 Ops/s | +1.81% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 9.5305ms | 2.1537ms | 464.3206 Ops/s | 499.1528 Ops/s | **-6.98%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 3.2687ms | 0.9123ms | 1.0961 KOps/s | 1.1313 KOps/s | -3.11% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5538s | 16.0104ms | 62.4595 Ops/s | 57.5349 Ops/s | **+8.56%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.8743ms | 1.7683ms | 565.4996 Ops/s | 528.9605 Ops/s | **+6.91%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.5418ms | 1.1998ms | 833.4818 Ops/s | 1.0994 KOps/s | **-24.19%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.5948ms | 5.1791ms | 193.0853 Ops/s | 190.0320 Ops/s | +1.61% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 12.9018ms | 2.0386ms | 490.5397 Ops/s | 521.8464 Ops/s | **-6.00%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.4184ms | 1.0555ms | 947.3914 Ops/s | 935.7781 Ops/s | +1.24% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.7627ms | 35.4955ms | 28.1726 Ops/s | 27.1692 Ops/s | +3.69% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.7715ms | 18.1132ms | 55.2084 Ops/s | 54.7486 Ops/s | +0.84% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 39.5102ms | 36.6806ms | 27.2624 Ops/s | 26.3786 Ops/s | +3.35% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.8729ms | 18.3694ms | 54.4382 Ops/s | 52.4847 Ops/s | +3.72% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.2743ms | 38.1797ms | 26.1920 Ops/s | 25.6063 Ops/s | +2.29% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.0135ms | 19.9076ms | 50.2320 Ops/s | 49.4764 Ops/s | +1.53% |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8642ms | 0.2141ms | 4.6711 KOps/s | 4.5396 KOps/s | +2.90% |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.7362ms | 1.3758ms | 726.8560 Ops/s | 716.6511 Ops/s | +1.42% |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.7244ms | 2.3070ms | 433.4598 Ops/s | 430.2206 Ops/s | +0.75% |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.2964ms | 2.9335ms | 340.8896 Ops/s | 341.4891 Ops/s | -0.18% |
| test_storage_write_contiguous[50-img_shape0-small] | 0.4269ms | 0.1309ms | 7.6366 KOps/s | 7.4973 KOps/s | +1.86% |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3454ms | 0.1886ms | 5.3027 KOps/s | 5.1524 KOps/s | +2.92% |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9850ms | 1.7382ms | 575.2968 Ops/s | 579.8405 Ops/s | -0.78% |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5352ms | 1.3046ms | 766.5144 Ops/s | 779.6783 Ops/s | -1.69% |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2389ms | 1.0937ms | 914.3609 Ops/s | 917.7867 Ops/s | -0.37% |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7143ms | 3.4673ms | 288.4123 Ops/s | 286.8234 Ops/s | +0.55% |
| test_collector_stack_then_write[100-img_shape2-large_img] | 10.0676ms | 5.5834ms | 179.1018 Ops/s | 175.9225 Ops/s | +1.81% |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 14.8340ms | 6.9004ms | 144.9199 Ops/s | 139.9969 Ops/s | +3.52% |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4536ms | 0.2738ms | 3.6524 KOps/s | 3.6406 KOps/s | +0.33% |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6523ms | 1.4839ms | 673.9178 Ops/s | 665.2571 Ops/s | +1.30% |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8315ms | 2.4118ms | 414.6199 Ops/s | 409.8002 Ops/s | +1.18% |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3199ms | 3.1283ms | 319.6604 Ops/s | 321.7141 Ops/s | -0.64% |
| test_collector_without_rb[100-img_shape0-atari] | 32.8951ms | 32.4435ms | 30.8228 Ops/s | 20.2428 Ops/s | **+52.27%** |
| test_collector_without_rb[200-img_shape1-large_batch] | 64.0047ms | 63.7554ms | 15.6850 Ops/s | 15.4532 Ops/s | +1.50% |
| test_collector_with_rb[100-img_shape0-atari] | 37.6856ms | 36.9439ms | 27.0681 Ops/s | 26.9864 Ops/s | +0.30% |
| test_collector_with_rb[200-img_shape1-large_batch] | 72.3349ms | 71.8573ms | 13.9165 Ops/s | 13.8487 Ops/s | +0.49% |

@github-actions
Contributor

github-actions bot commented Feb 16, 2026

Result of GPU Benchmark Tests

Name Max Mean Ops
test_tensor_to_bytestream_speed[pickle] 81.9909μs 81.0066μs 12.3447 KOps/s
test_tensor_to_bytestream_speed[torch.save] 0.1402ms 0.1397ms 7.1557 KOps/s
test_tensor_to_bytestream_speed[untyped_storage] 0.1105s 0.1103s 9.0676 Ops/s
test_tensor_to_bytestream_speed[numpy] 2.4306μs 2.4260μs 412.2071 KOps/s
test_tensor_to_bytestream_speed[safetensors] 39.5149μs 38.0331μs 26.2929 KOps/s
test_simple 0.8104s 0.8055s 1.2414 Ops/s
test_transformed 1.3679s 1.3674s 0.7313 Ops/s
test_serial 2.2706s 2.2688s 0.4408 Ops/s
test_parallel 1.8983s 1.8115s 0.5520 Ops/s
test_step_mdp_speed[True-True-True-True-True] 0.2693ms 42.3107μs 23.6347 KOps/s
test_step_mdp_speed[True-True-True-True-False] 69.9210μs 23.7062μs 42.1831 KOps/s
test_step_mdp_speed[True-True-True-False-True] 58.8210μs 23.4857μs 42.5790 KOps/s
test_step_mdp_speed[True-True-True-False-False] 50.8910μs 13.1312μs 76.1546 KOps/s
test_step_mdp_speed[True-True-False-True-True] 80.8810μs 45.6721μs 21.8952 KOps/s
test_step_mdp_speed[True-True-False-True-False] 57.4820μs 25.8257μs 38.7212 KOps/s
test_step_mdp_speed[True-True-False-False-True] 59.0310μs 26.5154μs 37.7139 KOps/s
test_step_mdp_speed[True-True-False-False-False] 42.3710μs 15.8550μs 63.0714 KOps/s
test_step_mdp_speed[True-False-True-True-True] 83.4710μs 47.9404μs 20.8592 KOps/s
test_step_mdp_speed[True-False-True-True-False] 58.5310μs 29.6116μs 33.7705 KOps/s
test_step_mdp_speed[True-False-True-False-True] 73.9520μs 26.4789μs 37.7659 KOps/s
test_step_mdp_speed[True-False-True-False-False] 56.2210μs 15.6750μs 63.7960 KOps/s
test_step_mdp_speed[True-False-False-True-True] 87.4710μs 50.5661μs 19.7761 KOps/s
test_step_mdp_speed[True-False-False-True-False] 64.0810μs 31.2760μs 31.9734 KOps/s
test_step_mdp_speed[True-False-False-False-True] 66.6410μs 29.0620μs 34.4092 KOps/s
test_step_mdp_speed[True-False-False-False-False] 45.9400μs 18.4732μs 54.1324 KOps/s
test_step_mdp_speed[False-True-True-True-True] 81.0120μs 47.9749μs 20.8442 KOps/s
test_step_mdp_speed[False-True-True-True-False] 57.6010μs 29.0245μs 34.4537 KOps/s
test_step_mdp_speed[False-True-True-False-True] 2.4539ms 30.5022μs 32.7845 KOps/s
test_step_mdp_speed[False-True-True-False-False] 54.1610μs 17.8090μs 56.1515 KOps/s
test_step_mdp_speed[False-True-False-True-True] 87.6820μs 49.9580μs 20.0168 KOps/s
test_step_mdp_speed[False-True-False-True-False] 67.1120μs 31.2298μs 32.0207 KOps/s
test_step_mdp_speed[False-True-False-False-True] 67.9520μs 33.4068μs 29.9341 KOps/s
test_step_mdp_speed[False-True-False-False-False] 51.6920μs 19.9860μs 50.0350 KOps/s
test_step_mdp_speed[False-False-True-True-True] 90.7020μs 54.2566μs 18.4309 KOps/s
test_step_mdp_speed[False-False-True-True-False] 73.7510μs 34.6215μs 28.8838 KOps/s
test_step_mdp_speed[False-False-True-False-True] 63.8110μs 32.7232μs 30.5593 KOps/s
test_step_mdp_speed[False-False-True-False-False] 50.2410μs 20.0934μs 49.7676 KOps/s
test_step_mdp_speed[False-False-False-True-True] 96.1520μs 55.3875μs 18.0546 KOps/s
test_step_mdp_speed[False-False-False-True-False] 70.7420μs 36.3503μs 27.5101 KOps/s
test_step_mdp_speed[False-False-False-False-True] 73.6810μs 34.8829μs 28.6673 KOps/s
test_step_mdp_speed[False-False-False-False-False] 55.4510μs 22.0200μs 45.4133 KOps/s
test_non_tensor_env_rollout_speed[1000-single-True] 0.8418s 0.7419s 1.3480 Ops/s
test_non_tensor_env_rollout_speed[1000-single-False] 0.7054s 0.6060s 1.6503 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7104s 1.6334s 0.6122 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4879s 1.4080s 0.7102 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9546s 1.8750s 0.5333 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7282s 1.6451s 0.6079 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7143s 4.6095s 0.2169 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5756s 4.3938s 0.2276 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9285s 1.8499s 0.5406 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7604s 1.6212s 0.6168 Ops/s
test_values[generalized_advantage_estimate-True-True] 20.6478ms 20.1696ms 49.5795 Ops/s
test_values[vec_generalized_advantage_estimate-True-True] 0.1578s 4.0787ms 245.1764 Ops/s
test_values[td0_return_estimate-False-False] 0.1102ms 83.6412μs 11.9558 KOps/s
test_values[td1_return_estimate-False-False] 48.7407ms 47.8571ms 20.8955 Ops/s
test_values[vec_td1_return_estimate-False-False] 1.3152ms 1.0988ms 910.1138 Ops/s
test_values[td_lambda_return_estimate-True-False] 78.6114ms 77.8795ms 12.8403 Ops/s
test_values[vec_td_lambda_return_estimate-True-False] 1.2375ms 1.0926ms 915.2170 Ops/s
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.7871ms 20.5649ms 48.6264 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0710ms 0.7938ms 1.2598 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8360ms 0.6999ms 1.4288 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5904ms 1.4983ms 667.4364 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7687ms 0.7194ms 1.3900 KOps/s
test_dqn_speed[False-None] 1.6536ms 1.5481ms 645.9650 Ops/s
test_dqn_speed[False-backward] 2.4976ms 2.1897ms 456.6766 Ops/s
test_dqn_speed[True-None] 0.6448ms 0.5789ms 1.7276 KOps/s
test_dqn_speed[True-backward] 1.2539ms 1.2085ms 827.4849 Ops/s
test_dqn_speed[reduce-overhead-None] 0.6405ms 0.5826ms 1.7163 KOps/s
test_ddpg_speed[False-None] 3.2668ms 2.9029ms 344.4860 Ops/s
test_ddpg_speed[False-backward] 4.5325ms 4.3064ms 232.2106 Ops/s
test_ddpg_speed[True-None] 1.3820ms 1.3098ms 763.4475 Ops/s
test_ddpg_speed[True-backward] 2.5539ms 2.5139ms 397.7913 Ops/s
test_ddpg_speed[reduce-overhead-None] 1.4010ms 1.3347ms 749.2583 Ops/s
test_sac_speed[False-None] 8.7247ms 8.2961ms 120.5392 Ops/s
test_sac_speed[False-backward] 12.2707ms 11.6387ms 85.9200 Ops/s
test_sac_speed[True-None] 2.1445ms 1.8108ms 552.2333 Ops/s
test_sac_speed[True-backward] 3.7064ms 3.5711ms 280.0233 Ops/s
test_sac_speed[reduce-overhead-None] 19.2855ms 10.9404ms 91.4041 Ops/s
test_redq_deprec_speed[False-None] 9.9642ms 9.3506ms 106.9450 Ops/s
test_redq_deprec_speed[False-backward] 13.2125ms 12.7874ms 78.2018 Ops/s
test_redq_deprec_speed[True-None] 2.6506ms 2.5375ms 394.0824 Ops/s
test_redq_deprec_speed[True-backward] 4.6862ms 4.2957ms 232.7894 Ops/s
test_redq_deprec_speed[reduce-overhead-None] 15.7802ms 9.7774ms 102.2768 Ops/s
test_td3_speed[False-None] 8.3918ms 8.2048ms 121.8794 Ops/s
test_td3_speed[False-backward] 11.3363ms 10.8866ms 91.8561 Ops/s
test_td3_speed[True-None] 1.7159ms 1.6392ms 610.0580 Ops/s
test_td3_speed[True-backward] 3.3274ms 3.2316ms 309.4466 Ops/s
test_td3_speed[reduce-overhead-None] 86.7240ms 24.5576ms 40.7207 Ops/s
test_cql_speed[False-None] 17.5519ms 17.2822ms 57.8631 Ops/s
test_cql_speed[False-backward] 23.7608ms 22.9791ms 43.5178 Ops/s
test_cql_speed[True-None] 3.3063ms 3.2512ms 307.5831 Ops/s
test_cql_speed[True-backward] 5.9639ms 5.5461ms 180.3080 Ops/s
test_cql_speed[reduce-overhead-None] 19.2483ms 11.9802ms 83.4709 Ops/s
test_a2c_speed[False-None] 4.0885ms 3.2483ms 307.8493 Ops/s
test_a2c_speed[False-backward] 6.8801ms 6.4418ms 155.2370 Ops/s
test_a2c_speed[True-None] 1.4207ms 1.3321ms 750.7176 Ops/s
test_a2c_speed[True-backward] 3.1847ms 3.0873ms 323.9044 Ops/s
test_a2c_speed[reduce-overhead-None] 1.0226ms 0.9731ms 1.0277 KOps/s
test_ppo_speed[False-None] 4.1359ms 3.8627ms 258.8887 Ops/s
test_ppo_speed[False-backward] 7.6485ms 7.2237ms 138.4331 Ops/s
test_ppo_speed[True-None] 1.4687ms 1.4154ms 706.5025 Ops/s
test_ppo_speed[True-backward] 3.2806ms 3.2389ms 308.7470 Ops/s
test_ppo_speed[reduce-overhead-None] 1.2370ms 1.0548ms 948.0244 Ops/s
test_reinforce_speed[False-None] 2.4160ms 2.3161ms 431.7658 Ops/s
test_reinforce_speed[False-backward] 3.8323ms 3.4642ms 288.6647 Ops/s
test_reinforce_speed[True-None] 1.3717ms 1.2917ms 774.1582 Ops/s
test_reinforce_speed[True-backward] 3.0995ms 3.0353ms 329.4568 Ops/s
test_reinforce_speed[reduce-overhead-None] 0.4402s 10.4945ms 95.2879 Ops/s
test_iql_speed[False-None] 10.1055ms 9.4978ms 105.2881 Ops/s
test_iql_speed[False-backward] 14.0059ms 13.5680ms 73.7028 Ops/s
test_iql_speed[True-None] 2.2385ms 2.1753ms 459.6989 Ops/s
test_iql_speed[True-backward] 5.6311ms 4.8717ms 205.2668 Ops/s
test_iql_speed[reduce-overhead-None] 18.2094ms 10.5459ms 94.8237 Ops/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4171ms 5.9677ms 167.5680 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.1314ms 0.3207ms 3.1179 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6009ms 0.3085ms 3.2411 KOps/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9615ms 5.7622ms 173.5458 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.4546ms 0.2776ms 3.6022 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5499ms 0.2980ms 3.3554 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5370ms 1.2999ms 769.2734 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4302ms 1.2006ms 832.9180 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1267ms 5.9148ms 169.0660 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1565ms 0.5237ms 1.9095 KOps/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8243ms 0.5112ms 1.9561 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1171ms 5.7646ms 173.4730 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0585ms 0.2836ms 3.5264 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5383ms 0.2644ms 3.7815 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0849ms 5.7628ms 173.5275 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2478ms 0.2929ms 3.4136 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4890ms 0.2609ms 3.8329 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2164ms 6.0053ms 166.5206 Ops/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.4301ms 0.4304ms 2.3235 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6324ms 0.4144ms 2.4131 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3493ms 4.9468ms 202.1523 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.3277ms 2.3263ms 429.8597 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.1805ms 0.9737ms 1.0270 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5808s 16.5520ms 60.4155 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9969ms 1.8960ms 527.4160 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.1096ms 1.1300ms 884.9504 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.1601ms 5.2919ms 188.9665 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.2599ms 1.9939ms 501.5272 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.2783ms 1.0716ms 933.1426 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.9170ms 35.5220ms 28.1516 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.4573ms 18.0296ms 55.4644 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.0124ms 37.1390ms 26.9259 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.3827ms 18.3482ms 54.5014 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.4780ms 38.8295ms 25.7536 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4674ms 20.0132ms 49.9671 Ops/s
test_storage_write_lazystack[50-img_shape0-small] 0.7605ms 0.2138ms 4.6773 KOps/s
test_storage_write_lazystack[100-img_shape1-atari] 1.8531ms 1.3858ms 721.5965 Ops/s
test_storage_write_lazystack[100-img_shape2-large_img] 2.8025ms 2.3425ms 426.8866 Ops/s
test_storage_write_lazystack[200-img_shape3-large_batch] 3.3126ms 2.9082ms 343.8592 Ops/s
test_storage_write_contiguous[50-img_shape0-small] 0.2411ms 0.1613ms 6.1999 KOps/s
test_storage_write_contiguous[100-img_shape1-atari] 0.3844ms 0.2253ms 4.4395 KOps/s
test_storage_write_contiguous[100-img_shape2-large_img] 2.3777ms 1.8473ms 541.3355 Ops/s
test_storage_write_contiguous[200-img_shape3-large_batch] 1.8171ms 1.4071ms 710.6585 Ops/s
test_collector_stack_then_write[50-img_shape0-small] 1.2017ms 1.1521ms 867.9906 Ops/s
test_collector_stack_then_write[100-img_shape1-atari] 4.1171ms 3.6842ms 271.4276 Ops/s
test_collector_stack_then_write[100-img_shape2-large_img] 6.4172ms 5.8877ms 169.8457 Ops/s
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4829ms 7.0803ms 141.2362 Ops/s
test_collector_lazystack_then_write[50-img_shape0-small] 0.5072s 0.4971ms 2.0118 KOps/s
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7820ms 1.5670ms 638.1522 Ops/s
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.7456ms 2.4925ms 401.2067 Ops/s
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.6662ms 3.2331ms 309.2966 Ops/s
test_collector_without_rb[100-img_shape0-atari] 33.5679ms 32.9395ms 30.3587 Ops/s
test_collector_without_rb[200-img_shape1-large_batch] 65.3111ms 64.5181ms 15.4995 Ops/s
test_collector_with_rb[100-img_shape0-atari] 37.8623ms 37.3365ms 26.7835 Ops/s
test_collector_with_rb[200-img_shape1-large_batch] 72.8572ms 72.4099ms 13.8103 Ops/s
test_collector_without_rb_cuda[100-img_shape0-atari] 55.6495ms 55.2276ms 18.1069 Ops/s
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1104s 0.1100s 9.0895 Ops/s
test_collector_with_rb_cuda[100-img_shape0-atari] 57.8526ms 57.2914ms 17.4546 Ops/s
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1146s 0.1142s 8.7562 Ops/s
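In the benchmark rows above, the last column appears to be the reciprocal of the second timing column, which is read here as the mean time per call. For example, `test_ddpg_speed[False-None]` reports 2.9029 ms and 344.4860 Ops/s, and 1000 / 2.9029 ≈ 344.5.

The helper below sanity-checks a row under that reading of the columns. The function name is illustrative and is not part of the benchmark suite.

```python
import math


def ops_per_second(mean_ms: float) -> float:
    """Convert a mean per-call time in milliseconds to operations per second."""
    return 1000.0 / mean_ms


# Small differences come from the printed mean being rounded to 4 decimals.
print(round(ops_per_second(2.9029), 1))  # test_ddpg_speed[False-None]
print(round(ops_per_second(0.5826), 1))  # test_dqn_speed[reduce-overhead-None]
```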

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 16, 2026
…izations

- Three-tier backend system: `backend` (global default), `env_backend`
  (overrides the env pool backend), and `policy_backend` (overrides the
  transport backend), mirroring the `device` parameter pattern.
- Lock-free `SlotTransport`: per-env slots with no shared lock. It replaces
  `ThreadingTransport` as the default for in-process threading.
- `min_batch_size` parameter for `InferenceServer`, so the server accumulates
  requests before running inference.
- Batch drain from the result queue: one blocking `get()`, then `get_nowait()`
  for everything already queued.
- Remove a redundant `.copy()` in `ProcessorAsyncEnvPool._env_exec`.
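The three-tier backend bullet describes a fallback rule: each specific override wins when it is set, and otherwise the global `backend` applies. Below is a minimal sketch of that rule.

`resolve_backends` is a hypothetical helper, not the PR's API. The backend strings in the usage lines are placeholders.

```python
from typing import Optional, Tuple


def resolve_backends(
    backend: str,
    env_backend: Optional[str] = None,
    policy_backend: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical sketch: return (env pool backend, transport backend)."""
    # A specific override takes precedence; otherwise inherit the global default.
    resolved_env = env_backend if env_backend is not None else backend
    resolved_policy = policy_backend if policy_backend is not None else backend
    return resolved_env, resolved_policy


# Placeholder backend names, for illustration only.
print(resolve_backends("threading"))
print(resolve_backends("threading", env_backend="multiprocessing"))
```

Keeping a single global default with optional per-component overrides means the common case needs one argument, while mixed setups remain possible. An example of a mixed setup is a process-based env pool with an in-thread policy transport.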
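Below is a minimal sketch of the per-env slot idea behind `SlotTransport`. It assumes each env exchanges one request and one response at a time, in strict lockstep. `PerEnvSlots` is a hypothetical name, and this is not the PR's implementation.

Each slot owns its own `threading.Event`, so envs never contend on a common lock. The Event uses a private lock internally, so the sketch is "no shared lock" rather than strictly lock-free.

```python
import threading
from typing import Any, List, Optional


class PerEnvSlots:
    """Hypothetical per-env slot transport: one slot per env, no lock shared across envs.

    Assumes a single producer and a single consumer per slot, in request/response lockstep.
    """

    def __init__(self, num_envs: int) -> None:
        # One Event and one value cell per env.
        self._ready = [threading.Event() for _ in range(num_envs)]
        self._values: List[Any] = [None] * num_envs

    def put(self, env_id: int, value: Any) -> None:
        # Write the payload first, then signal that this slot is ready.
        self._values[env_id] = value
        self._ready[env_id].set()

    def get(self, env_id: int, timeout: Optional[float] = None) -> Any:
        # Wait only on this env's Event; other envs are unaffected.
        if not self._ready[env_id].wait(timeout):
            raise TimeoutError(f"no value for env {env_id}")
        self._ready[env_id].clear()
        return self._values[env_id]
```

A global queue guarded by one lock would serialize every env's hand-off. With one slot per env, a thread waiting on env 0 never blocks a thread serving env 1.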
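The commit message says only that `min_batch_size` lets `InferenceServer` accumulate requests. Below is a minimal sketch of one way to do that with the standard-library `queue` module.

The sketch makes two assumptions that the source does not state:

- a `timeout`, so the server does not stall when fewer than `min_batch_size` envs are waiting;
- a `max_batch_size` cap on the batch size.

`collect_batch`, `max_batch_size` and `timeout` are hypothetical names, not the PR's API.

```python
import queue
import time
from typing import Any, List


def collect_batch(
    requests: queue.Queue, min_batch_size: int, max_batch_size: int, timeout: float
) -> List[Any]:
    """Hypothetical sketch: wait for up to `timeout` seconds to reach `min_batch_size`."""
    batch: List[Any] = [requests.get()]  # block until at least one request exists
    deadline = time.monotonic() + timeout
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if len(batch) >= min_batch_size:
            # Minimum reached: take whatever is already queued, without waiting.
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        elif remaining <= 0:
            break  # timed out below the minimum: run with a partial batch
        else:
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
    return batch
```

Larger batches amortize the per-forward-pass overhead of the policy. The timeout trades a little latency for that throughput without risking a deadlock when only a few envs are active.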
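The batch-drain bullet describes a specific pattern: one blocking `get`, then `get_nowait` until the queue is empty. Below is a sketch of it with the standard-library `queue` module. `drain` and `max_items` are illustrative names, not the PR's code.

```python
import queue
from typing import Any, List, Optional


def drain(results: queue.Queue, max_items: Optional[int] = None) -> List[Any]:
    """Block for the first result, then take everything already queued without blocking."""
    items = [results.get()]  # the only blocking call
    while max_items is None or len(items) < max_items:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            break  # nothing else is ready; return what we have
    return items
```

Compared with one blocking `get()` per result, this collects every result that arrived while the consumer was busy in a single pass. It also never waits for results that are not there yet.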

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 5b0282d
Pull-Request: #3511
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 17, 2026
ghstack-source-id: d7fc567
Pull-Request: #3511

Labels: Benchmarks, CLA Signed, Collectors, Examples, Feature, Modules