[WIP][data] feat: TransferQueue - integrate TransferQueue into main codebase #4987

0oshowero0 · 2026-01-20T02:05:55Z

What does this PR do?

This PR integrates the experimental TransferQueue feature into the main codebase.

Key Changes

1. New Trainer Implementation

Added RayPPOTrainerTransferQueue which inherits from RayPPOTrainer.
Implemented BatchMeta-based operations to replace the distribution of real data objects (DataProto/TensorDict).

2. Configuration

Updated ppo_trainer.yaml and ppo_megatron_trainer.yaml to include TransferQueue-related configurations.

3. Integration & Entry Point

Modified RayPPOTrainerTransferQueue to adapt to the new features.
Added switch logic in main_ppo.py to enable TransferQueue via config.
Updated CI and scripts for TQ.
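The switch in main_ppo.py can be pictured with the minimal sketch below. It assumes the flag lives at `config.transferqueue.enable`, which is the attribute checked elsewhere in this PR. The two classes are empty stand-ins and `select_trainer_cls` is a hypothetical helper; the real entry point builds full trainers.

```python
from types import SimpleNamespace


class RayPPOTrainer:
    """Stand-in for the existing trainer (hypothetical, illustration only)."""


class RayPPOTrainerTransferQueue(RayPPOTrainer):
    """Stand-in for the TransferQueue trainer, which inherits from RayPPOTrainer."""


def select_trainer_cls(config):
    """Hypothetical helper: pick the trainer class from the TransferQueue switch.

    A missing `transferqueue` section is treated as disabled.
    """
    tq_cfg = getattr(config, "transferqueue", None)
    if tq_cfg is not None and getattr(tq_cfg, "enable", False):
        return RayPPOTrainerTransferQueue
    return RayPPOTrainer


cfg = SimpleNamespace(transferqueue=SimpleNamespace(enable=True))
print(select_trainer_cls(cfg).__name__)  # RayPPOTrainerTransferQueue
```

Because the TQ trainer subclasses `RayPPOTrainer`, the rest of the entry point can treat both classes the same way.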

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

…#4902)

What does this PR do?

Unify the return values of the reward function to make the logic clearer.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

There are different _balance_batch functions for ray_trainer with and without TQ.

dataproto.from_tensordict requires at least one tensor inside the TensorDict. However, for generation there is currently no tensor in gen_batch, so I relaxed the restriction to enable tqbridge.

@register

1. Pass the TQ config to the worker config when TQ is enabled. Note that currently only ray_trainer_tq has been modified; ray_trainer needs to be updated later too.
2. Change the reward_extra_keys type from list to set.
3. In tqbridge, when converting BatchMeta to TensorDict, several special meta_info keys need to be set as NonTensorData even though their datatypes are list (compared against prints from the non-TQ version).
4. Fix a bug in the @register decorator.

Checked the original code and found that it calls:

dataproto.to_tensordict():
    ...
    output = tu.get_tensordict(tensor_dict=tensor_batch, non_tensor_dict=self.meta_info)

tu.get_tensordict():
    ...
    for key, val in non_tensor_dict.items():  # non_tensor_dict is meta_info
        assert key not in tensor_dict
        tensor_dict[key] = NonTensorData(val)

So far, meta_info is only ever turned into NonTensorData. We only need to pay attention to tu.assign_non_tensor(batch_td, xxx) in the compute_yy funcs and make sure our modification aligns with it.
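The meta_info conversion path in the note above can be sketched with plain dicts. `NonTensorData` here is a minimal stand-in dataclass, not the tensordict class. `get_tensordict_sketch` is a hypothetical simplification of `tu.get_tensordict`: tensor fields pass through, every meta_info entry is wrapped as non-tensor data (list values included), and a key collision with a tensor field is an error. The keys used below are illustrative only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NonTensorData:
    """Stand-in for tensordict's NonTensorData: wraps a value that is not a tensor."""
    data: Any


def get_tensordict_sketch(tensor_dict, non_tensor_dict):
    """Hypothetical simplification of tu.get_tensordict.

    Tensor fields are copied as-is; every meta_info entry is wrapped
    as NonTensorData, even when its value is a list.
    """
    out = dict(tensor_dict)
    for key, val in non_tensor_dict.items():  # non_tensor_dict is meta_info
        assert key not in out, f"meta_info key {key!r} collides with a tensor field"
        out[key] = NonTensorData(val)
    return out


td = get_tensordict_sketch(
    {"input_ids": [[1, 2], [3, 4]]},             # stands in for tensor data
    {"temperature": 1.0, "eos_token_id": [2]},   # meta_info, including a list value
)
```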

gemini-code-assist

Code Review

This pull request integrates the experimental TransferQueue feature into the main codebase, which is a significant refactoring. The changes involve moving and deleting several files, updating configurations, and modifying the training loop to support BatchMeta-based operations. My review focuses on critical correctness and performance issues. I've identified a critical bug that could lead to a crash in the training loop and a high-severity performance issue related to inefficient data fetching from the TransferQueue. Addressing these points will improve the stability and performance of the new feature.

verl/trainer/ppo/ray_trainer_tq.py

verl/experimental/agent_loop/agent_loop.py

Copilot

Pull request overview

This PR integrates the experimental TransferQueue feature into the main codebase, enabling efficient data transfer in distributed PPO training. The integration includes a new trainer implementation, configuration updates, and modifications to support BatchMeta-based operations.

Changes:

Added RayPPOTrainerTransferQueue as a new trainer implementation inheriting from RayPPOTrainer
Updated configuration files to include TransferQueue-related settings (storage backend, batch management)
Modified entry point (main_ppo.py) to conditionally enable TransferQueue based on configuration
Integrated TransferQueue client creation in worker initialization and decorator functions

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 7 comments.

File	Description
`verl/trainer/ppo/ray_trainer_tq.py`	New trainer implementation with TransferQueue support
`verl/trainer/main_ppo.py`	Added conditional logic to select trainer based on TransferQueue config
`verl/utils/transferqueue_utils.py`	Enhanced utility functions for BatchMeta/TensorDict conversion
`verl/workers/engine_workers.py`	Added TransferQueue client initialization in workers
`verl/single_controller/base/decorator.py`	Extended register decorator with TransferQueue parameters
`verl/workers/config/engine.py`	Added TransferQueueConfig dataclass
Configuration files	Updated YAML configs with TransferQueue settings
CI/Scripts	Updated test scripts and workflow to use new entry point

Copilot · 2026-01-20T08:54:19Z

verl/trainer/ppo/ray_trainer_tq.py

+                        )
+                        batch_meta = batch_meta.union(compute_advantage_output_meta)
+
+                        if "resampled_idx" in batch_meta.field_names and self.config.transferqueue.enable:


In line 1094, there's a typo in the field name check: 'resampled_idx' should be 'pf_ppo_reweight_idx' based on the field being selected in line 1095. This inconsistency will cause the condition to never be True.

Suggested change

- if "resampled_idx" in batch_meta.field_names and self.config.transferqueue.enable:
+ if "pf_ppo_reweight_idx" in batch_meta.field_names and self.config.transferqueue.enable:
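The review comment's point can be checked with the minimal sketch below. It assumes the advantage step adds `pf_ppo_reweight_idx` to the batch. The `BatchMeta` class is a hypothetical stand-in that exposes only `field_names`, and `should_apply_reweight` is a hypothetical helper. Only the two field names come from the review.

```python
from dataclasses import dataclass, field


@dataclass
class BatchMeta:
    """Hypothetical stand-in: only the field_names attribute matters here."""
    field_names: list = field(default_factory=list)


def should_apply_reweight(batch_meta, tq_enabled, key):
    """Mirror of the reviewed condition, parameterized by the field name it checks."""
    return key in batch_meta.field_names and tq_enabled


meta = BatchMeta(field_names=["responses", "advantages", "pf_ppo_reweight_idx"])
old = should_apply_reweight(meta, True, "resampled_idx")        # original check
new = should_apply_reweight(meta, True, "pf_ppo_reweight_idx")  # suggested check
print(old, new)  # False True
```

The original guard checks a field that is never produced, so the branch is dead code. The guard has to check the same field name that the branch then selects.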

verl/trainer/ppo/ray_trainer_tq.py

verl/utils/tensordict_utils.py

verl/workers/config/engine.py

verl/single_controller/base/decorator.py

verl/trainer/main_ppo.py

1. Convert lists to tuples in tqbridge.
2. Convert tuples back to lists in the decorator.
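The list/tuple round trip can be sketched as follows. The commit does not explain exactly why list values break BatchMeta concatenation, so this sketch shows only the conversion itself. `lists_to_tuples` and `tuples_to_lists` are hypothetical helpers, they touch top-level values only, and the keys are illustrative.

```python
def lists_to_tuples(extra_info):
    """Hypothetical tqbridge-side helper: store list values as tuples before they enter BatchMeta."""
    return {k: tuple(v) if isinstance(v, list) else v for k, v in extra_info.items()}


def tuples_to_lists(extra_info):
    """Hypothetical decorator-side helper: restore tuple values to lists for worker code."""
    return {k: list(v) if isinstance(v, tuple) else v for k, v in extra_info.items()}


info = {"tags": ["a", "b"], "step": 3}
frozen = lists_to_tuples(info)
restored = tuples_to_lists(frozen)
```

One side effect: a value that was already a tuple before conversion also comes back as a list, so this scheme assumes no field needs to stay a tuple.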

To avoid reordering/resampling the full data, core_algos.compute_pf_ppo_reweight_data is replaced with core_algos.compute_pf_ppo_reweight_data_tq, and the compute_advantage function is modified slightly for the TQ version accordingly.
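The index-only idea can be sketched as below. The `pf_ppo_reweight_idx` field discussed in the review suggests that the TQ variant carries resampled indices instead of a reordered copy of the batch. The signature of `compute_pf_ppo_reweight_data_tq` is not shown in this PR, so `compute_reweight_indices` is a hypothetical stand-in. Its weighting rule, |score| ** weight_pow with sampling with replacement, is an assumption for illustration.

```python
import random


def compute_reweight_indices(scores, weight_pow=2.0, seed=0):
    """Hypothetical index-only variant of PF-PPO reweighting.

    Returns resampled row indices (drawn with replacement, weighted by
    |score| ** weight_pow) instead of a reordered copy of the data, so the
    full batch never has to move; the indices can be applied to metadata.
    """
    weights = [abs(s) ** weight_pow for s in scores]
    if sum(weights) == 0:  # all-zero scores: fall back to uniform sampling
        weights = [1.0] * len(scores)
    rng = random.Random(seed)
    return rng.choices(range(len(scores)), weights=weights, k=len(scores))


idx = compute_reweight_indices([0.0, 1.0, 0.0, 2.0])
```

Rows with zero weight are never drawn. In this example, therefore, only indices 1 and 3 can appear, and the score-2.0 row is drawn about four times as often as the score-1.0 row.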

1. Add the TransferQueueConfig class and update the TrainingWorkerConfig class.
2. Create TransferQueueConfig from self.config in rayppotrainertq and trainingworker.
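The config plumbing in that commit can be sketched as below, assuming the trainer config is a plain mapping with a `transferqueue.enable` key. Only `enable` appears in this PR's code. The real `TransferQueueConfig` and `TrainingWorkerConfig` in verl/workers/config/engine.py likely carry more fields, so treat these dataclasses and the `build_tq_config` helper as hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TransferQueueConfig:
    """Hypothetical shape: only `enable` is confirmed by this PR; other fields are omitted."""
    enable: bool = False


@dataclass
class TrainingWorkerConfig:
    """Hypothetical stand-in that embeds the TransferQueue settings."""
    transferqueue: TransferQueueConfig = field(default_factory=TransferQueueConfig)


def build_tq_config(trainer_config: dict) -> TransferQueueConfig:
    """Create a TransferQueueConfig from the trainer config; a missing section means disabled."""
    section = trainer_config.get("transferqueue") or {}
    return TransferQueueConfig(enable=bool(section.get("enable", False)))


worker_cfg = TrainingWorkerConfig(transferqueue=build_tq_config({"transferqueue": {"enable": True}}))
```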

…sordict

All DataProto.from_xxx methods require batch_size from tensor data. However, before generation only non-tensor data exist in the TensorDict, which causes a DataProto.check_consistency issue. To avoid this problem, we wrote a function, from_tensordict_without_tensor, and apply it when all data are non-tensors.
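When no tensor is present, the batch size has to come from somewhere other than the tensor fields. The sketch below shows one plausible way to do that with plain Python lists: require every per-sample non-tensor field to have the same length and use that length. `infer_batch_size_without_tensors` is hypothetical. It does not reproduce `from_tensordict_without_tensor`, whose signature this PR does not show, and the field names are illustrative.

```python
def infer_batch_size_without_tensors(non_tensors: dict) -> int:
    """Hypothetical helper: derive batch_size when only non-tensor fields exist.

    Every per-sample non-tensor field must have the same length; that shared
    length becomes the batch size, standing in for the consistency check
    normally done against tensor data.
    """
    if not non_tensors:
        raise ValueError("cannot infer batch size from an empty dict")
    lengths = {key: len(val) for key, val in non_tensors.items()}
    sizes = set(lengths.values())
    if len(sizes) != 1:
        raise ValueError(f"inconsistent non-tensor lengths: {lengths}")
    return sizes.pop()


bs = infer_batch_size_without_tensors({"raw_prompt": ["hi", "yo"], "data_source": ["a", "b"]})
```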

0oshowero0 and others added 28 commits January 15, 2026 16:03

before cherry-pick

aa8d30d

adapt recipe ray_trainer with transferqueue

b91c5b4

adapt workers with TQ

dd63277

merge TQ with main branch

aa5d2fb

fix redundant methods and CI

4a59b93

add missing import and val_data_size

cb976bc

fix missing import+1 and val_data_size

2cc8afb

fix import

7d64a06

cherry pick verl-project#4928

0fa9263

create tq client sync=True

ef9dfe4

fix _validate() to track new ray_trainer

33cf600

fix fit()

723c235

adapt verl-project#4902

6c686d9

add TODO

56a194d

update _balance_batch func in tq ray trainer

16b7dfe

dataproto.from_tensordict

87d86b7

fix agent_loop calculate metrics

da1e0c8

remove todo

010d862

fix

e190410

fix tensordict utils

823a485

still fix tensordict utils

5ee52c4

update comments

76b2cf2

remove manual set pypi mirror

236f28d

fix _async_update_batchmeta_with_output

9938228

fix select_fields bug in ray_trainer_tq

c07d213

gemini-code-assist bot reviewed Jan 20, 2026

verl/trainer/ppo/ray_trainer_tq.py (outdated)

verl/experimental/agent_loop/agent_loop.py (outdated)

remove redundant tqbridge

f1e67c8

0oshowero0 added 8 commits January 20, 2026 15:23

fix tq enable switch

9fcbae0

optimize agentloop _performance_metrics

3724017

optimize config

a01a925

remove redundant tqbridge

00e7a37

fix

46104c9

fix

7e1e053

fix pre-commit check

30bc7b5

revert DataProto changes

a2ce389

0oshowero0 requested a review from Copilot January 20, 2026 08:50

Copilot started reviewing on behalf of 0oshowero0 January 20, 2026 08:50

Copilot AI reviewed Jan 20, 2026

0oshowero0 and others added 18 commits January 20, 2026 17:24

fix missing full_batch_meta

10d05e4

fix minor bugs

38d89a2

fix sanity check

5a76cd3

fix concating list-type extra_info errors for batchmeta

c1224be

update compute_advantage

ba9bc34

fix trainingworker create tq client

9589ba4

_update_actor returns DataProto with only meta_info

f7a1d64

fix update_critic following update_actor

6c50e46

fix missing import & apply pre-commit check

54d7495

agent loop metrics fix

7aba675

add naive engine worker TQ adaptation

f13f9b4

add engine worker and agent loop worker TODO

b4b824b

update todo

8705623

remove _balance_batch in RayPPOTrainerTransferQueue

53d67a5

fix dataproto creation bug in tq utils

ce6ef3a

fix ray_trainer old log prob

e197946

forgot to update this part

try: put single request to TQ without padding

8d1782b

wuxibin89 mentioned this pull request Jan 26, 2026

[roadmap] verl Q1 roadmap #4880 (open, 30 tasks)

5 participants
