Some thoughts for improvements
Lines 582 to 608 in 4540740
```python
for name, param in model.named_parameters():
    count += 1  # empty_cache at last param
    # Fire all vllm engines for broadcast
    if self.strategy.is_rank_0():
        shape = (
            param.shape
            if self.strategy.args.zero_stage != 3
            else param.ds_shape
        )
        futs = [
            actor.futures.update_weight(
                name,
                dtype=torch_type_codec(param.dtype),
                shape=shape,
                empty_cache=count == num_params,
            )
            for actor in self.actors
        ]
    # For ZeRO-3, allgather sharded parameter and broadcast to all vllm engines by rank 0
    with deepspeed.zero.GatheredParameters(
        [param], enabled=self.strategy.args.zero_stage == 3
    ):
        if self.strategy.is_rank_0():
            dist.broadcast(param.data, 0, group=self._model_update_group)
            _ = [fut.result() for fut in futs]
```
- Instead of calling `fut.result()` for each param, it would save dispatch latency to call `update_weight` and `broadcast` for every param first, and then wait on all the futs at the end (see the sketch after this list). My understanding is that they will be dispatched as a series of NCCL calls and will complete in the order they are dispatched.
- It may be possible to broadcast different params from different learners, so that the communication bandwidth is maximally used. But with some caveats: 1. we may need different communication groups; 2. we need some coordination mechanism to make sure the `broadcast`/`update_weight` pairs are in the right order.
- Is it possible to add all the actors to the deepspeed communication group so that they get parameter updates, but without having them participate in the training?
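A minimal sketch of the first idea, assuming the same loop and surrounding state as the snippet above (`self.actors`, `self._model_update_group`, `torch_type_codec`, etc.): the per-param `fut.result()` calls are collected and only awaited after the whole loop, so the RPC dispatch and the NCCL broadcasts are pipelined rather than serialized per parameter.

```python
# Sketch only: defer fut.result() until all params have been dispatched.
all_futs = []
num_params = len(list(model.named_parameters()))
count = 0
for name, param in model.named_parameters():
    count += 1
    if self.strategy.is_rank_0():
        shape = (
            param.shape
            if self.strategy.args.zero_stage != 3
            else param.ds_shape
        )
        # Dispatch update_weight to every vllm engine without blocking on it
        all_futs.extend(
            actor.futures.update_weight(
                name,
                dtype=torch_type_codec(param.dtype),
                shape=shape,
                empty_cache=count == num_params,
            )
            for actor in self.actors
        )
    with deepspeed.zero.GatheredParameters(
        [param], enabled=self.strategy.args.zero_stage == 3
    ):
        if self.strategy.is_rank_0():
            dist.broadcast(param.data, 0, group=self._model_update_group)

# Wait once at the end; the NCCL broadcasts complete in the order they were enqueued
if self.strategy.is_rank_0():
    _ = [fut.result() for fut in all_futs]
```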
Lines 575 to 579 in 4540740
```python
while True:
    time.sleep(0.1)
    actors_busy = [actor.is_generating() for actor in self.actors]
    if not any(actors_busy):
        break
```
- Can we avoid this polling? An idea is to create an NCCL broadcast independent of the vllm `update_weight` call. The actors receive weight updates in another thread and cache them. Then, in the `step` function of the actor, we check this cache and update the weights when they are available. In this way we maximize the communication/computation overlap. (A sketch follows below.)
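A rough sketch of what the actor side could look like. The helpers `recv_broadcast_weights()` (a blocking receive on a dedicated communication group) and `apply_weights()` (whatever vllm weight-loading path the actor already uses) are hypothetical names, not an existing API; the point is only that the receive runs off the generation thread and the application happens at a safe point inside `step`.

```python
import queue
import threading


class ActorWeightUpdater:
    """Illustrative only: receive weight updates in a background thread and
    apply them lazily at the next step(), overlapping broadcast with generation."""

    def __init__(self):
        self._pending = queue.Queue()
        self._thread = threading.Thread(target=self._recv_loop, daemon=True)
        self._thread.start()

    def _recv_loop(self):
        while True:
            # Hypothetical blocking receive on a dedicated NCCL group,
            # independent of the vllm update_weight RPC.
            name, tensor = recv_broadcast_weights()
            self._pending.put((name, tensor))

    def maybe_update(self):
        # Called at the start of the actor's step(): drain the cache and
        # load whatever weights have arrived since the last step.
        updates = []
        while not self._pending.empty():
            updates.append(self._pending.get_nowait())
        if updates:
            apply_weights(updates)  # hypothetical: existing vllm weight-loading path
```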