DDPPO Multi-GPU Error #2038

Open

Yuxin916 opened this issue Aug 23, 2024 · 0 comments

Yuxin916 commented Aug 23, 2024

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: master

Habitat-Sim: master

Habitat is under active development, and we advise users to restrict themselves to stable releases. Are you using the latest release versions of Habitat-Lab and Habitat-Sim? Your question may already be addressed in the latest versions. We may also not be able to help with problems in earlier versions because they sometimes lack the more verbose logging needed for debugging.

Master branch contains 'bleeding edge' code and should be used at your own risk.

Docs and Tutorials

Did you read the docs? https://aihabitat.org/docs/habitat-lab/
Yes
Did you check out the tutorials? https://aihabitat.org/tutorial/2020/
Yes
Perhaps your question is answered there. If not, carry on!

❓ Questions and Help

Hi, I am using habitat-baselines for the ObjectNav task with the DDPPO trainer, running on a single-node server with multiple GPUs. Following the provided single-node bash file, I launch training with `python -u -m torch.distributed.launch --nnodes=1 --nproc_per_node=3 --use_env habitat-baselines/habitat_baselines/run.py --config-name=objectnav/ddppo_objectnav_hm3d.yaml habitat_baselines.trainer_name=ddppo habitat_baselines.num_environments=2 habitat_baselines.evaluate=False`.
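
For readability, here is the same launch command as a shell snippet (everything is copied from above; `--nproc_per_node` should match the number of GPUs on the node):

```bash
# Single-node, multi-GPU DDPPO launch: 3 worker processes, 2 environments each.
python -u -m torch.distributed.launch \
    --nnodes=1 \
    --nproc_per_node=3 \
    --use_env \
    habitat-baselines/habitat_baselines/run.py \
    --config-name=objectnav/ddppo_objectnav_hm3d.yaml \
    habitat_baselines.trainer_name=ddppo \
    habitat_baselines.num_environments=2 \
    habitat_baselines.evaluate=False
```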

However, the error shows as:

[screenshot of the error traceback]

It looks like the error comes from this function in ddppo.py:

```python
def _evaluate_actions(self, *args, **kwargs):
    r"""Internal method that calls Policy.evaluate_actions. This is used
    instead of calling that directly so that that call can be overrided
    with inheritance
    """
    # DistributedDataParallel moves all tensors to the device (or devices)
    # So we need to make anything that is on the CPU into a numpy array
    # This is needed for older versions of pytorch that haven't deprecated
    # the single-process multi-device version of DDP
    return self._evaluate_actions_wrapper.ddp(
        *_cpu_to_numpy(args), **_cpu_to_numpy(kwargs)
    )
```

Any insight or suggestions on this? Is it because my PyTorch version is too new? I manually switched off torch.inference_mode in common.py and it worked.
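
For context, here is a minimal sketch of what I believe is going wrong (my own small repro, not habitat-baselines code, and the exact call chain in ddppo.py may differ): tensors created under torch.inference_mode() are "inference tensors", which autograd refuses to record, so they cannot flow into the DDP-wrapped evaluate_actions call during the PPO update. Collecting rollouts under torch.no_grad() instead avoids this, since no_grad tensors can still be used in a later autograd graph.

```python
import torch

# Rollout collection under inference_mode produces "inference tensors".
with torch.inference_mode():
    obs = torch.randn(4, 8)  # stands in for a rollout observation batch

policy = torch.nn.Linear(8, 2)

# Using an inference tensor in an op that autograd needs to record raises a
# RuntimeError ("Inference tensors cannot be saved for backward..."), which is
# the kind of failure I see inside _evaluate_actions during the PPO update.
try:
    policy(obs).sum().backward()
except RuntimeError as e:
    print("inference_mode tensor:", e)

# Collecting the same batch under no_grad works: no_grad tensors are ordinary
# tensors without history, so autograd can consume them later.
with torch.no_grad():
    obs = torch.randn(4, 8)

policy(obs).sum().backward()  # fine
print("no_grad tensor: backward succeeded")
```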

Best regards
