MPI on CPU-only: "no support for _allgather_base" #3176

tikhu opened this issue Oct 17, 2024 · 0 comments

System Info

- `Accelerate` version: 1.0.1
- Platform: Linux-6.10.4-linuxkit-aarch64-with-glibc2.35
- `accelerate` bash location: /usr/local/bin/accelerate
- Python version: 3.10.12
- Numpy version: 1.24.4
- PyTorch version (GPU?): 2.5.0a0+b465a5843b.nv24.09 (False)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 70.54 GB
- `Accelerate` default config:
        Not found

This is my config file (`config_mpi.yaml`):

compute_environment: LOCAL_MACHINE
debug: true
distributed_type: MULTI_CPU
downcast_bf16: 'no'
enable_cpu_affinity: false
ipex_config:
  ipex: false
machine_rank: 0
main_process_ip: 172.18.0.2
main_process_port: 8888
main_training_function: main
mixed_precision: 'no'
mpirun_config:
  mpirun_ccl: '1'
  mpirun_hostfile: /accelerate/hostfile
num_machines: 2
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: true

For testing, this runs in two MPI-connected Docker containers (based on nvcr.io/nvidia/pytorch:24.09-py3) on an Apple M3 Max running macOS 15.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

  1. On an MPI system with more than one node, run `accelerate launch --config_file=config_mpi.yaml nlp_example.py --cpu`
  2. Wait for the run to crash.
  3. Get this output (a minimal standalone reproduction is sketched after the traceback):
[rank0]: Traceback (most recent call last):
[rank0]:   File "/accelerate/nlp_example.py", line 209, in <module>
[rank0]:     main()
[rank0]:   File "/accelerate/nlp_example.py", line 205, in main
[rank0]:     training_function(config, args)
[rank0]:   File "/accelerate/nlp_example.py", line 179, in training_function
[rank0]:     predictions, references = accelerator.gather_for_metrics((predictions, batch["labels"]))
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2500, in gather_for_metrics
[rank0]:     data = self.gather(input_data)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2456, in gather
[rank0]:     return gather(tensor)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 398, in wrapper
[rank0]:     return function(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 437, in gather
[rank0]:     return _gpu_gather(tensor)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 356, in _gpu_gather
[rank0]:     return recursively_apply(_gpu_gather_one, tensor, error_on_other_type=True)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 108, in recursively_apply
[rank0]:     return honor_type(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 82, in honor_type
[rank0]:     return type(obj)(generator)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 111, in <genexpr>
[rank0]:     recursively_apply(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 127, in recursively_apply
[rank0]:     return func(data, *args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 346, in _gpu_gather_one
[rank0]:     gather_op(output_tensors, tensor)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 3410, in all_gather_into_tensor
[rank0]:     work = group._allgather_base(output_tensor, input_tensor, opts)
[rank0]: RuntimeError: no support for _allgather_base in MPI process group
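
For context, the crash can presumably be reproduced without Accelerate at all, since it comes from `torch.distributed.all_gather_into_tensor`, which torch's MPI process group does not implement. A minimal standalone sketch, assuming PyTorch is built with MPI support and the script (the name `repro.py` is just an example) is launched with something like `mpirun -n 2 python repro.py`:

```python
# Minimal sketch (not part of the original report): all_gather_into_tensor is
# backed by _allgather_base, which the MPI process group does not provide, so
# the same RuntimeError should appear here.
import torch
import torch.distributed as dist

dist.init_process_group(backend='mpi')  # rank/world size come from mpirun
world_size = dist.get_world_size()

x = torch.full((4,), float(dist.get_rank()))
out = torch.empty(4 * world_size)

# This mirrors accelerate's _gpu_gather_one, which calls gather_op(output_tensors, tensor):
dist.all_gather_into_tensor(out, x)  # RuntimeError: no support for _allgather_base ...

# The list-based collective is implemented by the MPI backend and should work:
# outs = [torch.empty(4) for _ in range(world_size)]
# dist.all_gather(outs, x)
```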

If I swap MPI out from under Accelerate, the example runs without any error message.
I do this by adding the following:

import os
import torch.distributed as dist

# Initialize the process group with the gloo backend ourselves, using the rank
# and world size that Accelerate exports as environment variables.
world_size = int(os.environ.get('ACCELERATE_WORLD_SIZE', '1'))
rank = int(os.environ.get('ACCELERATE_RANK', '0'))
dist.init_process_group(backend='gloo', rank=rank, world_size=world_size)
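
Presumably this helps because the gloo backend implements `_allgather_base` (the primitive behind `torch.distributed.all_gather_into_tensor`), while torch's MPI process group does not. A sketch of where the workaround would sit in `nlp_example.py`, assuming `Accelerator()` reuses an already-initialized process group rather than creating an MPI one, and that the launcher has set `MASTER_ADDR`/`MASTER_PORT` for the default `env://` rendezvous:

```python
# Hypothetical placement sketch, not the exact code from the report.
import os
import torch.distributed as dist
from accelerate import Accelerator

# Initialize gloo before constructing Accelerator so that Accelerate
# (presumably) picks up this process group instead of creating an MPI one.
world_size = int(os.environ.get('ACCELERATE_WORLD_SIZE', '1'))
rank = int(os.environ.get('ACCELERATE_RANK', '0'))
dist.init_process_group(backend='gloo', rank=rank, world_size=world_size)

accelerator = Accelerator(cpu=True)  # gather_for_metrics now goes through gloo
```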

Expected behavior

I expect the example not to crash.
It runs fine when `num_machines` is 1.
