Merging Code Hangs at Accelerate.backward(loss) When Using Multiple GPUs #4

Open

krP471 opened this issue Dec 13, 2024 · 0 comments

krP471 commented Dec 13, 2024
I am trying to run the msmoe-merging.py file on multiple GPUs by setting num_processes=4 through Accelerate, keeping the same default settings otherwise, but the run stops or hangs at accelerator.backward(loss). It works fine for me when running on a single GPU. Please guide me on how I can resolve this issue and why it is happening.
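For context, here is a minimal, self-contained sketch (not the project's actual code; the model, optimizer, and data are toy placeholders, while the real script works with the Switch Transformer teacher checkpoint shown below) of the kind of Accelerate loop where the hang is observed. With the DeepSpeed ZeRO-2 config and num_processes=4 below, accelerator.backward(loss) is where gradients are reduced across ranks, so a collective that never completes shows up as a hang at exactly that call.

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=1)

# Toy placeholders standing in for the actual model and dataset
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, labels in dataloader:
    outputs = model(inputs)
    loss = torch.nn.functional.cross_entropy(outputs, labels)
    accelerator.backward(loss)  # the reported hang happens at this call on multi-GPU
    optimizer.step()
    optimizer.zero_grad()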

Script

export NCCL_P2P_DISABLE=1
export CUDA_VISIBLE_DEVICES="0,1,2,3"
accelerate launch --debug --config_file static/finetune_config.yaml \
  --main_process_port 29512 mcsmoe/msmoe-merging.py \
  --per_device_train_batch_size=8 \
  --per_device_eval_batch_size=8 \
  --gradient_accumulation_steps=1 \
  --preprocessing_num_workers=8 \
  --num_epochs=10 \
  --no_eval_until_epochs=1 \
  --num_eval_steps=50 \
  --learning_rate=2e-4 \
  --warmup_steps=16 \
  --weight_decay=0.01 \
  --kd_temperature=2 \
  --mlm_lambda=1.0 \
  --kd_lambda=0.2 \
  --hd_lambda=0 \
  --task="copa" \
  --merging_strategy="normal" \
  --exact_fisher=False \
  --num_samples_for_merging=16 \
  --similarity_base="router-logits" \
  --reverse_similarity=False \
  --similarity_fn="cosine" \
  --num_groups=8 \
  --globally_group=True \
  --permute_when_merge=False \
  --save_stable_rank=False \
  --encoder_merging_layers="1,5,9" \
  --decoder_merging_layers="1,5,9" \
  --output_dir="results/copa/merging/" \
  --teacher_checkpoint="results/copa/switch-32e-permuted"

Configuration File (finetune_config.yaml)

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: cpu
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2 
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4 
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false