raise KeyError(key) from None KeyError: 'RANK' #48
Comments
Hello! Did you manage to get it running? Did you solve the problem?
I'm exploring multi-GPU training myself. All I can say about "RANK" and "WORLD_SIZE" is that they are two parameters required for multi-GPU training:
    if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:
        args.rank = int(os.environ["RANK"])
        args.world_size = int(os.environ['WORLD_SIZE'])
        args.gpu = int(os.environ['LOCAL_RANK'])
    elif 'SLURM_PROCID' in os.environ:
        args.rank = int(os.environ['SLURM_PROCID'])
        args.gpu = args.rank % torch.cuda.device_count()
    else:
        print('Not using distributed mode')
        args.distributed = False
        return
That's the only lead I can offer (laughing-crying). I found this issue while searching for how to fix these two keys being missing from the environment variables.
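The check quoted in this comment can be tried without `torch` or an `args` object. Below is a minimal sketch of the same branching, not code from this repository: `detect_distributed` and its `gpu_count` parameter are hypothetical names, and reading `SLURM_NTASKS` for the world size under SLURM is my assumption, since the quoted snippet does not set a world size in that branch.

```python
def detect_distributed(env, gpu_count=1):
    """Return (rank, world_size, local_gpu), or None when no launcher variables are set.

    `env` is any mapping of environment variables, for example os.environ.
    `gpu_count` stands in for torch.cuda.device_count() so the sketch runs without torch.
    """
    if "RANK" in env and "WORLD_SIZE" in env:
        # Launchers such as torchrun export these for every process they start.
        rank = int(env["RANK"])
        world_size = int(env["WORLD_SIZE"])
        # Assumption: fall back to rank % gpu_count when LOCAL_RANK is absent.
        gpu = int(env.get("LOCAL_RANK", rank % gpu_count))
        return rank, world_size, gpu
    if "SLURM_PROCID" in env:
        rank = int(env["SLURM_PROCID"])
        # Assumption: SLURM_NTASKS gives the number of processes in the job step.
        world_size = int(env.get("SLURM_NTASKS", 1))
        return rank, world_size, rank % gpu_count
    # Neither launcher style was detected: not running in distributed mode.
    return None
```

Calling `detect_distributed(os.environ)` in a plain `python script.py` run should return `None`, which matches the situation in this issue: no launcher has exported `RANK` or `WORLD_SIZE`.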
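I ran the training command for the VATEX part and got the error below. From searching online, manually assigning os.environ['RANK'] gets past this error, but a later step then fails with a KeyError for os.environ['WORLD_SIZE']. I suspect the problem is not that simple and I can't figure it out. Could anyone explain how to fix it? Getting the program to run is the first step. Thanks.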
  File "src/tasks/run_caption_VidSwinBert.py", line 689, in <module>
    main(args)
  File "src/tasks/run_caption_VidSwinBert.py", line 675, in main
    args, vl_transformer, optimizer, scheduler = mixed_precision_init(args, vl_transformer)
  File "src/tasks/run_caption_VidSwinBert.py", line 105, in mixed_precision_init
    model, optimizer, _, _ = deepspeed.initialize(
  File "/home/bwang/anaconda3/envs/qysu_vc/lib/python3.8/site-packages/deepspeed/__init__.py", line 129, in initialize
    dist.init_distributed(dist_backend=dist_backend, dist_init_required=dist_init_required)
  File "/home/bwang/anaconda3/envs/qysu_vc/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 592, in init_distributed
    init_deepspeed_backend(get_accelerator().communication_backend_name(), timeout, init_method)
  File "/home/bwang/anaconda3/envs/qysu_vc/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 148, in init_deepspeed_backend
    rank = int(os.environ["RANK"])
  File "/home/bwang/anaconda3/envs/qysu_vc/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'
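The traceback shows `RANK` being read straight from the environment, which normally only works when the script is started by a distributed launcher (for example `torchrun` or the `deepspeed` command) that exports these variables. Setting `RANK` alone just moves the failure to the next missing key, as described above. If the goal is only a single-process, single-GPU run, one possible workaround is to fill in all the usual variables before `deepspeed.initialize` is called. This is a hedged sketch, not the repository's fix: `ensure_single_process_env` is a hypothetical helper, the exact set of variables DeepSpeed reads is assumed from PyTorch's `env://` initialization, and port 29500 is just an example of a free port.

```python
import os

# Assumed set of variables for a one-process "distributed" run.
# RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are what PyTorch's env://
# initialization reads; LOCAL_RANK is what launchers export per process.
SINGLE_PROCESS_DEFAULTS = {
    "RANK": "0",
    "WORLD_SIZE": "1",
    "LOCAL_RANK": "0",
    "MASTER_ADDR": "127.0.0.1",
    "MASTER_PORT": "29500",  # example port; any free port should do
}


def ensure_single_process_env(env=None):
    """Fill in missing launcher variables and return the names that were added.

    Values that a real launcher already exported are left untouched.
    """
    env = os.environ if env is None else env
    added = []
    for key, value in SINGLE_PROCESS_DEFAULTS.items():
        if key not in env:
            env[key] = value
            added.append(key)
    return added
```

Calling `ensure_single_process_env()` at the top of the training script, before any DeepSpeed call, would cover the single-GPU case. For real multi-GPU training these values must differ per process, so starting the script through a launcher is the safer route than hard-coding them.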