mindcv FAQ #804

Open
Ash-Lee233 opened this issue Sep 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Ash-Lee233
Collaborator

Here is a summary of common training and inference issues.

Ash-Lee233 added the bug label Sep 2, 2024
Ash-Lee233 pinned this issue Sep 2, 2024
@ChongWei905
Contributor

Note: when launching with mpirun on 2 devices, add --bind-to numa so that each process is bound to a NUMA node. This works around a known performance problem in the two-device case. For example:
mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind --bind-to numa -n 2 \
    python train.py --distribute --model=densenet121 --dataset=imagenet --data_dir=/path/to/imagenet
For more information, see https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/startup_method.html (Chinese version: https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/startup_method.html).
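
If the device count varies between runs, the launch line can be assembled by a small wrapper so the binding flag is not forgotten in the two-device case. The following is only a sketch: `build_mpirun_cmd` is a hypothetical helper, not part of mindcv, and it assumes the flag should be added only when exactly 2 devices are used, since that is the only case the comment above describes. Whether binding helps with other device counts is not stated here.

```shell
#!/bin/sh
# Hypothetical helper (not part of mindcv): print the mpirun launch line
# for a given number of devices.
# Assumption: --bind-to numa is added only for the 2-device case,
# the one case described in this thread.
build_mpirun_cmd() {
    num_devices="$1"
    cmd="mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind"
    if [ "$num_devices" -eq 2 ]; then
        # Bind each process to a NUMA node to avoid the two-device slowdown.
        cmd="$cmd --bind-to numa"
    fi
    cmd="$cmd -n $num_devices python train.py --distribute --model=densenet121 --dataset=imagenet --data_dir=/path/to/imagenet"
    printf '%s\n' "$cmd"
}

# Print the command for a 2-device run; review it before executing.
build_mpirun_cmd 2
```

Printing the command rather than running it directly keeps the sketch safe to try on a machine without mpirun; once the output looks right, it can be pasted into a shell or a job script.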
