Question about CUDA_DEVICE_SM_LIMIT = 0 #636

Open
for800000 opened this issue Nov 22, 2024 · 2 comments
Comments

@for800000

Please provide an in-depth description of the question you have:
When CUDA_DEVICE_SM_LIMIT = 0, does libvgpu still intercept calls or not? And is there any overhead compared with nvidia-device-plugin?
What do you think about this question?:

Environment:

  • HAMi version:
  • Kubernetes version:
  • Others:
@lixd

lixd commented Nov 22, 2024

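Setting CUDA_DEVICE_SM_LIMIT to 0 is treated as 100. The workload still goes through libvgpu, but no compute (SM) limit is enforced, so in theory there is still some overhead. You can set the environment variable CUDA_DISABLE_CONTROL=true to disable the container-level resource isolation mechanism.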
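
For illustration, here is a minimal Python sketch of the two points above. It is not taken from the libvgpu source: `effective_sm_limit` only models the "0 is treated as 100" rule as described in this reply, and `build_pod_manifest`, the Pod name, and the image are hypothetical placeholders. The only names taken from this thread are `CUDA_DEVICE_SM_LIMIT` and `CUDA_DISABLE_CONTROL`; the `nvidia.com/gpu` resource request is assumed to match your cluster's setup.

```python
import json


def effective_sm_limit(raw: str) -> int:
    """Model of the rule described above: a value of 0 means "no compute cap",
    i.e. it behaves like 100. This is an illustration, not libvgpu code."""
    value = int(raw)
    return 100 if value == 0 else value


def build_pod_manifest(name: str, image: str) -> dict:
    """Build a Pod manifest (hypothetical helper) whose container sets
    CUDA_DISABLE_CONTROL=true, as suggested in the reply."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Env var named in the reply; turns off container-level isolation.
                    "env": [{"name": "CUDA_DISABLE_CONTROL", "value": "true"}],
                    # Assumed GPU resource name; adjust to your cluster.
                    "resources": {"limits": {"nvidia.com/gpu": "1"}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder name and image, for demonstration only.
    print(json.dumps(build_pod_manifest("gpu-demo", "example/cuda-app:latest"), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.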

@for800000
Author

Thanks. I have one more question.
image
image
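Two replicas of the same Deployment, allocated 1 GPU. Once the tasks are running, the allocated memory and the memory actually in use do not match. Is this expected, or is it a bug? Version: v2.4.0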
