vNPU HAMi mode may not be used with VGPU in the same cluster. #4833

@lomtom

Description

When a cluster contains both GPU and NPU (Ascend 310P) cards, the NPU pod stays in the Pending state.

Steps to reproduce the issue

  1. Prepare a cluster with both GPU and NPU cards. In my case, there are two nodes:
    NodeA: T4 * 2
    NodeB: Ascend310P * 2
  2. Prepare two workloads:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    volcano.sh/vgpu-mode: "hami-core" # (Optional, 'hami-core' or 'mig')
spec:
  schedulerName: volcano
  containers:
    - name: cuda-container
      image: swr.cn-east-3.myhuaweicloud.com/lomtom-common/pytorch:2.1.2-cuda12.1-cudnn8-runtime-ubuntu22.04
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/vgpu-number: 1 
          volcano.sh/vgpu-memory: 1000 
          volcano.sh/vgpu-cores: 10

---
apiVersion: v1
kind: Pod
metadata:
  name: npu-pod-310p
spec:
  schedulerName: volcano
  containers:
  - name: npu-container
    image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
    command: ["sleep"]
    args: ["100000"] 
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
        huawei.com/Ascend310P: "1" 
        huawei.com/Ascend310P-memory: "3072"
      requests:
        cpu: "1"
        memory: 1000Mi
        huawei.com/Ascend310P: "1"
        huawei.com/Ascend310P-memory: "3072"
  3. The GPU pod runs normally, while the NPU pod remains in the Pending state:
Events:
  Type     Reason            Age   From     Message
  ----     ------            ----  ----     -------
  Warning  FailedScheduling  13m   volcano  pod group is not ready, 1 Pending, 1 minAvailable; Pending: 1 Unschedulable

The NPU pod schedules normally after I remove huawei.com/Ascend310P-memory. I'm not sure whether this is the same problem as #4778.
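
For reference, the workaround can be sketched as the NPU manifest from the reproduction steps with the huawei.com/Ascend310P-memory entries dropped from both limits and requests. Everything else is copied unchanged. The pod name npu-pod-310p-no-memory is a hypothetical name used here only to keep it distinct from the original pod. The report does not say whether removing the entry from just one of the two sections would also be enough, and it is an assumption that without the memory entry the pod asks for a whole card rather than a vNPU slice.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: npu-pod-310p-no-memory  # hypothetical name, not from the original report
spec:
  schedulerName: volcano
  containers:
  - name: npu-container
    image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
    command: ["sleep"]
    args: ["100000"]
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
        huawei.com/Ascend310P: "1"
        # huawei.com/Ascend310P-memory removed; with it present the pod stayed Pending
      requests:
        cpu: "1"
        memory: 1000Mi
        huawei.com/Ascend310P: "1"
        # huawei.com/Ascend310P-memory removed here as well
```

This only works around the scheduling failure. It does not exercise the memory-based vNPU request that the original manifest was meant to test.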

Describe the results you received and expected

Received: the NPU pod stays in the Pending state. Expected: the NPU pod is scheduled normally.

What version of Volcano are you using?

latest

Any other relevant information

No response

Metadata

Labels: kind/bug