
Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading utanh_bf16 #850

Open
nikolaydubina opened this issue Oct 14, 2024 · 5 comments
Labels
bug Something isn't working

Comments

nikolaydubina (Contributor)

Describe the bug

Llama 3.2 11B Vision fails to start after loading the model:

Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading utanh_bf16

[screenshot]

My system:

DRIVER_VERSION=550.90.07

Latest commit or version

ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-1caf83a@sha256:095518a16d1f0a9fa2e212463736ccb540eeb0f88f21c10a2123ab8cf481b83e

References

[screenshot]
nikolaydubina added the bug label on Oct 14, 2024
nikolaydubina (Contributor, Author) commented Oct 14, 2024

With DRIVER_VERSION=535.183.01 (the default in GKE), the error becomes:

Error: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain.") when loading utanh_bf16

In the Docker image, the CUDA driver requirement does not go as far as 550; it tops out at the 535/536 series:

ENV NVIDIA_REQUIRE_CUDA=cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536

This driver series ships CUDA 12.2.2:
https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-104-12/index.html
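
For what it's worth, here is a quick way to confirm which CUDA version the installed driver actually supports. This is only a sketch: it assumes Python is available in the pod and that libcuda.so.1 is mounted into the container (the NVIDIA container runtime normally does this).

# Ask the driver which CUDA version it supports via the driver API.
# Sketch only; assumes libcuda.so.1 is visible inside the container.
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")
version = ctypes.c_int()
assert cuda.cuDriverGetVersion(ctypes.byref(version)) == 0
major, minor = version.value // 1000, (version.value % 1000) // 10
print(f"driver supports CUDA {major}.{minor}")
# PTX built with a newer toolkit than the driver supports (e.g. 12.4 PTX on a
# 12.2 driver) will fail to JIT, typically with CUDA_ERROR_UNSUPPORTED_PTX_VERSION.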

nikolaydubina (Contributor, Author) commented Oct 14, 2024

I have a 24 GB NVIDIA L4, but it fails with the 3B model as well and never reaches full memory, so this is not an out-of-memory issue.

[screenshot]

The 3B model hits the same error on a different kernel:

Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading cast_u32_f32
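
For reference, memory usage can be watched while the model loads with a small probe; a rough sketch, assuming the nvidia-ml-py (pynvml) package is installed wherever it runs:

# Poll GPU memory once a second while mistral.rs loads the model.
# Sketch only; requires the nvidia-ml-py (pynvml) package.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    for _ in range(120):
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"used {mem.used / 2**20:.0f} MiB of {mem.total / 2**20:.0f} MiB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()

In this case usage stays well below the card's 23 GiB when the error appears.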

nikolaydubina (Contributor, Author) commented Oct 16, 2024

OK, so CUDA works on the nodes; there is something wrong with the CUDA usage or the build in mistral.rs.

+-----------------------------------------------------------------------------------------+                                                                                                                       
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |                                                                                                                       
|-----------------------------------------+------------------------+----------------------+                                                                                                                       
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |                                                                                                                       
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   41C    P8             17W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                          
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

And a sample CUDA pod works too:

$ kubectl -n ml logs vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

The pod manifests:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-info
  namespace: ml
spec:
  restartPolicy: OnFailure
  containers:
    - name: main
      image: cuda:12.4.1-cudnn-devel-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: vector-add
  namespace: ml
spec:
  restartPolicy: OnFailure
  containers:
    - name: main
      image: cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1

nikolaydubina (Contributor, Author) commented Oct 17, 2024

PyTorch also works on these nodes.

apiVersion: v1
kind: Pod
metadata:
  name: pytorch-cuda
  namespace: ml
spec:
  containers:
    - name: main
      image: pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel
      command: ["/bin/sh", "-c", "sleep 1000000"]
      resources:
        limits:
          nvidia.com/gpu: 1

$ kubectl exec -n ml --stdin --tty pytorch-cuda -- /bin/bash
root@pytorch-cuda:/workspace# python3
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.current_device()
0
>>> torch.cuda.device_count() 
1
>>> torch.cuda.get_device_name(0)
'NVIDIA L4'
>>> 
root@pytorch-cuda:/workspace# 

A Hugging Face PyTorch-based HTTP server Docker image also works here and uses CUDA: https://github.com/nikolaydubina/basic-openai-pytorch-server
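
To go one step further than enumerating devices, an actual kernel launch from the same pytorch-cuda pod can serve as a sanity check; a minimal sketch (the bf16 matmul + tanh here is only loosely analogous to the utanh_bf16 kernel named in the error):

# Launch real CUDA kernels (bf16 matmul + tanh) rather than just listing devices.
# If the node's driver/toolkit combination were broken in general, this would fail too.
import torch

x = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
y = torch.tanh(x @ x)
torch.cuda.synchronize()
print("ok:", y.shape, y.dtype)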

nikolaydubina (Contributor, Author) commented
@EricLBuehler any tips on why CUDA here is not working?
