Run model with a cupy array on CUDA #10238
Hi, I am facing the same issue. Have you figured out how to use from_dlpack() and to_dlpack() with ONNX Runtime?
After some research I found a plausible solution: take a look at #10286, as you may need to build onnxruntime from source. Once you have everything working, you will find the function you were looking for in onnxruntime.training.ortmodule._utils:
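The exact snippet from `_utils` is not preserved in this thread, but the general DLPack pattern it relies on can be sketched with NumPy, which also implements the protocol (a hedged illustration of the idea, not the onnxruntime helper itself):

```python
import numpy as np

# DLPack is a standard for zero-copy tensor exchange between libraries.
# NumPy implements the consumer side as np.from_dlpack (NumPy >= 1.22).
a = np.arange(4, dtype=np.float32)
b = np.from_dlpack(a)  # no copy: b is a view over a's buffer

print(np.shares_memory(a, b))  # True
```

With cupy the equivalent consumer call is `cp.from_dlpack`, and PyTorch's is `torch.utils.dlpack.from_dlpack`; the onnxruntime helper mentioned above plays the producer role for `OrtValue`s.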
I also came across this task and found that the solution is quite simple: use IO binding and just pass the cupy data pointer. I also made sure that the array is contiguous, because the PyTorch example shows this as well. There is also a nice cupy interoperability guide which I used.

```python
image_gpu = cp.array(image, dtype=cp.float32)
image_gpu = cp.ascontiguousarray(image_gpu)

binding = onnx_sess.io_binding()
binding.bind_input(name=onnx_sess.get_inputs()[0].name, device_type='cuda', device_id=0,
                   element_type=cp.float32, shape=tuple(image_gpu.shape),
                   buffer_ptr=image_gpu.data.ptr)
binding.bind_output(name=onnx_sess.get_outputs()[0].name)
onnx_sess.run_with_iobinding(binding)
results = binding.copy_outputs_to_cpu()[0]
```

Here is an example where only cupy arrays are used, so the output stays on the GPU:

```python
import numpy as np

image_gpu = cp.array(image, dtype=cp.float32)
image_gpu = cp.ascontiguousarray(image_gpu)

binding = onnx_sess.io_binding()
binding.bind_input(name=onnx_sess.get_inputs()[0].name, device_type='cuda', device_id=0,
                   element_type=cp.float32, shape=tuple(image_gpu.shape),
                   buffer_ptr=image_gpu.data.ptr)
binding.bind_output("output", "cuda")
onnx_sess.run_with_iobinding(binding)

ort_output = binding.get_outputs()[0]  # OrtValue holding a device memory pointer
if ort_output.data_ptr():  # always check that the pointer is non-zero
    # UnownedMemory takes a size in bytes, not an element count.
    size_in_bytes = int(np.prod(ort_output.shape())) * cp.float32().itemsize
    mem = cp.cuda.UnownedMemory(ort_output.data_ptr(), size_in_bytes, owner=ort_output)
    mem_ptr = cp.cuda.MemoryPointer(mem, 0)
    # Zero-copy view over ONNX Runtime's output buffer.
    results = cp.ndarray(ort_output.shape(), dtype=cp.float32, memptr=mem_ptr)
```

I guess the feature request #15963 can thus be closed. The code above was tested using onnxruntime 1.8.0, cupy 9.6.0 and CUDA 11.0.

Edit on 26th February: always check that the OrtValue pointer is non-zero!
I have been getting corrupted results on some calls with the solution above when doing repeated calls to the model, and after a day trying to solve the issue I found by chance that one needs to call
I tried to use cupy to convert data and then feed it to ONNX Runtime with the solution above, but an error occurred:
Does anyone know the reason? onnxruntime-gpu: 1.18.1, cupy-cuda12x: 13.3.0, cuda: 12.2
May I ask what your onnxruntime, cupy and cuda versions are?
I am currently running the code above with the following versions:
I also had trouble finding the correct versions for my python environment. You may find a solution here:
Thank you for your config; my code runs successfully under this config.
Update: my previous config was also OK, but I had mistakenly installed the CUDA 11 version of the onnxruntime-gpu library; my code ran successfully once I reinstalled using the command from the official website:
Similar to #10217
Can we run an onnxruntime model with a cupy array (with some conversions)?
I tried the dlpack way mentioned in #4162, but I got a module-not-found error with the statement

C.OrtValue.from_dlpack()

Have I missed something, or is the installation different from usual to support

from onnxruntime.capi import _pybind_state as C

? I basically install onnxruntime as follows:

```
# Ubuntu 18.04 with CUDA 11.2
pip install onnxruntime-gpu==1.9.0
```
Thanks!
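Since the dlpack helpers are only present in some onnxruntime builds (the thread suggests training-enabled builds), it may help to feature-detect them rather than assume they exist. A hedged sketch, checking the attribute used above:

```python
# Feature-detect the dlpack helper instead of assuming the build has it.
# A plain pip wheel without training support may not expose OrtValue.from_dlpack.
try:
    from onnxruntime.capi import _pybind_state as C
    has_dlpack = hasattr(getattr(C, "OrtValue", object()), "from_dlpack")
except ImportError:
    has_dlpack = False

print(has_dlpack)
```

If this prints False, the IO-binding approach above works with any onnxruntime-gpu build and is the simpler route.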