Conversion to/from `tch` Tensors #46
There's definitely an unacceptable number of copies that could be causing the degradation in performance. In the conversion to …

As for conversion from …

I'm not too familiar with the …
Thank you very much for the quick response. It seems the mechanism for libraries relying on manipulating PyTorch tensors for pre/post-processing and inference on an …

I ran a few more experiments using either CPU or CUDA. The performance when using CPU is very similar for the Libtorch and ONNX implementations of the models. When using CUDA, the Libtorch version is more than ~10x faster than the ONNX equivalent. This aligns with the optimum … linked above setting … It seems using …
Unfortunately, it seems ONNX Runtime only exposes DLPack conversion through its Python bindings; there is no C API for it (at least at the moment), and it would only be available in training builds anyway. But I am working on …
I released v1.15.0 with support for …
Thank you!
@decahedron1 I have migrated my crate to ort-2.0, but it is still unclear how to effectively use IOBindings with tensors that are already on the CUDA device. Would it be possible to provide documentation for this?
Hello,
Thank you again for building these bindings.
I am working on integrating ONNX support into rust-bert. I have most existing pipelines working (from classification to text generation), but I am observing a severe performance degradation compared to the Libtorch backend using the `tch` bindings.
Most of the pipeline logic is written using `tch` tensors, and I was hoping to re-use most of this logic for ONNX models. I suspect the performance hit comes from the conversion between `tch::Tensor` and `ort::tensor::InputTensor`. The current conversion I am using generally follows these steps:
1. `tch` to `ort`:
The actual implementation looks like …
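In spirit, this direction is the following minimal sketch (f32 case only; `tch_to_ndarray` is an illustrative name, and the resulting `ArrayD` would then be wrapped into an `ort::tensor::InputTensor`):

```rust
use ndarray::{ArrayD, IxDyn};
use tch::Tensor;

/// Copies a `tch::Tensor` into an `ndarray::ArrayD<f32>`, which can then be
/// wrapped into an ort input tensor.
fn tch_to_ndarray(tensor: &Tensor) -> ArrayD<f32> {
    let shape: Vec<usize> = tensor.size().iter().map(|&d| d as usize).collect();
    let numel = tensor.numel();
    // Copy: device -> host, into a contiguous Vec (assumes an f32 tensor).
    let mut data = vec![0f32; numel];
    tensor.to_device(tch::Device::Cpu).copy_data(&mut data, numel);
    // The Vec is moved (not copied again) into the ArrayD.
    ArrayD::from_shape_vec(IxDyn(&shape), data).expect("shape and length must match")
}
```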
2. `ort` to `tch`:
The actual implementation looks like …
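Again in spirit, a minimal sketch of the reverse direction (`ndarray_to_tch` is illustrative; it assumes the ort output has been viewed as an `ndarray`, and a recent tch with `Tensor::from_slice`, which is `of_slice` on older releases):

```rust
use tch::{Device, Tensor};

/// Copies an `ndarray` view (e.g. extracted from an ort output) into a
/// newly allocated `tch::Tensor`.
fn ndarray_to_tch(view: ndarray::ArrayViewD<'_, f32>, device: Device) -> Tensor {
    let shape: Vec<i64> = view.shape().iter().map(|&d| d as i64).collect();
    // `as_standard_layout` yields a contiguous buffer (copying if needed)...
    let contiguous = view.as_standard_layout();
    // ...and `from_slice` copies it once more into tch-owned memory.
    Tensor::from_slice(contiguous.as_slice().expect("standard layout is contiguous"))
        .reshape(&shape[..])
        .to_device(device)
}
```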
This involves a lot of copies and memory allocations (especially given the intermediate slice representation). I was hoping to be able to convert from `tch` tensors to `ort` tensors ideally without a copy (creating them directly from the data pointer of the source element), or at least without having to go through the intermediate slices.

I have tried a few things on the `tch` side, including creating a Tensor from an ndarray to skip the slice creation, but this still copies the data over, and I am unsure whether there is a better way of doing so. I understand you may not be fully familiar with the tch project; any hints on the way forward would be appreciated.
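For example, wrapping an existing buffer with `tch`'s unsafe `Tensor::from_blob` avoids the copy on the tch side, at the cost of managing the buffer's lifetime manually; a sketch of that idea (f32, CPU, contiguous case only):

```rust
use tch::{Device, Kind, Tensor};

/// Wraps an existing contiguous f32 buffer in a tch tensor without copying.
///
/// SAFETY: tch does not take ownership; `data` must stay alive and unmoved
/// for as long as the returned tensor (or any view of it) is used.
unsafe fn wrap_f32_slice(data: &[f32], shape: &[i64]) -> Tensor {
    assert_eq!(shape.iter().product::<i64>() as usize, data.len());
    // Row-major (C-contiguous) strides, in elements.
    let mut strides = vec![1i64; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    Tensor::from_blob(data.as_ptr() as *const u8, shape, &strides, Kind::Float, Device::Cpu)
}
```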
For reference, the ONNX implementation I am working on is at guillaume-be/rust-bert#346.
Thank you!