Replies: 1 comment
-
Closing this discussion and reposting as an issue.
-
Dear community,
I am using DirectML for inference of UNet models trained with PyTorch. The UNets consist mostly of Conv3D + BatchNormalization + ReLU operations; no transformers are used.
The inference results are great, and I am now looking into optimizing the models for faster inference.
I hoped that converting the model weights to float16 would make inference roughly twice as fast; however, it took just as long as float32, and sometimes 5-10% longer.
With the same models and the CUDA execution provider I get half the inference time, as expected. However, I like the portability of DirectML.
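(A minimal sketch of how the sessions are created; the model path, input shape, and dummy run below are placeholders.)

```python
import numpy as np
import onnxruntime as ort

MODEL_PATH = "unet_fp32.onnx"  # placeholder file name

# DirectML execution provider (portable across GPU vendors on Windows)
dml_session = ort.InferenceSession(MODEL_PATH, providers=["DmlExecutionProvider"])

# CUDA execution provider, used only as a speed reference
cuda_session = ort.InferenceSession(MODEL_PATH, providers=["CUDAExecutionProvider"])

# Placeholder 3D volume input: (batch, channels, D, H, W)
x = np.random.rand(1, 1, 64, 64, 64).astype(np.float32)
input_name = dml_session.get_inputs()[0].name
outputs = dml_session.run(None, {input_name: x})
```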
I export the models as follows:
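(Minimal illustrative sketch: the tiny stand-in model, input shape, and opset version below are assumptions, not the exact export call.)

```python
import torch
import torch.nn as nn

# Stand-in for the real UNet; it only reproduces the Conv3D + BatchNorm + ReLU pattern.
class TinyUNet3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.BatchNorm3d(8),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.block(x)

model = TinyUNet3D().eval()
dummy_input = torch.randn(1, 1, 64, 64, 64)

torch.onnx.export(
    model,
    dummy_input,
    "unet_fp32.onnx",
    opset_version=17,                      # assumed opset
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # assumed dynamic batch axis
)
```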
I tried the following:
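(A minimal sketch of the float16 conversion step, assuming onnxconverter-common; file names are placeholders.)

```python
import onnx
from onnxconverter_common import float16

# Convert float32 initializers and nodes to float16, keeping float32
# inputs/outputs so the calling code stays unchanged.
model_fp32 = onnx.load("unet_fp32.onnx")
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)
onnx.save(model_fp16, "unet_fp16.onnx")
```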
Expected behaviour
I would expect roughly half the inference time, since I can achieve that with the CUDA provider on the same platform and GPU.
Are there any other options that I could try?
Platform: Windows 11
python=3.11.9
onnx=1.16
onnxruntime=1.17 / 1.20
GPU: NVIDIA RTX 2080, 8 GB VRAM