Description
Hello,
I don't understand why ConvTranspose3d runs in INT8 with implicit quantization, but falls back to FP16 when I use explicit quantization or QAT with the TensorRT Model Optimizer.
I am using the TensorRT Model Optimizer with the default INT8 configuration.
I think this is a problem specific to the TensorRT Model Optimizer, but I'm not sure.
I am sharing the quantization and ONNX export code below.
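Roughly, the quantization step looks like this (a simplified sketch, not my full code; `model` and `calib_loader` are placeholders for the actual network and calibration data):

```python
# Simplified sketch of the Model Optimizer PTQ step with the default
# INT8 config (model / calib_loader are placeholders).
import torch
import modelopt.torch.quantization as mtq

def forward_loop(model):
    # Feed a few calibration batches so the quantizers can collect
    # activation ranges.
    for batch in calib_loader:
        model(batch)

model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop=forward_loop)
```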
Here is the result after TensorRT quantization in implicit mode.

So here, ConvTranspose3D is in INT8.
Inference time in implicit mode

9 ms inference time, so implicit mode is fine.
Here is the explicit ONNX quantization.

Here is the result after TensorRT quantization of the explicit ONNX.

Inference time in explicit mode.

In explicit quantization: inference time of 11 ms.
Command used to run the TensorRT build:
.\trtexec.exe --onnx=surround_occ_int8.onnx --noDataTransfers --useCudaGraph --useSpinWait --profilingVerbosity=detailed --verbose --fp16 --int8
I tried with --int8 and without --int8, but got the same result.
My ONNX export code:
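The export is roughly the following (simplified; input shape, names, and opset are illustrative, not the exact values from my script):

```python
# Simplified export of the quantized model to ONNX with Q/DQ nodes, so
# TensorRT builds it in explicit-quantization mode. Shape and opset are
# illustrative placeholders.
import torch

dummy_input = torch.randn(1, 3, 32, 128, 128).cuda()
torch.onnx.export(
    model,                      # quantized model from the step above
    dummy_input,
    "surround_occ_int8.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["output"],
)
```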
I've been looking through the documentation and forum posts for a few weeks now, and I'm out of ideas.
