
TensorRT: Quantization issues with ConvTranspose3D #688

@Floriangit12

Description


Hello,

I don't understand why ConvTranspose3D runs in int8 with implicit quantization, but when I use explicit quantization or a QAT phase with the TensorRT Model Optimizer, ConvTranspose3D falls back to fp16.
I am using the Model Optimizer with its default int8 configuration.
I think the problem is only related to the TensorRT Model Optimizer, but I'm not sure.
I am sharing the code and the quantized export below.
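Roughly, my quantization flow follows the default int8 recipe from the Model Optimizer. This is only a simplified sketch; model, calib_loader and dummy_input are placeholders here, the real code is in the attachments further down:

import torch
import modelopt.torch.quantization as mtq

# model, calib_loader and dummy_input are placeholders for my own network
# and data, not the exact contents of the attached files.

def forward_loop(m):
    # run a few representative batches so the quantizers can calibrate their ranges
    for batch in calib_loader:
        m(batch)

# quantize with the default int8 configuration from the TensorRT Model Optimizer
quant_model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# check whether the ConvTranspose3d layers actually received input/weight quantizers;
# in explicit quantization TensorRT only runs a layer in int8 if it is surrounded
# by Q/DQ nodes in the exported graph
mtq.print_quant_summary(quant_model)

# export the graph with the inserted Q/DQ nodes
torch.onnx.export(quant_model, dummy_input, "surround_occ_int8.onnx", opset_version=17)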

Here is my implicit ONNX:
[screenshot: implicit ONNX graph]

Here is the result after TensorRT quantization:
[screenshot: TensorRT engine layers]

So here, ConvTranspose3D runs in int8.

Inference time in implicit mode:
[screenshot: implicit-mode timing]
9 ms inference time, so implicit mode is fine.

Here is the explicit ONNX quantization:
[screenshot: explicit ONNX graph]

Here is the result after TensorRT quantization:
[screenshot: TensorRT engine layers]

Inference time in explicit mode:
[screenshot: explicit-mode timing]
In explicit quantization: inference time of 11 ms.

Command used to run the TensorRT quantization:
.\trtexec.exe --onnx=surround_occ_int8.onnx --noDataTransfers --useCudaGraph --useSpinWait --profilingVerbosity=detailed --verbose --fp16 --int8

I tried with --int8 and without --int8, but the result is the same.
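For anyone reproducing this: the precision each layer actually ends up with can also be checked by exporting the per-layer info from trtexec (layers.json is just an example filename):

.\trtexec.exe --onnx=surround_occ_int8.onnx --fp16 --int8 --profilingVerbosity=detailed --exportLayerInfo=layers.json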

My code for the ONNX export:

code.txt

model.txt

I've been looking through documentation and forum threads for a few weeks now, and I'm out of ideas.
