Hi, @ajrasane
The residual add is a common structure in CNN models, and right now it is time-consuming because of the data type conversion from fp8 to fp16 around it. Is it theoretically feasible to keep the data type as fp8 there in the future?
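
For reference, a minimal sketch of the residual-add pattern I mean, in PyTorch (the block layout is illustrative, not my exact model):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block; not my exact layers."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The elementwise add below is where the fp8 -> fp16
        # conversions get inserted around the residual branch.
        return self.relu(self.conv2(self.relu(self.conv1(x))) + x)
```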
In my ONNX model, I tried padding the dwconv output channels to 16 so that the layer becomes quantizable (a rough sketch of the edit is below the link). However, this leads to bad layer fusion: the op count grows from 300+ to 1500+, with many more data type conversion ops, and performance drops dramatically from 1200+ FPS to 280+ FPS.
The ONNX models are here: https://github.com/PonyPinkPie/export/tree/main/ckpt
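
Roughly how I padded the dwconv output channels, as a sketch (the initializer name `dwconv.weight` and the file names are hypothetical; the conv's `group` attribute and downstream shapes also need matching updates, which I omit here):

```python
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # hypothetical file name
for init in model.graph.initializer:
    if init.name == "dwconv.weight":  # hypothetical initializer name
        w = numpy_helper.to_array(init)  # dwconv weight, shape (C, 1, kH, kW)
        pad = 16 - w.shape[0]
        if pad > 0:
            # Zero-pad the output-channel dimension up to 16.
            w = np.concatenate([w, np.zeros((pad,) + w.shape[1:], dtype=w.dtype)])
            init.CopyFrom(numpy_helper.from_array(w, init.name))
onnx.save(model, "model_padded.onnx")
```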
