Model gives inaccurate results post conversion to tflite #685
As I commented in the previous issue, there are hundreds of issues in this repository with the same question, so it's a good idea to search the existing issues first.
It's mentally painful to be asked to answer the same thing over and over again.
Really sorry for this. Referring back to the previous issue, the model input resolution was fixed to [1, 3, 480, 640], so I believed the dynamic input size was no longer a problem. With -cotof I can see a pretty bad divergence, with an abs error of 4340.93 in the final output. From that step I thought the input tensors were no longer dynamic, and I am reshaping my images according to the fixed tensor input. Regardless, I do not wish to take up any more of your time; I will try to figure out why the values are diverging.
Your model is a ViT model with a huge number of parameters, so the auto-correction by onnx2tf may be skipped (see onnx2tf/onnx2tf/utils/common_functions.py, lines 3843 to 3862 in 8a93cff).
The dummy inference function is necessary to automatically correct model conversion errors, but it consumes a large amount of RAM for models with large structures. https://github.com/PINTO0309/onnx2tf/tree/main?tab=readme-ov-file#3-accuracy-check
onnx2tf -i metric3d-vit-small.onnx -cotof
Your model appears to consume 130 GB of RAM for auto-correction. onnx2tf seems to make a mysterious and fatal error in the constant calculations in this part. There may be a problem with the optimization of two consecutive operations; it is probably a bug in the optimization process of arithmetic operations. Sorry for blaming you so much.
Ah, that is amazing. Brilliant as always. Results have improved significantly compared to the gibberish output from before. The values are still not accurate in real-world tests, though, and the high-frequency features were missed by the converted file. Do you know how we can find out how the pre- and post-processing steps change upon conversion? Maybe the image needs normalisation as well, or the output needs some sort of scaling? The ONNX range of values was [4.7, 24.7] and the TFLite range is [0.68631876, 11.6238165], and with metric depth such disparity matters. This does seem at odds with the calculated error of 1e-4, but I am not sure. I will run some more experiments to figure out the exact disparity. Anyway, this is certainly nothing you should worry about; you have helped me a lot. Maybe the model simply cannot be converted with very high precision. Again, thanks a lot for taking the time to fix these issues!
I can't say anything for sure, just guessing at how you plan to use the model, but there are some common patterns of loss of accuracy that can occur after conversion to TensorFlow, such as a mismatch in input normalization or in the NCHW-to-NHWC layout change.
Note that onnx2tf fixes all elements of the dummy input to a fixed value for the accuracy check. If the error check using the -cotof option reports only a tiny maximum error (on the order of 1e-4), the converted model itself is numerically faithful.
Therefore, as you point out, a situation where the final output values of each model differ by more than a factor of 10 is clearly not a problem with the models themselves.
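For example, here is a minimal sketch of the layout pitfall (the TFLite filename and the fixed 480x640 resolution are assumptions based on this thread):

```python
import numpy as np
import tensorflow as tf

# The same image prepared for the ONNX model: NCHW, [1, 3, 480, 640].
img_nchw = np.random.rand(1, 3, 480, 640).astype(np.float32)

interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# onnx2tf converts models to NHWC, so transpose before feeding the interpreter.
interpreter.set_tensor(inp["index"], img_nchw.transpose(0, 2, 3, 1))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(out.min(), out.max())
```

Feeding NCHW data to the NHWC model will run without errors but produce garbage values, which is one of the most frequent causes of this symptom.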
Makes sense. I am using float32 here, but yes, it still holds true. Thanks for the detail; I get why conversion is so hard, especially for these larger models. So from what I can tell, it is just not that easy to convert this particular model accurately, and any arbitrary changes made by TFLite can't be predicted (though normalization is not a big factor).
So the issue is TFLite? The values are close but not accurate enough, which can't be explained, since the -cotof test does give an error of less than 1e-4. Maybe some depth scaling? I will figure out the scale factor if there is one. Thanks a bunch!
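For the record, a minimal sketch of how I plan to estimate a global scale factor between the two outputs (a simple least-squares fit; the helper is entirely my own, hypothetical code):

```python
import numpy as np

def fit_scale(onnx_depth: np.ndarray, tflite_depth: np.ndarray) -> float:
    # Least-squares scale s minimizing ||onnx_depth - s * tflite_depth||^2.
    a = onnx_depth.ravel()
    b = tflite_depth.ravel()
    return float(np.dot(a, b) / np.dot(b, b))

# Usage: s = fit_scale(onnx_out, tflite_out); then compare onnx_out with s * tflite_out.
```

If the residual after rescaling is small, the divergence is a uniform scaling issue rather than a structural one.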
I have to attend a conference for the next three days, so my investigation and definitive answer will be a little delayed.
Issue Type
Others
OS
Linux
onnx2tf version number
1.25.7
onnx version number
1.16.2
onnxruntime version number
1.18.1
onnxsim (onnx_simplifier) version number
0.4.36
tensorflow version number
2.17.0
Download URL for ONNX
https://huggingface.co/onnx-community/metric3d-vit-small/blob/main/onnx/model.onnx
Parameter Replacement JSON
N/A
Description
To deploy a monodepth model on edge devices. R&D work and problem exploration. Massive impact, since nothing except TFLite seems to work with Snapdragon SoCs.
The model outputs are not correct at all. I did a lot of inspection and here are my findings.
Here are the input details for the TFLite conversion, and here are the output details.
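For reference, a minimal sketch of how these details can be printed (the converted filename is an assumption):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
interpreter.allocate_tensors()

# Prints name, shape, dtype, and quantization info for each tensor.
print(interpreter.get_input_details())
print(interpreter.get_output_details())
```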
Upon inspection of the ONNX file, the ONNX version has 3 outputs.
So in the TFLite file, Identity, Identity_1, and Identity_2 each correspond to one of these. For predicted depth, it could be either Identity or Identity_2; I tried both, but neither gives accurate results at all.
Identity gives values in the range of [-5000, -1000], which does not seem accurate for either confidence or depth values, while Identity_2 gives values in the range of [10, 50], which seems more reasonable but still not accurate.
I am not sure if I was supposed to follow any pre- or post-processing steps different from the ONNX format. TFLite often has different steps, but I don't know exactly what they are.
This is an example of drawing inference from the ONNX file, which works absolutely fine (see the sketch below). Is it the conversion process that broke it, or is there something additional I need to do to fix the results?
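Roughly, the ONNX inference looks like this (a simplified sketch of the snippet I'm referring to; the preprocessing is abbreviated, and treating the first output as the predicted depth is an assumption):

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("metric3d-vit-small.onnx")
input_name = sess.get_inputs()[0].name

# Image preprocessed to the fixed [1, 3, 480, 640] NCHW input.
image = np.zeros((1, 3, 480, 640), dtype=np.float32)

outputs = sess.run(None, {input_name: image})  # returns all three outputs
pred_depth = outputs[0]
print(pred_depth.min(), pred_depth.max())
```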
Please also find the reference to an old issue which helped me with the conversion process.
I also created a Colab notebook to make it easier to see the inferences from the ONNX file. For the same image, bus.jpg, the range of values with ONNX is [4.7, 24.7].