Detection duplicates with fp16 on Jetson Nano (TensorRT v8.2.1.8) #112
Comments
Actually I observed something peculiar. I tried two different threshold combinations (illustrated below), expecting that the first one, with the small `iou_thres`, would result in a more permissive model that allows multiple detections of the same object, while the second would be more conservative and only let the most dominant detection survive. To my surprise, the two approaches showed absolutely no difference, as if the threshold settings were being ignored. Any idea why this is happening? Has anyone experienced something similar before?
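For illustration, assuming the thresholds were passed at export time through the repo's `export.py` (the flag names and values here are indicative only, not a verbatim copy of the commands I ran):

```bash
# First combination: small iou_thres (expected to be permissive)
python export.py -o yolov7-tiny-416.onnx -e yolov7-tiny-416-a.trt -p fp16 --end2end --iou_thres 0.1
# Second combination: larger iou_thres (expected to be conservative)
python export.py -o yolov7-tiny-416.onnx -e yolov7-tiny-416-b.trt -p fp16 --end2end --iou_thres 0.65
```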
Ok, last update. I tried to skip the `--end2end` flag altogether. Then everything works smoothly, since the post-processing takes care of the NMS, but the inference time increases; not dramatically, but noticeably on a platform like the Nano. It is still faster than its onnx counterpart, but I think this approach is sub-optimal. Is there something special about fp16 that forces me to go this way? Thanks in advance for your support, and I hope this issue will help someone in the future. Cheers. Looking forward to your reply!
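For anyone who lands here later: when `--end2end` is skipped, the NMS happens in the post-processing step instead of inside the engine. A minimal NumPy sketch of that step (my own illustration, not the repo's exact code):

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop every
    remaining box whose IoU with it exceeds iou_thres.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]           # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thres]  # keep only boxes that overlap little
    return keep
```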
@IoannisKaragiannis May I ask how you installed cuda-python on the Jetson Nano?
Hey there Linamo1214,
First of all, great job with the trt. I have one question, though. Here is how I proceeded with the conversion.
On my laptop, running Ubuntu 22.04 without any NVIDIA GPU, I created a virtual environment with python3.10, installed all the essential packages for the `yolov7` repo, and did the `.pt` to `.onnx` conversion, sketched below. I deliberately did not set the `--end2end` flag, so that I could apply it later directly in the trt conversion. Then I moved on to my Jetson Nano.
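The export on the laptop side was roughly along these lines (indicative only; flag names follow the yolov7 repo's `export.py`, and my exact arguments are not reproduced here):

```bash
# yolov7 .pt -> .onnx export; --end2end intentionally omitted so that
# NMS can be attached later, at the TensorRT conversion stage
python export.py --weights yolov7-tiny.pt --grid --simplify --img-size 416 416
```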
In my own tiny project I confirmed that the `yolov7-tiny-416.onnx` model from the above conversion works fine, with an average inference time of 99.5 ms. Then I downloaded your repo on the Jetson Nano, created a dedicated virtual environment with python3.6 (to be compatible with tensorrt, which was also built against python3.6), and symbolically linked the natively built TensorRT into it (see below).
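The symlink step was of this form (paths are indicative: the dist-packages location is where JetPack normally installs the Python bindings, and the virtualenv path is my own):

```bash
# expose the system TensorRT bindings inside the python3.6 virtualenv
ln -s /usr/lib/python3.6/dist-packages/tensorrt \
      ~/envs/trt_py36/lib/python3.6/site-packages/tensorrt
```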
Then I proceeded with the `.onnx` to `.trt` conversion, sketched below.
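Indicatively, that step looked like the following (flag names are my reading of the repo's `export.py`, and the workspace value/unit is an assumption rather than a verbatim copy of my command):

```bash
# .onnx -> .trt conversion on the Nano: fp16 engine with NMS baked in via --end2end,
# and the maximum workspace capped with -w (2 GB here)
python export.py -o yolov7-tiny-416.onnx -e yolov7-tiny-416-fp16.trt -p fp16 --end2end -w 2
```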
I set the maximum workspace size to 2 GB because the engine build was otherwise aborting with an error, and the reason I reached for the `-w` flag in the first place was another error I kept hitting. So, basically, to overcome this I had to apply a change in your `export.py` (the sketch below shows the kind of incompatibility involved). I guess this was needed because of the old TensorRT version on the Jetson Nano.
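For context, the incompatibility that older TensorRT releases typically trigger in this area is the workspace-size API. A hedged illustration (an assumption, not necessarily the exact edit that was applied):

```python
# TensorRT 8.2 on the Nano only has the older max_workspace_size attribute,
# while newer releases expect set_memory_pool_limit(); guard for both.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

if hasattr(config, "set_memory_pool_limit"):  # TensorRT >= 8.4
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)
else:                                         # TensorRT 8.2.x (Jetson Nano)
    config.max_workspace_size = 2 << 30       # 2 GB
```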
Then, based on your `trt.py`, I load the trt model in my application on the Jetson Nano. It does load successfully, and the inference time drops from 99.5 ms to 61 ms, but I encountered two issues. In particular, I expected that the `--end2end` flag would take care of applying the NMS, but it doesn't. Is this again because of the old TensorRT v8.2.1.8 implementation? Should I perhaps skip the `--end2end` flag entirely and let your `inference` function inside `trt.py` do the post-processing trick? What do you recommend?
Thanks in advance for your response! Cheers