read_pb is slow #2

Open · fugjo16 opened this issue Jan 19, 2019 · 12 comments

@fugjo16 commented Jan 19, 2019

Dear author,

It's a great project, and the result is good!

But when I ran YOLOv3 with TensorRT on a TX2, it took a long time (about 10~20 minutes) to run read_pb_return_tensors().
Is this expected? I'm wondering whether I did something wrong...

Thanks

@ardianumam (Owner)

Hi,

Do you (i) run all of the block 2 code in this code file, or (ii) only run read_pb_graph("./model/YOLOv3/yolov3_gpu_nms.pb")? If (i), yes, it takes longer, since you also perform the TensorRT optimization. But after you store the trt_model.pb, you can do something similar to (ii) to load your stored trt_model.pb, and that only takes a few seconds (it also depends on your GPU). By the way, can you share how much improvement you got in terms of FPS after the TRT optimization, and which GPU you used? I am curious about that.
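
For reference, a minimal TF 1.x sketch of option (ii), loading an already-stored frozen graph; the helper body and the .pb path here are assumptions based on the repo layout, not copied from it:

# Sketch: load a stored (TRT-optimized) frozen graph once, then reuse it.
import tensorflow as tf

def read_pb_graph(pb_path):
    # Parse the serialized GraphDef stored in the .pb file
    with tf.gfile.GFile(pb_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def

trt_graph = read_pb_graph('./model/YOLOv3/trt_model.pb')  # assumed path
tf.import_graph_def(trt_graph, name='')  # import into the default graph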

@fugjo16 (Author) commented Jan 21, 2019

Hi @ardianumam,

The situation is (ii); it needs about 15 minutes to load the model. I run this code on a Jetson TX2, but with a 3rd-party carrier board.
After loading finishes, I get about 9 FPS, versus about 4 FPS without the TensorRT optimization.
I think the problem may be caused by the 3rd-party carrier board or by different package versions; I'll check it.
Thanks for your reply.

@ardianumam (Owner)

@fugjo16: Did you convert the frozen_model.pb to TRT_model.pb on a desktop machine and then use it on the Jetson TX2? I have done something similar before, and yes, it takes a very long time even just to load the TRT_model.pb. That workflow isn't really proper anyway, since TensorRT optimization generates a model optimized specifically for the machine on which the optimization is run.

If not, I'm surprised you can convert frozen_model.pb to TRT_model.pb on the Jetson TX2, because whenever I tried it, it always ran out of memory. -.-
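
For completeness, a hedged sketch of what the on-device conversion looks like with the TF 1.x contrib API; the output node name below is a placeholder, not the repo's actual node:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF 1.x contrib API

# Read the plain frozen graph produced by TensorFlow
with tf.gfile.GFile('./model/YOLOv3/yolov3_gpu_nms.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Build the TRT-optimized graph on the *target* machine (here: the TX2)
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['output_boxes'],          # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,  # keep modest to fit TX2 memory
    precision_mode='FP16')             # the TX2 GPU runs FP16 fast

# Save it so later runs only need to load this file
with tf.gfile.GFile('./model/YOLOv3/trt_model.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())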

@fugjo16 (Author) commented Jan 21, 2019

@ardianumam: No, I convert to TRT_model.pb on the TX2. I use swap space to get some more memory, as described in the link below. Swap only extends CPU memory, but it still helped.
https://devtalk.nvidia.com/default/topic/1025939/jetson-tx2/when-i-run-a-tensorflow-model-there-is-not-enough-memory-what-shoud-i-do-/
Maybe this is why I need so much time to load the TRT_model...

@ardianumam (Owner)

@fugjo16: I just learned about that. I'll try it on my TX2 later too and update here soon. Thanks. Yes, that is probably the cause.

@fugjo16 (Author) commented Jan 22, 2019

@ardianumam Thanks! This problem really confuses me.

@ardianumam (Owner)

Hi @fugjo16: I just tried it on my TX2, and yes, it took about 15 minutes just to read <tensorrt_model.pb>, while reading the native TensorFlow model <frozen_model.pb> takes only 5 seconds. lol. Maybe it's due to the swap memory used when performing the TensorRT optimization. I posted to the NVIDIA forum too; hopefully someone replies. Or do you plan to, for example, reduce the YOLOv3 architecture so that we can perform the TensorRT optimization on the TX2 without using swap memory?

@fugjo16 (Author) commented Jan 24, 2019

Hi @ardianumam: Thanks a lot! I hope someone will answer it. lol. Yes, I think that method will work; I will try it! Thanks :D

@filipski

I'd rather say you're hit by the protobuf version/backend. Check:
https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/

and start with:
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
before running your code. If that doesn't help, update protobuf; I rebuilt it from source.
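
If it helps, here's a quick way to check which protobuf backend Python is actually using (the env var only takes effect if the C++ extension is installed):

from google.protobuf.internal import api_implementation
# Prints 'cpp' when the fast C++ backend is active, 'python' otherwise;
# the pure-Python backend is what makes parsing big TRT graphs so slow.
print(api_implementation.Type())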

@ardianumam (Owner)

@filipski : thanks for the info. I'll give it a try.

@fugjo16 (Author) commented Jun 25, 2019

I tested with the script from this blog post. It's easy to modify, and it works for me:
https://jkjung-avt.github.io/tf-trt-revisited/

@MuhammadAsadJaved

@fugjo16 @ardianumam
I have a YOLOv3 TensorFlow model in both ckpt and .pb format. My model runs on a GTX 1080 Ti at 37 FPS. Now I want to run it on a Xavier NX, but the model is very slow, about 2 FPS.
How can I optimize this model with TensorRT to make it faster on the Xavier NX? How can I convert the .pb model to a .trt engine?
