Correct example to use TensorRT? #1985
Comments
Hi @sherlcok314159, thanks for the issue. I tested the ORTModel with various models. For the Nougat encoder, I timed inference with and without the engine cache and noticed a 5-6 minute speedup with the cache enabled. Despite this, the total load time remained around 30 minutes on an A100. Given that, this is a TensorRT-specific issue. Could you please create a separate issue on the NVIDIA forum or the ONNX Runtime repository? Below is the code I used for testing:
from onnxruntime import SessionOptions, InferenceSession
import onnxruntime as ort
from PIL import Image
from transformers import AutoProcessor
from huggingface_hub import hf_hub_download
# Set session options
sess_opt = SessionOptions()
sess_opt.log_severity_level = 0
# sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
# Provider options for TensorRT
provider_options = {
    "trt_detailed_build_log": True,        # verbose TensorRT build logging
    "trt_engine_cache_enable": True,       # cache the built engine on disk
    "trt_engine_cache_path": "nougat_onx_cache",
}
# Load the ONNX model
model_path = "nougat-small-onnx/encoder_model.onnx"
# Create an inference session with TensorRT execution provider
session = InferenceSession(
    model_path,
    sess_options=sess_opt,
    providers=[("TensorrtExecutionProvider", provider_options)],
)
# Download and process the image
filepath = hf_hub_download(repo_id="hf-internal-testing/fixtures_docvqa", filename="nougat_paper.png", repo_type="dataset")
image = Image.open(filepath)
processor = AutoProcessor.from_pretrained("facebook/nougat-small") # Replace with your processor model
pixel_values = processor(image, return_tensors="pt").pixel_values.numpy()
# Prepare input data
input_name = session.get_inputs()[0].name
# Run inference
outputs = session.run(None, {input_name: pixel_values})
# Process and print the output
print("Inference output:", outputs)
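For reference, here is a minimal way to reproduce the load-time measurement mentioned above (a sketch, not the exact script I used; it reuses model_path, sess_opt, and provider_options from the snippet above and simply wraps session creation in a wall-clock timer):
import time
start = time.perf_counter()
session = InferenceSession(
    model_path,
    sess_options=sess_opt,
    providers=[("TensorrtExecutionProvider", provider_options)],
)
# On the first run TensorRT builds the engine from scratch; with
# trt_engine_cache_enable set, later runs should deserialize it from
# nougat_onx_cache instead, which is where the 5-6 minute saving came from.
print(f"Session creation took {time.perf_counter() - start:.1f} s")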
I don't have time to follow up on this issue. Could you please refer it to the TensorRT team? And if there is a solution, please let me know. Thanks!
System Info
Who can help?
@michaelbenayoun @JingyaHuang @echarlaix
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction (minimal, reproducible, runnable)
I followed the doc here. Below is my code:
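(The snippet itself was not preserved in this copy; as a minimal sketch, an export following the Optimum TensorRT guide, assuming the facebook/nougat-small checkpoint and the ORTModelForVision2Seq class that fit the model discussed in this thread, would look roughly like this:)
from optimum.onnxruntime import ORTModelForVision2Seq

# Export the PyTorch checkpoint to ONNX and place it on the TensorRT
# execution provider; the long wait happens here while TensorRT builds
# its engine.
ort_model = ORTModelForVision2Seq.from_pretrained(
    "facebook/nougat-small",  # assumed checkpoint, per the rest of the thread
    export=True,
    provider="TensorrtExecutionProvider",
)
ort_model.save_pretrained("nougat-small-onnx")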
When running the code, the terminal output looks like this:
I waited almost half an hour for the model export (RTX 2080 Ti). However, when I loaded the model with the code below, it simply repeated the same conversion process.
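(Again, the snippet is missing from this copy; a hedged sketch of reloading the exported model with the TensorRT engine cache enabled, using provider_options keys from ONNX Runtime's TensorRT execution provider, which is what should prevent the rebuild:)
from optimum.onnxruntime import ORTModelForVision2Seq

provider_options = {
    "trt_engine_cache_enable": True,    # serialize the built engine to disk
    "trt_engine_cache_path": "trt_cache",  # hypothetical cache directory
}
# Reload the already-exported model; once the cache is populated, TensorRT
# should deserialize the engine instead of rebuilding it from scratch.
ort_model = ORTModelForVision2Seq.from_pretrained(
    "nougat-small-onnx",  # directory produced by the earlier export
    provider="TensorrtExecutionProvider",
    provider_options=provider_options,
)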
Therefore, I want to know whether Optimum actually supports TensorRT, or whether something is wrong with the official doc for running TensorRT.
Expected behavior
When loading a model that has already been converted by TensorRT, Optimum should not repeat the conversion process.