Correct example to use TensorRT? #1985

Open

sherlcok314159 opened this issue Aug 8, 2024 · 2 comments

sherlcok314159 commented Aug 8, 2024

System Info

optimum: 1.20.0
os: ubuntu 20.04 with RTX 2080TI
python: 3.10.14

Who can help?

@michaelbenayoun @JingyaHuang @echarlaix

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

I followed the doc here. Below is my code:

from transformers import AutoProcessor
from optimum.onnxruntime import ORTModelForVision2Seq

model = 'facebook/nougat-small'

# Export the model to ONNX and create the inference session with the TensorRT execution provider
ort_model = ORTModelForVision2Seq.from_pretrained(
    model,
    export=True,
    provider="TensorrtExecutionProvider",
)

# TensorRT should be the preferred provider, with CUDA and CPU as fallbacks
assert ort_model.providers == ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]

processor = AutoProcessor.from_pretrained(model)
ort_model.save_pretrained('./nougat-small-trt')
processor.save_pretrained('./nougat-small-trt')

When running the code, the terminal output looks like this:

2024-08-08 16:31:02.881585368 [W:onnxruntime:Default, tensorrt_execution_provider.h:83 log] [2024-08-08 08:31:02 WARNING] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped

I waited almost half an hour for the export to finish (RTX 2080 Ti). However, when I loaded the exported model with the code below, the whole conversion was repeated.

import os

import onnxruntime as ort
from optimum.onnxruntime import ORTModelForVision2Seq

model = './nougat-small-trt'  # directory where the exported model was saved above

session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session_options.log_severity_level = 3

# Cache the built TensorRT engines so they can be reused on later loads
trt_engine_cache = './nougat-small-trt-cache'
os.makedirs(trt_engine_cache, exist_ok=True)
provider_options = {
    'trt_engine_cache_enable': True,
    'trt_engine_cache_path': trt_engine_cache,
}

ort_model = ORTModelForVision2Seq.from_pretrained(
    model,
    provider='TensorrtExecutionProvider',
    provider_options=provider_options,
    session_options=session_options,
)
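
After the first load with this configuration, one way to check whether any TensorRT engine files were actually written is to list the cache directory (a minimal check, assuming the same cache path as above):

import os

trt_engine_cache = './nougat-small-trt-cache'
# The TensorRT execution provider serializes built engines as .engine/.profile files in this directory
cached_files = sorted(os.listdir(trt_engine_cache))
print(f"{len(cached_files)} cached file(s):", cached_files)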

Therefore, I want to know whether Optimum supports TensorRT or not, or whether there is something wrong with the official doc for running TensorRT.

Expected behavior

When loading a model that has already been converted for TensorRT, Optimum should not repeat the conversion process.
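
As a possible workaround (not verified for this model), the engine cache could be enabled already on the very first from_pretrained call, so that the engine built during export can be picked up by later loads. A minimal sketch, assuming the same cache path as above:

from optimum.onnxruntime import ORTModelForVision2Seq

# Enable the TensorRT engine cache during the initial export, so a later
# from_pretrained with the same provider_options can reuse the serialized engine
# instead of rebuilding it (assumption: the path matches the one used at load time).
ort_model = ORTModelForVision2Seq.from_pretrained(
    "facebook/nougat-small",
    export=True,
    provider="TensorrtExecutionProvider",
    provider_options={
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "./nougat-small-trt-cache",
    },
)
ort_model.save_pretrained("./nougat-small-trt")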

sherlcok314159 added the bug label Aug 8, 2024
mht-sharma self-assigned this Aug 21, 2024
mht-sharma (Contributor) commented

Hi @sherlcok314159, thanks for the issue.

I tested the ORTModel with various models using the TensorRTExecutionProvider for caching, and it generally worked as expected.

However, for the facebook/nougat-small encoder, the engine cache doesn't seem to be effective enough to significantly reduce the load time, as the model appears to be recompiling.

I timed the inference of the encoder with and without the cache and noticed a 5-6 minute speedup with the engine cache. Despite this, the total load time remained around 30 minutes on an A100.

Given that, this looks like a TensorRT-specific issue. Could you please create a separate issue on the NVIDIA forum or in the ORT repository? Below is the code I used for testing:

from onnxruntime import SessionOptions, InferenceSession
import onnxruntime as ort
from PIL import Image
from transformers import AutoProcessor
from huggingface_hub import hf_hub_download

# Set session options
sess_opt = SessionOptions()
sess_opt.log_severity_level = 0
# sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

# Provider options for TensorRT
provider_options = {
    "trt_detailed_build_log": True,
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "nougat_onx_cache"
}

# Load the ONNX model
model_path = "nougat-small-onnx/encoder_model.onnx"

# Create an inference session with TensorRT execution provider
session = InferenceSession(model_path, sess_options=sess_opt, providers=[('TensorrtExecutionProvider', provider_options)])

# Download and process the image
filepath = hf_hub_download(repo_id="hf-internal-testing/fixtures_docvqa", filename="nougat_paper.png", repo_type="dataset")
image = Image.open(filepath)
processor = AutoProcessor.from_pretrained("facebook/nougat-small")  # Replace with your processor model
pixel_values = processor(image, return_tensors="pt").pixel_values.numpy()

# Prepare input data
input_name = session.get_inputs()[0].name

# Run inference
outputs = session.run(None, {input_name: pixel_values})

# Process and print the output
print("Inference output:", outputs)

sherlcok314159 (Author) commented

I don't have time to follow up on this. Could you please refer this issue to the TensorRT team? And if there is a solution, please let me know. Thanks!
