feat: Add NVIDIA triton trt-llm extension #888

Merged: 2 commits merged into main on Dec 12, 2023

@hiro-v hiro-v commented Dec 7, 2023

For #821

Integration diagram: [image]

NVIDIA Triton Inference Server and TensorRT-LLM setup

@hiro-v hiro-v added the "P1: important" and "type: feature request" labels Dec 7, 2023
@hiro-v hiro-v added this to the v0.5.0 milestone Dec 7, 2023
@hiro-v hiro-v requested a review from louis-menlo December 7, 2023 02:11
@hiro-v hiro-v self-assigned this Dec 7, 2023
hiro-v commented Dec 7, 2023

  1. Check that the extension is installed: look for "Inference Triton Trt Llm Extension - v 1.0.0".
    [screenshot]

  2. Find the model in the Hub.
    [screenshot]

  3. Update ~/jan/engines/triton_trtllm.json, setting base_url to the remote server's public or private IP.
    [screenshot]

  4. Chat with the Llama 2 7B model served by the remote NVIDIA Triton Inference Server.
    [screenshot]
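
For anyone scripting step 3 instead of editing the file by hand, here is a minimal sketch. Only the file name and the `base_url` key come from the steps above. The helper name `set_engine_base_url`, the example address `http://192.0.2.10:8000`, and the assumption that the file is a flat JSON object are illustrative and not taken from this PR. Check the actual file the extension writes before relying on it.

```python
import json
import tempfile
from pathlib import Path


def set_engine_base_url(config_path: Path, base_url: str) -> dict:
    """Set `base_url` in an engine JSON config, keeping any other keys.

    Hypothetical helper: assumes the config is a flat JSON object.
    """
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config["base_url"] = base_url
    # Create the parent folder if missing, then write the file back.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo against a temporary directory, not the real ~/jan/engines folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "engines" / "triton_trtllm.json"
        # Placeholder address (assumption): replace with your server's IP and port.
        print(set_engine_base_url(path, "http://192.0.2.10:8000"))
```

To apply it to a real install, pass `Path.home() / "jan" / "engines" / "triton_trtllm.json"` and the address of your own Triton server.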

@hiro-v hiro-v marked this pull request as draft December 7, 2023 02:16
hiro-v commented Dec 7, 2023

The current blocker is an error on Triton Inference Server + TensorRT-LLM where space characters go missing: triton-inference-server/tensorrtllm_backend#34
I will follow up once there is an answer on that issue.

@louis-menlo louis-menlo force-pushed the feat/inference_engines branch from 481abc4 to 774b122 Compare December 7, 2023 03:36
@hiro-v hiro-v force-pushed the feat/inference_engines branch from 774b122 to d7c0d97 Compare December 7, 2023 08:23
freelerobot commented

What's the rationale for having both inference-extension (which just has the nitro binary) and inference-nitro-extension?

@hiro-v hiro-v force-pushed the feat/inference_engines branch from d29ef17 to f9e73b0 Compare December 8, 2023 16:15
Base automatically changed from feat/inference_engines to main December 8, 2023 18:09
hiro-v commented Dec 10, 2023

No, the inference-extension has been removed @0xSage

@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch from 194132d to fc8057b Compare December 10, 2023 13:31
@hiro-v hiro-v requested review from a team and removed request for louis-menlo December 10, 2023 13:31
@dan-menlo dan-menlo modified the milestones: 0.4.1, 0.4.2, API Endpoint at localhost:1337, Jan supports multiple Inference Engines Dec 11, 2023
@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch 2 times, most recently from 0cd4106 to 4054d77 Compare December 12, 2023 01:19
@hiro-v hiro-v marked this pull request as ready for review December 12, 2023 01:45
@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch 3 times, most recently from 06d46cb to f26a8d8 Compare December 12, 2023 07:30
@tikikun tikikun left a comment

LGTM

@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch from f26a8d8 to 587f5ad Compare December 12, 2023 18:28
@hiro-v hiro-v merged commit 9256505 into main Dec 12, 2023
@hiro-v hiro-v deleted the feat/inference_engine_triton_trtllm branch December 12, 2023 18:29
@Van-QA Van-QA added this to the v0.4.9 milestone Mar 11, 2024
Labels: P1: important (Important feature / fix), type: feature request (A new feature)
6 participants