feat: Add NVIDIA triton trt-llm extension #888

hiro-v · 2023-12-07T02:11:34Z

Integration diagram

NVIDIA Triton Inference Server and TensorRT-LLM setup

https://github.com/hamelsmu/llama-inference/blob/master/triton-tensorRT/README.md
This setup can be extended to a Triton inference cluster, deployed with Helm on Kubernetes on DGX clusters.

hiro-v · 2023-12-07T02:16:21Z

1. Check that the extension is installed (it appears as Inference Triton Trt Llm Extension - v 1.0.0).
2. Find the model in the Hub.
3. Update ~/jan/engines/triton_trtllm.json, setting base_url to the remote server's public or private IP.
4. Chat with the remote Llama 2 7B model served by the NVIDIA Triton Inference Server.
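The configuration steps above can be sketched as follows. This is a minimal sketch, not the extension's actual code: it assumes triton_trtllm.json is a JSON object whose base_url points at the Triton HTTP endpoint (Triton's default HTTP port is 8000) and builds a request for Triton's generate / generate_stream endpoints. The model name ensemble, the request fields (text_input, max_tokens, bad_words, stop_words, stream), and the helper names parseEngineSettings and buildGenerateRequest are assumptions modeled on the tensorrtllm_backend examples, not taken from this PR.

```typescript
// Assumed shape of ~/jan/engines/triton_trtllm.json: only base_url is named in this PR.
interface TritonTrtllmEngineSettings {
  base_url: string;
}

// Hypothetical helper: parse the settings file contents and require a string base_url.
function parseEngineSettings(json: string): TritonTrtllmEngineSettings {
  const parsed: unknown = JSON.parse(json);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { base_url?: unknown }).base_url !== "string"
  ) {
    throw new Error("triton_trtllm.json must contain a string base_url");
  }
  return { base_url: (parsed as { base_url: string }).base_url };
}

// Hypothetical helper: build the URL and JSON body for a Triton generate request.
// Assumption: the TensorRT-LLM ensemble model is deployed under the name "ensemble".
function buildGenerateRequest(
  settings: TritonTrtllmEngineSettings,
  prompt: string,
  maxTokens: number,
  stream = true,
): { url: string; body: string } {
  // Drop trailing slashes so the path joins cleanly.
  const base = settings.base_url.replace(/\/+$/, "");
  const route = stream ? "generate_stream" : "generate";
  const url = `${base}/v2/models/ensemble/${route}`;
  // Assumed field names, modeled on tensorrtllm_backend ensemble examples.
  const body = JSON.stringify({
    text_input: prompt,
    max_tokens: maxTokens,
    bad_words: "",
    stop_words: "",
    stream,
  });
  return { url, body };
}
```

The returned url and body could then be POSTed (for example with fetch and a Content-Type: application/json header); the network call is left out so the sketch stays side-effect free.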

hiro-v · 2023-12-07T02:27:36Z

The current blocker is an issue in Triton Inference Server + TensorRT-LLM with missing space characters: triton-inference-server/tensorrtllm_backend#34
I will check for a response there and follow up.

freelerobot · 2023-12-08T03:51:44Z

What's the rationale for having both inference-extension (which only contains the Nitro binary) and inference-nitro-extension?

hiro-v · 2023-12-10T13:19:49Z

There is no longer both: inference-extension has been removed. @0xSage

tikikun

LGTM

extensions/inference-triton-trtllm-extension/src/@types/global.d.ts

extensions/inference-triton-trtllm-extension/src/helpers/sse.ts
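The PR adds helpers/sse.ts, whose contents are not shown in this thread. As a hypothetical illustration of what an SSE helper for Triton's generate_stream endpoint could do, the sketch below splits a server-sent-events buffer on blank lines, keeps any trailing partial event for the next chunk, and reads the text_output field from each JSON data: payload. The function name parseSseChunk and the text_output field name are assumptions, not the PR's code.

```typescript
// Hypothetical SSE helper; not the contents of the PR's sse.ts.
interface SseParseResult {
  texts: string[]; // text_output values from complete events
  rest: string; // trailing partial event, to prepend to the next chunk
}

function parseSseChunk(buffer: string): SseParseResult {
  // Normalize CRLF line endings so events can be split on blank lines.
  const normalized = buffer.replace(/\r\n/g, "\n");
  const events = normalized.split("\n\n");
  // The last segment may be an incomplete event; hand it back to the caller.
  const rest = events.pop() ?? "";
  const texts: string[] = [];
  for (const event of events) {
    // Collect data: lines, stripping the single optional space after the colon.
    const dataLines = event
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""));
    // Skip events without data (e.g. ": keep-alive" comments).
    if (dataLines.length === 0) continue;
    // Multiple data: lines in one event are joined with newlines.
    const payload = JSON.parse(dataLines.join("\n")) as { text_output?: unknown };
    if (typeof payload.text_output === "string") {
      texts.push(payload.text_output);
    }
  }
  return { texts, rest };
}
```

Callers would prepend rest to the next network chunk before parsing again, so an event split across two chunks is not lost.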

hiro-v added the P1: important and type: feature request labels Dec 7, 2023

hiro-v added this to the v0.5.0 milestone Dec 7, 2023

hiro-v requested a review from louis-menlo December 7, 2023 02:11

hiro-v self-assigned this Dec 7, 2023

hiro-v marked this pull request as draft December 7, 2023 02:16

louis-menlo force-pushed the feat/inference_engines branch from 481abc4 to 774b122 December 7, 2023 03:36

hiro-v force-pushed the feat/inference_engines branch from 774b122 to d7c0d97 December 7, 2023 08:23

hiro-v force-pushed the feat/inference_engines branch from d29ef17 to f9e73b0 December 8, 2023 16:15

Base automatically changed from feat/inference_engines to main December 8, 2023 18:09

hiro-v force-pushed the feat/inference_engine_triton_trtllm branch from 194132d to fc8057b December 10, 2023 13:31

hiro-v requested review from a team and removed request for louis-menlo December 10, 2023 13:31

dan-menlo modified the milestones: 0.4.1, 0.4.2, API Endpoint at localhost:1337, Jan supports multiple Inference Engines Dec 11, 2023

hiro-v force-pushed the feat/inference_engine_triton_trtllm branch 2 times, most recently from 0cd4106 to 4054d77 December 12, 2023 01:19

hiro-v marked this pull request as ready for review December 12, 2023 01:45

hiro-v force-pushed the feat/inference_engine_triton_trtllm branch 3 times, most recently from 06d46cb to f26a8d8 December 12, 2023 07:30

tikikun approved these changes Dec 12, 2023

louis-menlo reviewed Dec 12, 2023

extensions/inference-triton-trtllm-extension/src/@types/global.d.ts (outdated)

louis-menlo reviewed Dec 12, 2023

extensions/inference-triton-trtllm-extension/src/helpers/sse.ts

louis-menlo reviewed Dec 12, 2023

extensions/inference-triton-trtllm-extension/src/helpers/sse.ts

louis-menlo approved these changes Dec 12, 2023

hiro-v added 2 commits December 13, 2023 01:24

feat: Add triton trtllm for engine for remote models

f268877

fix: Fix issues based on Louis comments

587f5ad

hiro-v force-pushed the feat/inference_engine_triton_trtllm branch from f26a8d8 to 587f5ad December 12, 2023 18:28

hiro-v merged commit 9256505 into main Dec 12, 2023

hiro-v deleted the feat/inference_engine_triton_trtllm branch December 12, 2023 18:29

hiro-v mentioned this pull request Feb 26, 2024

epic: Jan supports TensorRT-LLM, Triton Server and Nvidia Professional/Datacenter-grade GPUs #1766 (closed)

Van-QA added this to the v0.4.9 milestone Mar 11, 2024