feat: Add NVIDIA triton trt-llm extension #888
Conversation
The current blocker is an error on Triton Inference Server + TRT-LLM caused by a missing space character in the output: triton-inference-server/tensorrtllm_backend#34
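For context on where that upstream bug surfaces, here is a hypothetical TypeScript sketch of how an extension like this one could call the server through Triton's HTTP generate endpoint. The `ensemble` model name, host, port, and parameter values are assumptions for illustration, not taken from this PR:

```typescript
// Hypothetical request/response shapes for Triton's HTTP generate endpoint
// as exposed by the TRT-LLM backend's default ensemble model.
interface GenerateRequest {
  text_input: string;
  max_tokens: number;
  bad_words: string;
  stop_words: string;
}

interface GenerateResponse {
  text_output: string;
}

async function generate(
  prompt: string,
  baseUrl = "http://localhost:8000", // placeholder server address
): Promise<string> {
  const body: GenerateRequest = {
    text_input: prompt,
    max_tokens: 64,
    bad_words: "",
    stop_words: "",
  };
  const res = await fetch(`${baseUrl}/v2/models/ensemble/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Triton request failed: ${res.status}`);
  const data = (await res.json()) as GenerateResponse;
  // The linked upstream issue reports that text_output can come back
  // with a space character missing between detokenized tokens.
  return data.text_output;
}
```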
What's the rationale for having both …
No, the …
LGTM
extensions/inference-triton-trtllm-extension/src/@types/global.d.ts
For #821
Integration diagram: NVIDIA Triton Inference Server and TensorRT-LLM setup, with the triton-inference-cluster deployed using Helm on Kubernetes on DGX clusters.
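Once the chart is installed, the deployment can be verified against Triton's standard KServe v2 health endpoint. A minimal sketch; the service URL and port are placeholders for wherever the Helm chart exposes the cluster:

```typescript
// Readiness probe using Triton's KServe v2 health endpoint.
// Returns true once the server reports it is ready to serve inference.
async function tritonReady(baseUrl: string): Promise<boolean> {
  const res = await fetch(`${baseUrl}/v2/health/ready`);
  return res.ok; // HTTP 200 indicates the server is ready
}

// Placeholder service name/port; substitute the address the chart exposes.
tritonReady("http://triton-inference-cluster:8000").then((ok) =>
  console.log(ok ? "Triton is ready" : "Triton is not ready"),
);
```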