Issues: triton-inference-server/server
vLLM backend Hugging Face feature branch model loading
enhancement (New feature or request)
#7963 opened Jan 23, 2025 by knitzschke
Unexpected throughput results: increasing instance group count vs. deploying the same count distributed on one card using shared computing windows
performance (A possible performance tune-up)
#7956 opened Jan 21, 2025 by ariel291888
How to start/expose the metrics endpoint of the Triton Server via openai_frontend/main.py arguments
#7954 opened Jan 21, 2025 by shuknk8s
Segmentation fault when crafting a pb_utils.Tensor object in a Triton BLS model
bug (Something isn't working)
#7953 opened Jan 18, 2025 by carldomond7
Failed to launch triton-server: "error: creating server: Internal - failed to load all models"
module: backends (Issues related to the backends)
#7950 opened Jan 17, 2025 by pzydzh
Triton crashes with SIGSEGV
crash (Related to server crashes, segfaults, etc.)
#7938 opened Jan 15, 2025 by ctxqlxs
[Question] Are the libnvinfer_builder_resources necessary in the Triton image?
question (Further information is requested)
#7932 opened Jan 14, 2025 by MatthieuToulemont
Server build with python BE failing due to missing Boost lib
#7925 opened Jan 9, 2025 by buddhapuneeth
OpenAI-Compatible Frontend should support world_size larger than 1
enhancement (New feature or request)
#7914 opened Jan 3, 2025 by cocodee
vllm_backend: What is the right way to use downloaded model + model.json together?
question (Further information is requested)
#7912 opened Jan 2, 2025 by kyoungrok0517
Python backend with multiple instances causes unexpected and non-deterministic results
bug (Something isn't working)
#7907 opened Dec 25, 2024 by NadavShmayo
MIG deployment of Triton causes "CacheManager Init Failed. Error: -17"
bug (Something isn't working)
#7906 opened Dec 25, 2024 by LSC527
Shared memory I/O bottleneck?
performance (A possible performance tune-up)
#7905 opened Dec 24, 2024 by wensimin
Support for guided decoding for vllm backend
enhancement (New feature or request)
#7897 opened Dec 20, 2024 by Inkorak
How does the Triton Inference Server always compare the current frame's inference result with the previous one
question (Further information is requested)
#7893 opened Dec 19, 2024 by Komoro2023
async execute is not run concurrently
bug (Something isn't working)
#7888 opened Dec 17, 2024 by ShuaiShao93
Error when using ONNX with TensorRT (ORT-TRT) Optimization on Multi-GPU
bug (Something isn't working)
#7885 opened Dec 16, 2024 by efajardo-nv
Manual warmup per model instance / specify warmup config dynamically using the C API
#7884 opened Dec 16, 2024 by asaff1
Triton documentation inconsistency
documentation issue (Documentation isn't correct or could be improved)
#7878 opened Dec 12, 2024 by BenHaItay
Segfault/Coredump in grpc::ModelInferHandler::InferResponseComplete
crash (Related to server crashes, segfaults, etc.), grpc (Related to the GRPC server)
#7877 opened Dec 12, 2024 by andyblackheel
Core was generated by /opt/tritonserver/backends/python/triton_python_backend_stub
module: backends (Issues related to the backends), python (Python related, whether backend, in-process API, client, etc.)
#7875 opened Dec 12, 2024 by powerpistn