I'm using the AWS TEI Docker image (2.0.1-tei1.4.0-gpu-py310-cu122-ubuntu22.04) for text embeddings inference. When I deploy it on a SageMaker g4dn.xlarge instance, the process stops working after just a couple of requests. Strangely, the same setup runs smoothly on a g5 instance without any issues.
It looks like after a few inference requests on g4dn.xlarge, the processes that serve the models just die.
Any idea why this happens only on this instance type?
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Deploy a model with TEI on a g4dn instance and send a few hundred to a few thousand requests.
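For anyone trying to reproduce this, here is a minimal sketch of a sequential load loop. It is not the script used in the original report. The endpoint name is up to the caller, and the `{"inputs": ...}` payload shape is an assumption about what the container accepts; `make_sagemaker_sender` and `run_load` are hypothetical helper names.

```python
import json
from typing import Callable, List, Tuple


def make_sagemaker_sender(endpoint_name: str) -> Callable[[str], list]:
    """Return a function that sends one text to a SageMaker endpoint."""
    import boto3  # imported lazily so run_load can be exercised without AWS

    client = boto3.client("sagemaker-runtime")

    def send(text: str) -> list:
        response = client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps({"inputs": text}),  # assumed payload shape
        )
        return json.loads(response["Body"].read())

    return send


def run_load(
    send: Callable[[str], list], texts: List[str], total: int
) -> List[Tuple[int, str]]:
    """Send `total` requests one at a time and record every failed request."""
    failures = []
    for i in range(total):
        try:
            send(texts[i % len(texts)])
        except Exception as exc:  # keep going so the failure onset is visible
            failures.append((i, repr(exc)))
    return failures
```

Calling `run_load(make_sagemaker_sender("my-endpoint"), ["some text"], 2000)` (with a placeholder endpoint name) returns the indices of the failed requests, which shows roughly how many requests the endpoint handles before it stops responding.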
Expected behavior
I would expect the serving processes to keep running.
Encountering the same issue here. To add to this: right after deploying the endpoint, some inputs intermittently return an embedding consisting only of [None,...] values. The same input sometimes yields that and sometimes a valid embedding.
Then, after some load testing, the model always returns an array of [None,...], so it seems to have died for good.
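To tell the two failure modes apart during a load test (intermittent [None,...] responses versus every response being invalid), a check along these lines might help. This is only a sketch: it assumes the response has already been parsed into a list of embedding vectors, and `is_valid_embedding` and `count_invalid` are hypothetical helper names.

```python
from typing import Any, List


def is_valid_embedding(vector: Any) -> bool:
    """True if `vector` is a non-empty list of real numbers (no None, no NaN)."""
    if not isinstance(vector, list) or not vector:
        return False
    return all(
        isinstance(x, (int, float))
        and not isinstance(x, bool)
        and x == x  # NaN is the only float that is not equal to itself
        for x in vector
    )


def count_invalid(embeddings: List[Any]) -> int:
    """Count the vectors in a parsed response that fail the validity check."""
    return sum(1 for v in embeddings if not is_valid_embedding(v))
```

Logging `count_invalid(...)` per request over time would show whether the invalid responses start out sporadic and then become permanent, as described above.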