System Info
text-generation-launcher 2.2.1-dev0
Information
Tasks
Reproduction
This one works:

docker run --rm -it \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  -p 8080:80 \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  --shm-size 256g \
  -v $PWD:/data \
  --env PYTORCH_TUNABLEOP_ENABLED=0 \
  --env HUGGINGFACE_HUB_CACHE=/data \
  --env ROCM_USE_FLASH_ATTN_V2_TRITON=0 \
  ghcr.io/huggingface/text-generation-inference:2.2.0-rocm \
  --model-id=teknium/OpenHermes-2.5-Mistral-7B
This one fails:

docker run --rm -it \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  -p 8080:80 \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  --shm-size 256g \
  -v $PWD:/data \
  --env PYTORCH_TUNABLEOP_ENABLED=0 \
  --env HUGGINGFACE_HUB_CACHE=/data \
  --env ROCM_USE_FLASH_ATTN_V2_TRITON=0 \
  ghcr.io/huggingface/text-generation-inference:latest-rocm \
  --model-id=teknium/OpenHermes-2.5-Mistral-7B
2024-09-13T18:58:43.898755Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

    from text_generation_server.layers.attention.flashinfer import (
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/layers/attention/flashinfer.py", line 5, in <module>
    import flashinfer
ModuleNotFoundError: No module named 'flashinfer'
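The shard dies on a plain ModuleNotFoundError raised while importing the attention backend. As a hedged sketch (a hypothetical helper, not TGI's actual code), this is how an optional dependency can be probed before import so a missing wheel is reported cleanly instead of crashing the shard:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be imported in this environment."""
    return importlib.util.find_spec(name) is not None

# Inside the failing latest-rocm container this prints False, because the
# flashinfer wheel is not installed there (matching the traceback above).
print(module_available("flashinfer"))
```

Running this one-liner inside both containers would confirm that only the latest-rocm image is missing the module.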
Expected behavior
It should launch the server successfully.