
Misleading documentation #174

Open · 12010486 opened this issue Jul 2, 2024 · 4 comments

Comments

12010486 commented Jul 2, 2024

Hi everyone,

I can see there has been a recent effort to add more documentation to TGI, and I appreciate it. However, some sections are misleading. For example, docs/source/conceptual/quantization.md describes quantization with GPTQ and with bitsandbytes, but to the best of my knowledge neither works on Gaudi2 (we tested bitsandbytes, and CUDA calls are hardcoded in it).

My ask would be: can you prune the bits that are not relevant for Gaudi?
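For reference, a minimal sketch of the kind of bitsandbytes invocation the quantization doc describes, adapted to the Gaudi image; $model and $volume are placeholders, and --quantize is the standard TGI launcher flag, not something verified to work on Gaudi2:

# Sketch only: the quantization doc's bitsandbytes path, run against the Gaudi image.
# On Gaudi2 this reportedly fails because bitsandbytes hardcodes CUDA calls.
docker run -p 8080:80 \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -v $volume:/data \
  ghcr.io/huggingface/tgi-gaudi:2.0.1 \
  --model-id $model \
  --quantize bitsandbytes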

12010486 (Author) commented Jul 2, 2024

I can also contribute, if you find it relevant. We interact with customers, so we might bring in a different perspective.

12010486 changed the title from "Documentation misleading" to "Misleading documentation" on Jul 2, 2024
regisss (Collaborator) commented Jul 12, 2024

Maybe we should make it clearer in the README that not all TGI features are supported on Gaudi, and that the documentation for this fork is the README.

endomorphosis commented Jul 28, 2024

I came here to chime in that the documentation is wrong: this example crashes during warmup.

docker run -p 8080:80 \
  --runtime=habana \
  -v $volume:/data \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  -e HF_HUB_ENABLE_HF_TRANSFER=1 \
  -e HUGGING_FACE_HUB_TOKEN=$hf_token \
  -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
  -e PREFILL_BATCH_BUCKET_SIZE=1 \
  -e BATCH_BUCKET_SIZE=256 \
  -e PAD_SEQUENCE_TO_MULTIPLE_OF=128 \
  --cap-add=sys_nice \
  --ipc=host \
  ghcr.io/huggingface/tgi-gaudi:2.0.1 \
  --model-id $model \
  --max-batch-prefill-tokens 8242 \
  --max-input-tokens 4096 \
  --max-total-tokens 8192 \
  --max-batch-size 256 \
  --max-concurrent-requests 400 \
  --sharded true \
  --num-shard 8

regisss (Collaborator) commented Jul 28, 2024

@endomorphosis Can you please point me to where you found this example in the documentation? I can't find it.
