
Fix tuple object error #1354

Open
wants to merge 1 commit into main

Conversation

@SupreetSinghPalne

SupreetSinghPalne commented Sep 23, 2024

What does this PR do?

Summary (you can reproduce it with the steps below):

The issue is not seen on 1 HPU.
The issue is seen with Optimum Habana v1.13 on 8 HPUs.
The issue is not seen with Optimum Habana v1.12 on 8 HPUs.

Replicate:

Reserve Gaudi2 with driver 1.17-495
docker pull vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:1.17.0-495
git clone https://github.com/huggingface/optimum-habana
git checkout v1.13.2

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --rm --cap-add=sys_nice --net=host --ipc=host -v $PWD:/root -v $PWD/data:/data --workdir=/root/ vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:1.17.0-495

export HTTPS_PROXY=http://proxy-dmz.intel.com:912/

cd optimum-habana
python -m pip install .

cd examples/text-generation/
python -m pip install -r ./requirements.txt
python -m pip install -r ./requirements_lm_eval.txt
python -m pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0
pip install datasets==2.19.2
huggingface-cli login --token $HF_TOKEN

QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py \
  --use_deepspeed --world_size 8 run_generation.py \
  --model_name_or_path bigcode/starcoder2-15b \
  --attn_softmax_bf16 \
  --use_hpu_graphs \
  --trust_remote_code \
  --trim_logits \
  --use_kv_cache \
  --bucket_size 128 \
  --bucket_internal \
  --use_flash_attention \
  --flash_attention_recompute \
  --max_new_tokens 128 \
  --batch_size 1 \
  --bf16

Error: [rank0]: TypeError: 'tuple' object does not support item assignment
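The failure mode can be illustrated with a few lines of plain Python: Hugging Face's legacy KV caches are tuples, and tuples are immutable, so writing a bucketed slice back into the cache by index raises exactly this error. This is a simplified, hypothetical sketch (strings stand in for tensors), not the actual optimum-habana code:

```python
# Simplified stand-in for a legacy tuple-based KV cache; in the real code
# these entries would be torch tensors, not strings.
past_key_values = ("key_states", "value_states")

try:
    # Internal bucketing tries to write the trimmed cache back in place.
    past_key_values[0] = "trimmed_key_states"
except TypeError as e:
    print(e)  # 'tuple' object does not support item assignment

# A mutable container accepts the same write.
cache = list(past_key_values)
cache[0] = "trimmed_key_states"
print(cache)  # ['trimmed_key_states', 'value_states']
```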

@regisss
Collaborator

regisss commented Sep 24, 2024

@SupreetSinghPalne Please add a description of this issue and a code snippet to reproduce it if possible. I cannot access https://habana.atlassian.net/browse/HS-3349.

@pk1d3v
Contributor

pk1d3v commented Sep 24, 2024

@SupreetSinghPalne, this is common code that is shared across many models, so please make sure you are not breaking any other models that rely on it. I am also wondering why other models work fine without this change.

@SupreetSinghPalne
Author

SupreetSinghPalne commented Sep 24, 2024

@SupreetSinghPalne Please add a description of this issue and a code snippet to reproduce it if possible. I cannot access https://habana.atlassian.net/browse/HS-3349.

The summary and reproduction steps are the same as in the PR description above.

@mgonchar
Contributor

Hi @SupreetSinghPalne, I think this model doesn't support internal bucketing. Often the model has to be patched to support it; see for example PR #1137.

So I'd say you need similar changes here https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/starcoder2/modeling_starcoder2.py#L286, but not in the common code.

Also, why do you use internal bucketing for this model? It supports another flag, reuse_cache, with a similar effect.
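A minimal sketch of the kind of model-side patch being suggested, assuming the fix is to make the per-layer cache mutable before the in-place write (the helper name and structure are illustrative only; the real change lives in modeling_starcoder2.py and operates on tensors):

```python
# Hypothetical helper: make the per-layer KV cache mutable before internal
# bucketing assigns trimmed key/value slices back by index.
def write_back_kv(past_key_value, new_key, new_value):
    # Legacy HF caches arrive as immutable tuples; convert to a list first.
    if isinstance(past_key_value, tuple):
        past_key_value = list(past_key_value)
    past_key_value[0] = new_key
    past_key_value[1] = new_value
    return past_key_value

layer_cache = ("old_k", "old_v")  # would be tensors in practice
layer_cache = write_back_kv(layer_cache, "new_k", "new_v")
print(layer_cache)  # ['new_k', 'new_v']
```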

@SupreetSinghPalne
Author

SupreetSinghPalne commented Sep 25, 2024

@SupreetSinghPalne, this is common code that is shared across many models, so please make sure you are not breaking any other models that rely on it. I am also wondering why other models work fine without this change.

I checked other models and they still work. I will not change the common code; instead I will change the modeling_starcoder2.py file, which fixes this problem. Thank you for your review.

@SupreetSinghPalne
Author

Hi @SupreetSinghPalne, I think this model doesn't support internal bucketing. Often the model has to be patched to support it; see for example PR #1137.

So I'd say you need similar changes here https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/starcoder2/modeling_starcoder2.py#L286, but not in the common code.

Also, why do you use internal bucketing for this model? It supports another flag, reuse_cache, with a similar effect.

Right, the common code was not supposed to be changed, so I changed modeling_starcoder2.py accordingly, and that fixes the tuple error. I have made the changes. Thank you for the review.

@SupreetSinghPalne
Author

I have updated the description @regisss

@SupreetSinghPalne
Author

@regisss Can you take a look and help get this PR merged?
