
quantization FP8 error #1438

Open
2 of 4 tasks
aitss2017 opened this issue Oct 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@aitss2017

System Info

optimum-habana            1.14.0.dev0
HL-SMI Version:           hl-1.18.0-fw-53.1.1.1         
Driver Version:           1.18.0-ee698fb

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

HF_ENDPOINT=https://hf-mirror.com \
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_lm_eval.py \
-o acc_yi34b_bs1_measure.txt \
--model_name_or_path /mnt/disk1/Yi-34B \
--attn_softmax_bf16 \
--use_hpu_graphs \
--trim_logits \
--use_kv_cache \
--bucket_size=128 \
--bucket_internal \
--use_flash_attention \
--flash_attention_recompute \
--bf16 \
--batch_size 1 \
--trust_remote_code
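
For context, the file passed through QUANT_CONFIG drives Intel Neural Compressor's FP8 calibration pass. A minimal sketch of what quantization_config/maxabs_measure.json typically contains in this example folder, loaded and echoed in Python (the exact field values are assumptions; verify against your checkout):

import json

# Sketch of the expected measurement-mode config (assumption based on the
# optimum-habana text-generation example; verify against your checkout).
# "MEASURE" mode only collects maxabs statistics; a second run with the
# companion quant config applies the actual FP8 quantization.
expected = {
    "method": "HOOKS",
    "mode": "MEASURE",
    "observer": "maxabs",
    "dump_stats_path": "./hqt_output/measure",
}

# Fails loudly if the file is missing or malformed, which would surface
# before any dataset download is attempted.
with open("./quantization_config/maxabs_measure.json") as f:
    print(json.dumps(json.load(f), indent=2))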

/usr/lib/python3.10/inspect.py:288: FutureWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
return isinstance(object, types.FunctionType)
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
10/18/2024 03:08:59 - WARNING - __main__ - `trust_remote_code` is set, there is no guarantee this model works properly and it may fail
10/18/2024 03:08:59 - INFO - __main__ - Single-device run.
2024-10-18 03:09:02 [WARNING][auto_accelerator.py:422] Auto detect accelerator: HPU_Accelerator.
2024-10-18 03:09:02 [INFO][utils.py:201] Preparation started.
2024-10-18 03:09:02 [INFO][quantize.py:160] Start to prepare model with fp8_quant.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
PT_HPU_EAGER_PIPELINE_ENABLE = 1
PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 112
CPU RAM : 1056428680 KB

2024-10-18 03:09:10 [INFO][utils.py:201] Preparation end.
Initializing inference mode
10/18/2024 03:09:11 - INFO - __main__ - Args: Namespace(buckets=[16, 32, 64, 128, 189, 284], output_file='acc_bloomz7b_bs1_measure.txt', tasks=['hellaswag', 'lambada_openai', 'piqa', 'winogrande'], limit_iters=None, device='hpu', model_name_or_path='/DISK0/bloomz-7b1', bf16=True, max_new_tokens=100, max_input_tokens=0, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=False, num_beams=1, top_k=None, penalty_alpha=None, trim_logits=True, seed=27, profiling_warmup_steps=0, profiling_steps=0, profiling_record_shapes=False, prompt=None, bad_words=None, force_words=None, assistant_model=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=True, output_dir=None, bucket_size=128, bucket_internal=True, dataset_max_samples=-1, limit_hpu_graphs=False, show_graphs_count=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, use_flash_attention=True, flash_attention_recompute=True, flash_attention_causal_mask=False, flash_attention_fast_softmax=True, book_source=False, torch_compile=False, ignore_eos=True, temperature=1.0, top_p=1.0, const_serialization_path=None, trust_remote_code=True, parallel_strategy='none', input_embeds=False, run_partial_dataset=False, load_quantized_model_with_autogptq=False, disk_offload=False, load_quantized_model_with_inc=False, local_quantized_inc_model_path=None, quant_config='./quantization_config/maxabs_measure.json', world_size=0, global_rank=0)
10/18/2024 03:09:11 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: True
10/18/2024 03:09:11 - INFO - __main__ - Model initialization took 13.380s
Traceback (most recent call last):
File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 231, in
main()
File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 198, in main
lm_tasks = lm_eval.tasks.get_task_dict(args.tasks)
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/init.py", line 415, in get_task_dict
task_name_dict = {
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/init.py", line 416, in
task_name: get_task(task_name)()
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 513, in init
self.download(data_dir, cache_dir, download_mode)
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 542, in download
self.dataset = datasets.load_dataset(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2606, in load_dataset
builder_instance = load_dataset_builder(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2277, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1923, in dataset_module_factory
raise e1 from None
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1875, in dataset_module_factory
can_load_config_from_parquet_export = "DEFAULT_CONFIG_NAME" not in f.read()
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
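
Byte 0x8b at position 1 matches the gzip magic number (0x1f 0x8b), so a plausible reading is that the hf-mirror endpoint returned a dataset file gzip-compressed and datasets decoded the raw bytes as UTF-8. A minimal sketch to check, assuming the loader was fetching a dataset file from the mirror (the URL below is a placeholder):

import gzip
import urllib.request

# Placeholder URL: substitute whatever file the datasets loader was
# fetching when it failed. If the first two bytes are 0x1f 0x8b, the
# mirror served gzip-compressed content that was then read as plain text.
url = "https://hf-mirror.com/datasets/hellaswag/resolve/main/README.md"
raw = urllib.request.urlopen(url).read()
if raw[:2] == b"\x1f\x8b":
    print("gzip-compressed response; decompressed head:")
    print(gzip.decompress(raw)[:200].decode("utf-8", errors="replace"))
else:
    print("plain response:", raw[:80])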

Expected behavior

Quantization runs successfully.

@aitss2017 aitss2017 added the bug Something isn't working label Oct 18, 2024
@regisss
Collaborator

regisss commented Oct 20, 2024

I can't reproduce this. Can you update your main branch and try again, please?
