
quantization FP8 error #1438

Open
2 of 4 tasks
aitss2017 opened this issue Oct 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@aitss2017

System Info

optimum-habana            1.14.0.dev0
HL-SMI Version:           hl-1.18.0-fw-53.1.1.1         
Driver Version:           1.18.0-ee698fb

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

HF_ENDPOINT=https://hf-mirror.com \
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_lm_eval.py \
-o acc_yi34b_bs1_measure.txt \
--model_name_or_path /mnt/disk1/Yi-34B \
--attn_softmax_bf16 \
--use_hpu_graphs \
--trim_logits \
--use_kv_cache \
--bucket_size=128 \
--bucket_internal \
--use_flash_attention \
--flash_attention_recompute \
--bf16 \
--batch_size 1 \
--trust_remote_code
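
For context, the file passed through QUANT_CONFIG drives Intel Neural Compressor's FP8 calibration pass. A minimal sketch of what quantization_config/maxabs_measure.json typically contains in this example folder, loaded and echoed in Python (the exact field values are assumptions; verify against your checkout):

import json

# Sketch of the expected measurement-mode config (assumption based on the
# optimum-habana text-generation example; verify against your checkout).
# "MEASURE" mode only collects maxabs statistics; a second run with the
# companion quant config applies the actual FP8 quantization.
expected = {
    "method": "HOOKS",
    "mode": "MEASURE",
    "observer": "maxabs",
    "dump_stats_path": "./hqt_output/measure",
}

# Fails loudly if the file is missing or malformed, which would surface
# before any dataset download is attempted.
with open("./quantization_config/maxabs_measure.json") as f:
    print(json.dumps(json.load(f), indent=2))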

/usr/lib/python3.10/inspect.py:288: FutureWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
return isinstance(object, types.FunctionType)
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
10/18/2024 03:08:59 - WARNING - __main__ - `trust_remote_code` is set, there is no guarantee this model works properly and it may fail
10/18/2024 03:08:59 - INFO - __main__ - Single-device run.
2024-10-18 03:09:02 [WARNING][auto_accelerator.py:422] Auto detect accelerator: HPU_Accelerator.
2024-10-18 03:09:02 [INFO][utils.py:201] Preparation started.
2024-10-18 03:09:02 [INFO][quantize.py:160] Start to prepare model with fp8_quant.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
PT_HPU_EAGER_PIPELINE_ENABLE = 1
PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 112
CPU RAM : 1056428680 KB

2024-10-18 03:09:10 [INFO][utils.py:201] Preparation end.
Initializing inference mode
10/18/2024 03:09:11 - INFO - __main__ - Args: Namespace(buckets=[16, 32, 64, 128, 189, 284], output_file='acc_bloomz7b_bs1_measure.txt', tasks=['hellaswag', 'lambada_openai', 'piqa', 'winogrande'], limit_iters=None, device='hpu', model_name_or_path='/DISK0/bloomz-7b1', bf16=True, max_new_tokens=100, max_input_tokens=0, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=False, num_beams=1, top_k=None, penalty_alpha=None, trim_logits=True, seed=27, profiling_warmup_steps=0, profiling_steps=0, profiling_record_shapes=False, prompt=None, bad_words=None, force_words=None, assistant_model=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=True, output_dir=None, bucket_size=128, bucket_internal=True, dataset_max_samples=-1, limit_hpu_graphs=False, show_graphs_count=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, use_flash_attention=True, flash_attention_recompute=True, flash_attention_causal_mask=False, flash_attention_fast_softmax=True, book_source=False, torch_compile=False, ignore_eos=True, temperature=1.0, top_p=1.0, const_serialization_path=None, trust_remote_code=True, parallel_strategy='none', input_embeds=False, run_partial_dataset=False, load_quantized_model_with_autogptq=False, disk_offload=False, load_quantized_model_with_inc=False, local_quantized_inc_model_path=None, quant_config='./quantization_config/maxabs_measure.json', world_size=0, global_rank=0)
10/18/2024 03:09:11 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: True
10/18/2024 03:09:11 - INFO - __main__ - Model initialization took 13.380s
Traceback (most recent call last):
File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 231, in
main()
File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 198, in main
lm_tasks = lm_eval.tasks.get_task_dict(args.tasks)
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/init.py", line 415, in get_task_dict
task_name_dict = {
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/init.py", line 416, in
task_name: get_task(task_name)()
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 513, in init
self.download(data_dir, cache_dir, download_mode)
File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 542, in download
self.dataset = datasets.load_dataset(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2606, in load_dataset
builder_instance = load_dataset_builder(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2277, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1923, in dataset_module_factory
raise e1 from None
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1875, in dataset_module_factory
can_load_config_from_parquet_export = "DEFAULT_CONFIG_NAME" not in f.read()
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
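
Byte 0x8b at position 1 matches the gzip magic number (0x1f 0x8b), so a plausible reading is that the hf-mirror endpoint returned a dataset file gzip-compressed and datasets decoded the raw bytes as UTF-8. A minimal sketch to check, assuming the loader was fetching a dataset file from the mirror (the URL below is a placeholder):

import gzip
import urllib.request

# Placeholder URL: substitute whatever file the datasets loader was
# fetching when it failed. If the first two bytes are 0x1f 0x8b, the
# mirror served gzip-compressed content that was then read as plain text.
url = "https://hf-mirror.com/datasets/hellaswag/resolve/main/README.md"
raw = urllib.request.urlopen(url).read()
if raw[:2] == b"\x1f\x8b":
    print("gzip-compressed response; decompressed head:")
    print(gzip.decompress(raw)[:200].decode("utf-8", errors="replace"))
else:
    print("plain response:", raw[:80])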

Expected behavior

Quantization runs successfully.

@aitss2017 aitss2017 added the bug Something isn't working label Oct 18, 2024
@regisss
Collaborator

regisss commented Oct 20, 2024

I can't reproduce this. Can you update your main branch and try again, please?
