Support Request: Baichuan-M1-14B on Core Ultra #12810

Closed
KiwiHana opened this issue Feb 11, 2025 · 4 comments
@KiwiHana

Hello, I tried to run Baichuan-M1-14B on ARL-H (Windows 11), but it failed. Below are the commands and the error log.

1. Download Baichuan-M1-14B:

   modelscope download --model baichuan-inc/Baichuan-M1-14B-Instruct README.md --local_dir ./dir

2. Install the environment:

conda create -n ipex-2.6 python=3.10 libuv
conda activate ipex-2.6
pip install --pre --upgrade ipex-llm[xpu_2.6] --extra-index-url https://download.pytorch.org/whl/xpu
set SYCL_CACHE_PERSISTENT=1 
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 

python run_14b.py

ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn, einops. Run pip install flash_attn einops

pip install flash_attn einops

(image attachment)

run_14b.py

from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm AutoModel provides load_in_4bit / cpu_embedding / save_low_bit
from transformers import AutoTokenizer
import torch

# 1. Load pre-trained model and tokenizer
model_name = "./Baichuan-M1-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             load_in_4bit=True,
                                             attn_implementation="sdpa",  # or "eager"
                                             cpu_embedding=True,
                                             trust_remote_code=True)

# Save the converted low-bit model and tokenizer for later reloading
model.save_low_bit(model_name + "-int4/")
tokenizer.save_pretrained(model_name + "-int4/")

model = model.to('xpu')
print('Successfully loaded Tokenizer and optimized Model!')

# 2. Input prompt text
prompt = "May I ask you some questions about medical knowledge?"

# 3. Encode the input text for the model
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# 4. Generate text
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# 5. Decode the generated text
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]


# 6. Output the result
print("Generated text:")
print(response)
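
For reference, once the int4 copy has been written with save_low_bit, ipex-llm can reload it directly instead of re-converting the full-precision weights on every run. A minimal sketch, assuming the "./Baichuan-M1-14B-Instruct-int4/" folder produced by the script above:

from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Reload the previously saved low-bit checkpoint; faster than converting again
saved_path = "./Baichuan-M1-14B-Instruct-int4/"
model = AutoModelForCausalLM.load_low_bit(saved_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(saved_path, trust_remote_code=True)
model = model.to('xpu')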
@KiwiHana (Author) commented Feb 12, 2025

Following the approach in #12808:

pip install --pre --upgrade ipex-llm[xpu_2.6] --extra-index-url https://download.pytorch.org/whl/xpu
pip install --pre --upgrade ipex-llm

Package            Version
------------------ --------------
accelerate         0.23.0
bigdl-core-xe-all  2.6.0b20250209
certifi            2025.1.31
charset-normalizer 3.4.1
colorama           0.4.6
dpcpp-cpp-rt       2025.0.2
einops             0.8.1
filelock           3.17.0
fsspec             2025.2.0
huggingface-hub    0.28.1
idna               3.10
intel-cmplr-lib-rt 2025.0.2
intel-cmplr-lib-ur 2025.0.2
intel-cmplr-lic-rt 2025.0.2
intel-opencl-rt    2025.0.2
intel-openmp       2025.0.2
intel-sycl-rt      2025.0.2
ipex-llm           2.2.0b20250211
Jinja2             3.1.5
MarkupSafe         3.0.2
mpmath             1.3.0
networkx           3.4.2
numpy              1.26.4
onednn             2025.0.1
onednn-devel       2025.0.1
packaging          24.2
pillow             11.1.0
pip                25.0.1
protobuf           6.30.0rc1
psutil             6.1.1
py-cpuinfo         9.0.0
PyYAML             6.0.2
regex              2024.11.6
requests           2.32.3
safetensors        0.5.2
sentencepiece      0.2.0
setuptools         75.8.0
sympy              1.13.1
tabulate           0.9.0
tbb                2022.0.0
tcmlib             1.2.0
tokenizers         0.15.2
torch              2.6.0+xpu
torchaudio         2.6.0+xpu
torchvision        0.21.0+xpu
tqdm               4.67.1
transformers       4.37.0
typing_extensions  4.12.2
umf                0.9.1
urllib3            2.3.0
wheel              0.45.1

run_baichun-m1-14b.py

from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch
# 1. Load pre-trained model and tokenizer
model_name = "./Baichuan-M1-14B-Instruct"  

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype = torch.half, load_in_low_bit='sym_int4',trust_remote_code=True).eval()

(llm-arl-2.6) C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models>python run_baichun-m1-14b.py

C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
You are using a model of type baichuan_m1 to instantiate a model of type baichuan. This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
  File "C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models\run_baichun-m1-14b.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype = torch.half, load_in_low_bit='sym_int4',trust_remote_code=True).eval()
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\unittest\mock.py", line 1379, in patched
    return func(*newargs, **newkeywargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\model.py", line 349, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\model.py", line 493, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\models\auto\auto_factory.py", line 553, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\dynamic_module_utils.py", line 500, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\dynamic_module_utils.py", line 200, in get_class_in_module
    module = importlib.import_module(module_path)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\Lengda\.cache\huggingface\modules\transformers_modules\Baichuan-M1-14B-Instruct\modeling_baichuan.py", line 11, in <module>
    from transformers import add_start_docstrings, PreTrainedModel, DynamicCache, \
ImportError: cannot import name 'StaticCache' from 'transformers' (C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\__init__.py)
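
The pip list above shows transformers 4.37.0, while StaticCache was only introduced in transformers 4.38, so the model's remote code (modeling_baichuan.py) cannot import it from the older install. A quick check, assuming the same conda environment:

# Check whether the installed transformers provides StaticCache
import transformers
print(transformers.__version__)        # 4.37.0 in the environment above
from transformers import StaticCache   # raises ImportError on anything below 4.38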

@MeouSker77 (Contributor)
To follow the approach in #12808, use transformers 4.45:

pip install transformers==4.45 trl==0.11
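
After pinning those versions, a quick sanity check from the same environment (a minimal sketch):

# Confirm the pinned versions are the ones actually imported
import transformers, trl
print(transformers.__version__)   # expect 4.45.x
print(trl.__version__)            # expect 0.11.x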

@KiwiHana (Author)
(llm-arl-2.6) C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models>python run_baichun-m1-14b.py

You are using a model of type baichuan_m1 to instantiate a model of type baichuan. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 6/6 [00:21<00:00,  3.61s/it]
2025-02-12 14:02:31,804 - INFO - Converting the current model to sym_int4 format......
Successfully loaded Tokenizer and optimized Model!
Traceback (most recent call last):
  File "C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models\run_baichun-m1-14b.py", line 33, in <module>
    generated_ids = model.generate(
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\lookup.py", line 125, in generate
    return original_generate(self,
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\speculative.py", line 127, in generate
    return original_generate(self,
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\pipeline_parallel.py", line 283, in generate
    return original_generate(self,
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\generation\utils.py", line 1829, in generate
    self._prepare_special_tokens(generation_config, kwargs_has_attention_mask, device=device)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\generation\utils.py", line 1678, in _prepare_special_tokens
    and isin_mps_friendly(elements=eos_token_tensor, test_elements=pad_token_tensor).any()
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\pytorch_utils.py", line 328, in isin_mps_friendly
    return torch.isin(elements, test_elements)
RuntimeError: Invalid value for bool configuration variable SYCL_CACHE_PERSISTENT: 1

@KiwiHana (Author)
Re-setting the two variables fixed it (the earlier set commands had a trailing space after the value, which appears to be why the SYCL runtime rejected it as an invalid bool):

set SYCL_CACHE_PERSISTENT=1
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

Thanks!
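
As an aside, the same two switches can also be set from Python before the first XPU call, which avoids any stray whitespace from the shell; a minimal sketch:

import os

# Set the SYCL switches programmatically, before any XPU work happens in this process
os.environ["SYCL_CACHE_PERSISTENT"] = "1"
os.environ["SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"] = "1"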
