Support Request: Baichuan-M1-14B on Core Ultra #12810

Closed
KiwiHana opened this issue Feb 11, 2025 · 4 comments
@KiwiHana

Hello, I tried to run Baichuan-M1-14B on ARL-H (Windows 11), but it failed. Below are the commands and the error log.

1. Download Baichuan-M1-14B:

   modelscope download --model baichuan-inc/Baichuan-M1-14B-Instruct README.md --local_dir ./dir

2. Install the environment:

conda create -n ipex-2.6 python=3.10 libuv
conda activate ipex-2.6
pip install --pre --upgrade ipex-llm[xpu_2.6] --extra-index-url https://download.pytorch.org/whl/xpu
set SYCL_CACHE_PERSISTENT=1 
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 

python run_14b.py

ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn, einops. Run pip install flash_attn einops

pip install flash_attn einops

(image attachment)

run_14b.py

from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm AutoModel provides load_in_4bit / cpu_embedding / save_low_bit
from transformers import AutoTokenizer
import torch

# 1. Load pre-trained model and tokenizer
model_name = "./Baichuan-M1-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             load_in_4bit=True,
                                             attn_implementation="sdpa",  # or "eager"
                                             cpu_embedding=True,
                                             trust_remote_code=True)

# Save the converted low-bit model and tokenizer for later reloading
model.save_low_bit(model_name + "-int4/")
tokenizer.save_pretrained(model_name + "-int4/")

model = model.to('xpu')
print('Successfully loaded Tokenizer and optimized Model!')

# 2. Input prompt text
prompt = "May I ask you some questions about medical knowledge?"

# 3. Encode the input text for the model
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# 4. Generate text
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# 5. Decode the generated text
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]


# 6. Output the result
print("Generated text:")
print(response)
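
For reference, once the int4 copy has been written with save_low_bit, ipex-llm can reload it directly instead of re-converting the full-precision weights on every run. A minimal sketch, assuming the "./Baichuan-M1-14B-Instruct-int4/" folder produced by the script above:

from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Reload the previously saved low-bit checkpoint; faster than converting again
saved_path = "./Baichuan-M1-14B-Instruct-int4/"
model = AutoModelForCausalLM.load_low_bit(saved_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(saved_path, trust_remote_code=True)
model = model.to('xpu')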
@KiwiHana (Author) commented Feb 12, 2025

Following the approach in #12808:

pip install --pre --upgrade ipex-llm[xpu_2.6] --extra-index-url https://download.pytorch.org/whl/xpu
pip install --pre --upgrade ipex-llm

Package            Version
------------------ --------------
accelerate         0.23.0
bigdl-core-xe-all  2.6.0b20250209
certifi            2025.1.31
charset-normalizer 3.4.1
colorama           0.4.6
dpcpp-cpp-rt       2025.0.2
einops             0.8.1
filelock           3.17.0
fsspec             2025.2.0
huggingface-hub    0.28.1
idna               3.10
intel-cmplr-lib-rt 2025.0.2
intel-cmplr-lib-ur 2025.0.2
intel-cmplr-lic-rt 2025.0.2
intel-opencl-rt    2025.0.2
intel-openmp       2025.0.2
intel-sycl-rt      2025.0.2
ipex-llm           2.2.0b20250211
Jinja2             3.1.5
MarkupSafe         3.0.2
mpmath             1.3.0
networkx           3.4.2
numpy              1.26.4
onednn             2025.0.1
onednn-devel       2025.0.1
packaging          24.2
pillow             11.1.0
pip                25.0.1
protobuf           6.30.0rc1
psutil             6.1.1
py-cpuinfo         9.0.0
PyYAML             6.0.2
regex              2024.11.6
requests           2.32.3
safetensors        0.5.2
sentencepiece      0.2.0
setuptools         75.8.0
sympy              1.13.1
tabulate           0.9.0
tbb                2022.0.0
tcmlib             1.2.0
tokenizers         0.15.2
torch              2.6.0+xpu
torchaudio         2.6.0+xpu
torchvision        0.21.0+xpu
tqdm               4.67.1
transformers       4.37.0
typing_extensions  4.12.2
umf                0.9.1
urllib3            2.3.0
wheel              0.45.1

run_baichun-m1-14b.py

from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch
# 1. Load pre-trained model and tokenizer
model_name = "./Baichuan-M1-14B-Instruct"  

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype = torch.half, load_in_low_bit='sym_int4',trust_remote_code=True).eval()

(llm-arl-2.6) C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models>python run_baichun-m1-14b.py

C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
You are using a model of type baichuan_m1 to instantiate a model of type baichuan. This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
  File "C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models\run_baichun-m1-14b.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype = torch.half, load_in_low_bit='sym_int4',trust_remote_code=True).eval()
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\unittest\mock.py", line 1379, in patched
    return func(*newargs, **newkeywargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\model.py", line 349, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\model.py", line 493, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\models\auto\auto_factory.py", line 553, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\dynamic_module_utils.py", line 500, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\dynamic_module_utils.py", line 200, in get_class_in_module
    module = importlib.import_module(module_path)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\Lengda\.cache\huggingface\modules\transformers_modules\Baichuan-M1-14B-Instruct\modeling_baichuan.py", line 11, in <module>
    from transformers import add_start_docstrings, PreTrainedModel, DynamicCache, \
ImportError: cannot import name 'StaticCache' from 'transformers' (C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\__init__.py)
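
The pip list above shows transformers 4.37.0, while StaticCache was only introduced in transformers 4.38, so the model's remote code (modeling_baichuan.py) cannot import it from the older install. A quick check, assuming the same conda environment:

# Check whether the installed transformers provides StaticCache
import transformers
print(transformers.__version__)        # 4.37.0 in the environment above
from transformers import StaticCache   # raises ImportError on anything below 4.38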

@MeouSker77 (Contributor)
To follow the approach in #12808, use transformers 4.45:

pip install transformers==4.45 trl==0.11
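
After pinning those versions, a quick sanity check from the same environment (a minimal sketch):

# Confirm the pinned versions are the ones actually imported
import transformers, trl
print(transformers.__version__)   # expect 4.45.x
print(trl.__version__)            # expect 0.11.x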

@KiwiHana (Author)
(llm-arl-2.6) C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models>python run_baichun-m1-14b.py

You are using a model of type baichuan_m1 to instantiate a model of type baichuan. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 6/6 [00:21<00:00,  3.61s/it]
2025-02-12 14:02:31,804 - INFO - Converting the current model to sym_int4 format......
Successfully loaded Tokenizer and optimized Model!
Traceback (most recent call last):
  File "C:\Users\Lengda\Documents\AIGC_ov20250123\resources\service\models\run_baichun-m1-14b.py", line 33, in <module>
    generated_ids = model.generate(
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\lookup.py", line 125, in generate
    return original_generate(self,
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\speculative.py", line 127, in generate
    return original_generate(self,
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\ipex_llm\transformers\pipeline_parallel.py", line 283, in generate
    return original_generate(self,
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\generation\utils.py", line 1829, in generate
    self._prepare_special_tokens(generation_config, kwargs_has_attention_mask, device=device)
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\generation\utils.py", line 1678, in _prepare_special_tokens
    and isin_mps_friendly(elements=eos_token_tensor, test_elements=pad_token_tensor).any()
  File "C:\ProgramData\miniforge3\envs\llm-arl-2.6\lib\site-packages\transformers\pytorch_utils.py", line 328, in isin_mps_friendly
    return torch.isin(elements, test_elements)
RuntimeError: Invalid value for bool configuration variable SYCL_CACHE_PERSISTENT: 1

@KiwiHana (Author)
Re-setting the two variables fixed it (the earlier set commands had a trailing space after the value, which appears to be why the SYCL runtime rejected it as an invalid bool):

set SYCL_CACHE_PERSISTENT=1
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

Thanks!
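
As an aside, the same two switches can also be set from Python before the first XPU call, which avoids any stray whitespace from the shell; a minimal sketch:

import os

# Set the SYCL switches programmatically, before any XPU work happens in this process
os.environ["SYCL_CACHE_PERSISTENT"] = "1"
os.environ["SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"] = "1"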
