32k inference result is garbled #147
Hi, since you did SFT with QLoRA, please try running inference with inference-qlora.py and see whether that works.
Inference: CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/nvme1n1/zhang/venv/small/bin/python inference-qlora.py. I am doing context extension with the data provided here and the open-source llama-7b; are there any particular requirements to reproduce this?
Hi, could you instead try the model I trained with QLoRA and see whether it inferences normally? That will tell us whether the problem is in fine-tuning or in inference: https://huggingface.co/Yukang/LongAlpaca-7B-qlora-weights/tree/main After downloading these weights, you need to run the merge script to obtain the full model.
CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/nvme1n1/zhanglv/venv/small/bin/python inference-qlora.py
This model inferences normally on my side. You can check whether text.txt is longer than 32k, or use https://huggingface.co/Yukang/LongAlpaca-7B directly; that one does not need the LoRA weights merged.
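A quick way to follow the length check suggested above, without loading the real tokenizer, is a character-based estimate. The ~4 characters-per-token ratio below is a rough heuristic for English prose, not the actual LLaMA tokenizer count:

```python
# Rough sanity check that the material fits in a 32k context window.
# Heuristic: ~4 characters per token for English prose; this is NOT the
# real LLaMA tokenizer count, only a quick estimate.
def approx_token_count(text: str, chars_per_token: int = 4) -> int:
    return len(text) // chars_per_token

# Hypothetical usage with the file from the inference command:
# with open("materials/test.txt", encoding="utf-8") as f:
#     n = approx_token_count(f.read())
# print("likely too long" if n > 32768 else "should fit")
```

If this estimate is anywhere near 32768, the real tokenizer count should be checked before blaming the model.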
Downloading your model directly also produces garbled output, which is strange. When running the code I noticed a warning; could that be the cause? "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all."
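For context on that warning: llama-2-7b-chat ships with a 4096-token limit (`max_position_embeddings`), and LongAlpaca-style inference scripts typically stretch it at load time via RoPE position interpolation. A minimal sketch of that adjustment, assuming the Hugging Face `LlamaConfig` field names (`max_position_embeddings`, `rope_scaling`); the repo's actual inference script may differ in detail:

```python
# Sketch: extending the context window before loading the model.
# 4096 is llama-2's pretrained limit quoted in the warning;
# 32768 is the --context_size used in the inference command.
orig_ctx = 4096
context_size = 32768

# Position-interpolation factor: how much to stretch RoPE positions.
scaling_factor = context_size / orig_ctx  # 8.0

# Fields as they appear in transformers' LlamaConfig (assumption).
config_updates = {
    "max_position_embeddings": context_size,
    "rope_scaling": {"type": "linear", "factor": scaling_factor},
}
print(config_updates["rope_scaling"])
```

If the merged model's config was never updated this way, generation beyond 4096 tokens can degrade into exactly this kind of garbled output.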
Alternatively, could you share your output answer for the question "Why doesn't Professor Snape seem to like Harry?" on that article? The article I used is: https://github.com/amephraim/nlp/blob/master/texts/J.%20K.%20Rowling%20-%20Harry%20Potter%202%20-%20The%20Chamber%20Of%20Secrets.txt
SFT command:
/mnt/nvme0n1/zhang/venv/small_project/bin/torchrun --nproc_per_node=4 supervised-fine-tune-qlora.py \
    --model_name_or_path /mnt/nvme0n1/zhang/model/llama-2-7b-chat-hf \
    --bf16 True \
    --output_dir /mnt/nvme1n1/zhang/model/out/sft/llama-2-7b-chat-hf-qlore-20231120 \
    --model_max_length 32768 \
    --use_flash_attn True \
    --data_path /mnt/nvme1n1/zhang/data/LongAlpaca-12k.json \
    --low_rank_training True \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0.0 \
    --warmup_steps 20 \
    --lr_scheduler_type "constant_with_warmup" \
    --logging_steps 1 \
    --deepspeed "ds_configs/stage2.json" \
    --tf32 True
Inference command:
/mnt/nvme1n1/zhang/venv/small/bin/python inference.py \
    --base_model /mnt/nvme1n1/zhang/model/out/sft/llama-2-7b-chat-hf-qlore-20231120/model-merger \
    --question "Why doesn't Professor Snape seem to like Harry?" \
    --context_size 32768 \
    --max_gen_len 512 \
    --flash_attn True \
    --material "materials/test.txt"
Output:
.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ...Љ.Љ.Љ..Љ.Љ..Љ...Љ..Љ.Љ.Љ.Љ.Љ..Љ.Љ.Љ..........................................Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.ЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉ.Љ.Љ.Љ.
The output is garbled. How should I proceed?