
32k inference result is garbled #147

Open
zhanglv0209 opened this issue Nov 21, 2023 · 8 comments

Comments

@zhanglv0209

SFT command:
/mnt/nvme0n1/zhang/venv/small_project/bin/torchrun --nproc_per_node=4 supervised-fine-tune-qlora.py \
    --model_name_or_path /mnt/nvme0n1/zhang/model/llama-2-7b-chat-hf \
    --bf16 True \
    --output_dir /mnt/nvme1n1/zhang/model/out/sft/llama-2-7b-chat-hf-qlore-20231120 \
    --model_max_length 32768 \
    --use_flash_attn True \
    --data_path /mnt/nvme1n1/zhang/data/LongAlpaca-12k.json \
    --low_rank_training True \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0.0 \
    --warmup_steps 20 \
    --lr_scheduler_type "constant_with_warmup" \
    --logging_steps 1 \
    --deepspeed "ds_configs/stage2.json" \
    --tf32 True

Inference command:
/mnt/nvme1n1/zhang/venv/small/bin/python inference.py \
    --base_model /mnt/nvme1n1/zhang/model/out/sft/llama-2-7b-chat-hf-qlore-20231120/model-merger \
    --question "Why doesn't Professor Snape seem to like Harry?" \
    --context_size 32768 \
    --max_gen_len 512 \
    --flash_attn True \
    --material "materials/test.txt"

Output:
.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ...Љ.Љ.Љ..Љ.Љ..Љ...Љ..Љ.Љ.Љ.Љ.Љ..Љ.Љ.Љ..........................................Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.Љ.ЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉЉ.Љ.Љ.Љ.

The output is garbled. How can I fix this?

@yukang2017
Member

yukang2017 commented Nov 23, 2023

Hi, since you ran SFT with QLoRA, please try running inference with inference-qlora.py and see whether that works.

@zhanglv0209
Author

> Hi, since you ran SFT with QLoRA, please try running inference with inference-qlora.py and see whether that works.

Inference:

CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/nvme1n1/zhang/venv/small/bin/python inference-qlora.py \
    --base_model /mnt/nvme1n1/zhanglv/model/out/sft/llama-2-7b-chat-hf-qlore-20231120/model-merger \
    --question "Why doesn't Professor Snape seem to like Harry?" \
    --context_size 32768 \
    --max_gen_len 512 \
    --flash_attn True \
    --material "materials/test.txt"

Result:
[screenshot of garbled output]

Are there any specific requirements for reproducing the context extension using the provided data and the open-source llama-7b?

@yukang2017
Member

Hi,

Could you switch to the model I trained with QLoRA and check whether inference works with it? That will tell us whether the problem lies in fine-tuning or in inference.

https://huggingface.co/Yukang/LongAlpaca-7B-qlora-weights/tree/main

After downloading the weights, you need to run the merge script to get the complete model.
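For reference, the merge step folds the LoRA adapter into the base weights. A minimal PEFT-based sketch is below; note that LongLoRA's own merge script additionally restores the trainable embedding and norm layers, so treat this only as an illustration of the step, not a replacement for that script:

```python
# Sketch: fold LoRA adapter weights into the base model with PEFT.
# Illustration only; LongLoRA's merge script also handles the separately
# saved trainable embedding/norm parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "/mnt/nvme0n1/zhang/model/llama-2-7b-chat-hf",  # base model used for SFT
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "Yukang/LongAlpaca-7B-qlora-weights")
merged = model.merge_and_unload()  # apply the LoRA deltas to the base weights

merged.save_pretrained("Llama-2-7b-longalpaca-merged")
tokenizer = AutoTokenizer.from_pretrained("/mnt/nvme0n1/zhang/model/llama-2-7b-chat-hf")
tokenizer.save_pretrained("Llama-2-7b-longalpaca-merged")
```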

@zhanglv0209
Author

> Hi,
>
> Could you switch to the model I trained with QLoRA and check whether inference works with it? That will tell us whether the problem lies in fine-tuning or in inference.
>
> https://huggingface.co/Yukang/LongAlpaca-7B-qlora-weights/tree/main
>
> After downloading the weights, you need to run the merge script to get the complete model.

CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/nvme1n1/zhanglv/venv/small/bin/python inference-qlora.py \
    --base_model /mnt/nvme1n1/zhanglv/model/out/tmp/llama2-qlora-32-download/Llama-2-7b-longlora-8k-merged \
    --question "Why doesn't Professor Snape seem to like Harry?" \
    --context_size 32768 \
    --max_gen_len 512 \
    --flash_attn True \
    --material "materials/test.txt"

[screenshot] This threw an error.

@zhanglv0209
Author

zhanglv0209 commented Nov 24, 2023

CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/nvme1n1/zhanglv/venv/small/bin/python inference-qlora.py \
    --base_model /mnt/nvme1n1/zhanglv/model/out/tmp/llama2-qlora-32-download/Llama-2-7b-longlora-8k-merged \
    --question "Why doesn't Professor Snape seem to like Harry?" \
    --context_size 32768 \
    --max_gen_len 512 \
    --flash_attn True \
    --material "materials/test.txt"

Changing float16 to bfloat16 fixed that error.

I then merged the model with the weights you provided, but the output is garbled in exactly the same way.

[screenshot of garbled output]
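For reference, the float16 to bfloat16 fix mentioned above usually amounts to changing the dtype passed at load time; a minimal sketch, assuming the script loads the model with transformers' from_pretrained (the actual loading code in inference-qlora.py may differ):

```python
# Sketch of the float16 -> bfloat16 fix at model load time.
# bf16 matches the --bf16 True setting used during training.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/mnt/nvme1n1/zhanglv/model/out/tmp/llama2-qlora-32-download/Llama-2-7b-longlora-8k-merged",
    torch_dtype=torch.bfloat16,  # was torch.float16
    device_map="auto",
)
```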

@yukang2017
Member

Inference with this model works correctly on my side. You can check whether text.txt exceeds 32k tokens, or use https://huggingface.co/Yukang/LongAlpaca-7B directly; that model does not require merging LoRA weights.
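Checking the material length against the 32k window is straightforward with the model's tokenizer; a minimal sketch:

```python
# Sketch: count how many tokens materials/test.txt occupies, to check
# whether it fits within the 32768-token context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Yukang/LongAlpaca-7B")
with open("materials/test.txt", encoding="utf-8") as f:
    text = f.read()

n_tokens = len(tokenizer(text).input_ids)
print(f"material length: {n_tokens} tokens "
      f"(the 32768 window must also hold the prompt and generated tokens)")
```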

@zhanglv0209
Author

zhanglv0209 commented Nov 28, 2023

> Inference with this model works correctly on my side. You can check whether text.txt exceeds 32k tokens, or use https://huggingface.co/Yukang/LongAlpaca-7B directly; that model does not require merging LoRA weights.

I downloaded your model directly and the output is still garbled, which is very strange.

Command:
CUDA_VISIBLE_DEVICES=0,1,2,3 /mnt/nvme1n1/zhanglv/venv/small/bin/python inference-qlora2.py \
    --base_model /mnt/nvme1n1/zhanglv/model/LongAlpaca-7B \
    --question "Why Dobby didn't move" \
    --context_size 32768 \
    --max_gen_len 512 \
    --flash_attn True \
    --material "materials/test.txt"

[screenshot of garbled output]

I noticed a warning; could it be the cause?

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
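That warning indicates the loaded config still reports max_position_embeddings = 4096 at generation time. LongLoRA's inference scripts extend the window by applying linear RoPE scaling to the config before loading the model; a simplified sketch of that mechanism follows (an illustration, not a verbatim excerpt of inference.py):

```python
# Sketch: extend the context window via linear RoPE scaling before loading,
# so positions beyond the original 4096 are valid and the warning goes away.
import math
from transformers import AutoConfig, AutoModelForCausalLM

context_size = 32768
config = AutoConfig.from_pretrained("Yukang/LongAlpaca-7B")
orig_ctx_len = getattr(config, "max_position_embeddings", None)  # typically 4096
if orig_ctx_len and context_size > orig_ctx_len:
    scaling_factor = float(math.ceil(context_size / orig_ctx_len))
    config.rope_scaling = {"type": "linear", "factor": scaling_factor}
    config.max_position_embeddings = context_size  # silences the 4096 warning

model = AutoModelForCausalLM.from_pretrained("Yukang/LongAlpaca-7B", config=config)
```

If the warning appears even with this path taken, it is worth verifying that the script being run actually applies the scaling before from_pretrained is called.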

@zhanglv0209
Author

Alternatively, could you share the answer your setup produces for "Why doesn't Professor Snape seem to like Harry?" on that article? The article I used is: https://github.com/amephraim/nlp/blob/master/texts/J.%20K.%20Rowling%20-%20Harry%20Potter%202%20-%20The%20Chamber%20Of%20Secrets.txt
