I used several A100 GPUs to load this model successfully, but it doesn't produce proper output (the output is either blank or random characters).
My command:
CUDA_VISIBLE_DEVICES=3,4,5 python3 inference.py \
--base_model /models/Llama-2-7b-longlora-100k-ft \
--question "Why doesn't Professor Snape seem to like Harry?" \
--context_size 100000 \
--max_gen_len 512 \
--flash_attn True \
--material "materials/Harry Potter and The Order of the Phoenix.txt"
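For reference, an environment variable like `CUDA_VISIBLE_DEVICES` can be set either by exporting it first or by assigning it inline before a single command (with no space around `=`). A minimal sketch of both forms, using a `python3 -c` placeholder in place of `inference.py`:

```shell
# Option 1: export once; applies to every later command in this shell session.
export CUDA_VISIBLE_DEVICES=3,4,5
python3 -c 'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])'

# Option 2: inline assignment, scoped to this one command only.
CUDA_VISIBLE_DEVICES=3,4,5 python3 -c 'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])'
```

Both commands print `3,4,5`.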
Here is a record of my experiment.
10000-token input + PROMPT_DICT["prompt_llama2"]:
10000-token input + PROMPT_DICT["prompt_no_input"]:
10000-token input + PROMPT_DICT["prompt_no_input_llama2"]: