How to run evaluation on the validation set? #131

Open
anirudh-chakravarthy opened this issue Nov 12, 2024 · 1 comment

Comments

anirudh-chakravarthy commented Nov 12, 2024

Hi,

Is it possible to provide a set of instructions to run evaluation on the validation set?

From the README:

test_eval.json is used for evaluation. test_llama.json is used for training

However, when I run:

python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../test_llama.json --output ../output.json --batch_size 4 --num_processes 8

python evaluation.py --root_path1 ./output.json --root_path2 ./test_eval.json

I hit this random UUID issue. I followed the exact instructions, so I'm not sure why this doesn't work.

For further diagnosis, following the FAQ, which says I should run inference on the validation set, I ran:

python convert2llama.py

and changed this line to load v1_1_val_nus_q_only.json and write the output to val_llama.json

And then did:

python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../val_llama.json --output ../output_val.json --batch_size 4 --num_processes 8

python evaluation.py --root_path1 ./output_val.json --root_path2 ./v1_1_val_nus_q_only.json

But this doesn't work either, and shows the same UUID error.
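
One way to narrow down a mismatch like this is to check whether the identifiers in the prediction file actually appear in the ground-truth file. The following is only a rough sketch and not part of the repository: it assumes the identifiers are 32-character lowercase hex tokens (nuScenes style) that may appear as dictionary keys or inside string values, and the helper names `collect_tokens`, `compare_token_sets`, and `compare_files` are hypothetical.

```python
import json
import re

# Assumption: identifiers look like 32-character lowercase hex tokens.
TOKEN_RE = re.compile(r"[0-9a-f]{32}")


def collect_tokens(obj):
    """Recursively gather token-like substrings from dict keys and string values."""
    found = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            found.update(TOKEN_RE.findall(str(key)))
            found.update(collect_tokens(value))
    elif isinstance(obj, list):
        for item in obj:
            found.update(collect_tokens(item))
    elif isinstance(obj, str):
        found.update(TOKEN_RE.findall(obj))
    return found


def compare_token_sets(pred, gt):
    """Report tokens that occur in only one of the two parsed JSON objects."""
    pred_tokens = collect_tokens(pred)
    gt_tokens = collect_tokens(gt)
    return {
        "only_in_predictions": sorted(pred_tokens - gt_tokens),
        "only_in_ground_truth": sorted(gt_tokens - pred_tokens),
    }


def compare_files(pred_path, gt_path):
    """Load two JSON files (hypothetical paths) and compare their token sets."""
    with open(pred_path) as f:
        pred = json.load(f)
    with open(gt_path) as f:
        gt = json.load(f)
    return compare_token_sets(pred, gt)
```

Calling `compare_files("output_val.json", "v1_1_val_nus_q_only.json")` and looking at `only_in_predictions` would show whether the predictions reference tokens that the file passed as `--root_path2` does not contain, which would be consistent with the two files coming from different splits.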

@ChonghaoSima
Contributor

Could you post the UUID error here? Are you running eval on your local env or our test server?
