pip install vllm --no-build-isolation
pip install transformers
# simple-evals
cd human-eval
pip install -e .
pip install openai
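Note: human-eval ships with execution of model-generated code disabled for safety. To actually score completions, uncomment the exec call in human_eval/execution.py (per the upstream README):

# human_eval/execution.py, inside unsafe_execute(); uncomment to enable execution:
exec(check_program, exec_globals)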
# math evaluations
cd ../Qwen2.5-Math/evaluation/latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
# livecodebench
cd ../../LiveCodeBench
pip install -e .
vllm serve /workspace/HuggingFace-Download-Accelerator/qwen25/models--Qwen--Qwen2.5-7B-Instruct
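Once the server is up, vLLM exposes an OpenAI-compatible API (default http://localhost:8000/v1). A minimal sanity check in Python, assuming the default port and using the served path as the model name:

from openai import OpenAI

# Local vLLM server; the api_key value is arbitrary for a local server but must be non-empty
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="/workspace/HuggingFace-Download-Accelerator/qwen25/models--Qwen--Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)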
Changes needed in simple_evals.py (a sketch of the resulting entry follows below):
line 36: add the model you want to evaluate to the models dict
line 113: grading_sampler = models["qwen2.5-7b"] and line 114: equality_checker = models["qwen2.5-7b"]: change these to the model you want to use; the original code uses gpt-4o
line 156: set the datasets you want to evaluate
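For reference, a sketch of the line 36 change, assuming simple-evals' ChatCompletionSampler with the OpenAI client pointed at the local vLLM server (export OPENAI_BASE_URL=http://localhost:8000/v1 and OPENAI_API_KEY=EMPTY first); constructor arguments may differ across simple-evals versions:

# simple_evals.py, inside the models dict (sketch; adapt to your checkout)
"qwen2.5-7b": ChatCompletionSampler(
    model="/workspace/HuggingFace-Download-Accelerator/qwen25/models--Qwen--Qwen2.5-7B-Instruct",
    max_tokens=2048,
),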
python -m simple-evals.simple_evals --model qwen2.5-7b --debug
You can evaluate Qwen2.5/Qwen2-Math-Instruct series models with the following commands:
cd Qwen2.5-Math/evaluation
# Qwen2.5-Math-Instruct Series
PROMPT_TYPE="qwen25-math-cot"
# Qwen2.5-Math-7B-Instruct
export CUDA_VISIBLE_DEVICES="0"
MODEL_NAME_OR_PATH="Qwen/Qwen2.5-Math-7B-Instruct"
bash sh/eval.sh $PROMPT_TYPE $MODEL_NAME_OR_PATH
To run local inference, modify the model name in the following script and execute it:
cd MMLU-Pro/scripts/examples/
sh eval_llama_2_7b.sh
To use an API for inference, set your API key in the evaluate_from_api.py script and execute the bash script:
cd MMLU-Pro/scripts/examples/
sh eval_gpt_4.sh
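The exact variable name depends on the MMLU-Pro version; a hypothetical sketch of the edit, assuming the script builds an OpenAI client:

# evaluate_from_api.py (sketch; check the actual client setup in the script)
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # replace with your real API key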
Follow the instructions in ./benchmarks/LiveCodeBench/README.md.
Run the following commands once; errors during this installation step can be ignored.
pip install poetry
poetry install
Then run the evaluation directly.
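For example (the entry point and flags below follow the LiveCodeBench README; confirm against your checkout):

python -m lcb_runner.runner.main --model Qwen/Qwen2.5-7B-Instruct --scenario codegeneration --evaluate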
Run the Jupyter notebook ./arc_agi/few-shot-prompting-for-arc24.ipynb.
In progress.