
Reproduction issue of task GSM8K with Llama3.2-1B-Instruct #810

Open
VoiceBeer opened this issue Dec 5, 2024 · 3 comments
Hi, thanks for the work and the new Llama 3.2 reproduction update. However, I've run into an issue reproducing GSM8K.

I manually added a gsm8k directory to my work_dir with the .yaml file here:

(screenshot of the task directory omitted)

and the command was:

```shell
CUDA_VISIBLE_DEVICES=3 lm_eval --model hf \
  --model_args pretrained=/data/models/meta-llama/Llama-3.2-1B-Instruct,dtype=auto,parallelize=False,add_bos_token=True \
  --tasks meta_gsm8k --batch_size 4 --output_path eval_results_general \
  --include_path llama32_1B_workdir --seed 42 --log_samples \
  --fewshot_as_multiturn --apply_chat_template
```

The result I got is 0.4003, which differs from the officially reported 44.4.

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| meta_gsm8k | 3 | flexible-extract | 8 | exact_match | 0.4003 | ± 0.0135 |
| | | strict-match | 8 | exact_match | 0.3942 | ± 0.0135 |
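For context on the two metric rows: flexible-extract and strict-match are two different answer-parsing filters applied to the same generations. Below is a minimal Python sketch of the distinction; the regexes and the sample completion are my own assumptions, loosely modeled on harness-style defaults, not the exact filters from the yaml:

```python
import re

# Hypothetical GSM8K-style model completion (illustrative only, not from the run above).
completion = (
    "Natalia sold 48 clips in April and half as many in May, "
    "so she sold 48 + 24 = 72 clips in total. The answer is 72."
)

# Strict-match style: require the canonical "The answer is N" phrasing.
strict = re.search(r"The answer is (-?[0-9.,]+)", completion)
strict_ans = strict.group(1).rstrip(".,") if strict else None

# Flexible-extract style: fall back to the last number anywhere in the output.
numbers = re.findall(r"-?[0-9.,]+", completion)
flexible_ans = numbers[-1].rstrip(".,") if numbers else None

print(strict_ans)    # 72
print(flexible_ans)  # 72
```

When a model reasons correctly but omits the "The answer is" phrasing, strict-match scores it wrong while flexible-extract may still recover the number, which is why the two rows can diverge.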

Is there anything I missed? Thanks!

@wukaixingxp wukaixingxp self-assigned this Dec 5, 2024

wukaixingxp commented Dec 5, 2024

@VoiceBeer Can you show me your gsm8k-cot-llama.yaml file? Your link is not working.

@wukaixingxp

I noticed that this gsm8k-cot-llama.yaml does not use our eval details dataset; it comes from this PR in lm_eval, so I am not sure why there is such a gap.

@VoiceBeer

@wukaixingxp Hi, sorry for the delay! The URL is https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k-cot-llama.yaml, and it does come from the lm-eval repo.

I've also noticed that the llama-recipes repo does not currently contain a gsm8k config. Is there any way I can reproduce the reported gsm8k results?
