I want to get stable eval results, but when I evaluate the same checkpoint, different batch sizes (e.g., 32 and 64) give different results. I tried setting generation_temperature to 0, but the results still change when I switch the batch size between 60 and 32. How can I prevent this from happening?
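Is setting the usual PyTorch determinism knobs supposed to be enough here? Below is a sketch of what I mean, assuming eval_base.py is plain PyTorch (the function name is mine, not from the repo):

```python
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    # Seed every RNG that could influence generation.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuBLAS needs this env var set before kernels run to be deterministic.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Error out on any op that has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

I call this once before evaluation starts, but the batch-size dependence remains.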
Below is my command:
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 --master_port 11112 eval_base.py \
    --ckpt_dir ../data/weights/ \
    --llm_model 7B \
    --tokenizer_path ../data/weights/tokenizer.model \
    --data_root ../data \
    --caption_file ../data/captions.json \
    --adapter_path ./xxxx/checkpoint-19.pth \
    --adapter_type attn \
    --adapter_dim 8 \
    --adapter_scale 1 \
    --prompt_format QCM-ALE \
    --max_batch_size 32 \
    --max_seq_len 512 \
    --split test \
    --n_prompt 50 \
    --temperature 10. \
    --generation_temperature 0. \
    --visual_adapter_type router \
    --bits 4bit \
    --cpu_load
```
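My current guess is that batched generation is not bit-identical across batch sizes: padding changes how the floating-point reductions are grouped, so the logits differ in the last bits, and even greedy decoding (temperature 0) can flip on near-ties. Here is a hypothetical check I would run to confirm this, assuming a model whose forward returns an object with a .logits attribute (eval_base.py may expose logits differently):

```python
import torch

@torch.no_grad()
def check_batch_invariance(model, input_ids: torch.Tensor) -> None:
    # Feed the first example alone, then inside the full batch, and compare
    # the logits for that example. A non-zero diff means the kernels are not
    # batch-invariant, so greedy decoding can still diverge across batch sizes.
    alone = model(input_ids[:1]).logits      # hypothetical .logits attribute
    in_batch = model(input_ids).logits[:1]   # same example, larger batch
    print("max |logit diff|:", (alone - in_batch).abs().max().item())
```

If the printed diff is non-zero, is pinning max_batch_size to one fixed value (or always evaluating with batch size 1) the only way to get identical numbers?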