I want to get stable eval results, but when I evaluate the same checkpoint, different batch sizes (e.g., 32 and 64) give different results. I tried setting generation_temperature to 0, but the results still change when I switch the batch size between 60 and 32. How can I prevent this from happening?
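Is setting the usual PyTorch determinism knobs supposed to be enough here? Below is a sketch of what I mean, assuming eval_base.py is plain PyTorch (the function name is mine, not from the repo):

```python
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    # Seed every RNG that could influence generation.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuBLAS needs this env var set before kernels run to be deterministic.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Error out on any op that has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

I call this once before evaluation starts, but the batch-size dependence remains.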
Below is my command:
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 --master_port 11112 eval_base.py \
    --ckpt_dir ../data/weights/ \
    --llm_model 7B \
    --tokenizer_path ../data/weights/tokenizer.model \
    --data_root ../data \
    --caption_file ../data/captions.json \
    --adapter_path ./xxxx/checkpoint-19.pth \
    --adapter_type attn \
    --adapter_dim 8 \
    --adapter_scale 1 \
    --prompt_format QCM-ALE \
    --max_batch_size 32 \
    --max_seq_len 512 \
    --split test \
    --n_prompt 50 \
    --temperature 10. \
    --generation_temperature 0. \
    --visual_adapter_type router \
    --bits 4bit \
    --cpu_load
```
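My current guess is that batched generation is not bit-identical across batch sizes: padding changes how the floating-point reductions are grouped, so the logits differ in the last bits, and even greedy decoding (temperature 0) can flip on near-ties. Here is a hypothetical check I would run to confirm this, assuming a model whose forward returns an object with a .logits attribute (eval_base.py may expose logits differently):

```python
import torch

@torch.no_grad()
def check_batch_invariance(model, input_ids: torch.Tensor) -> None:
    # Feed the first example alone, then inside the full batch, and compare
    # the logits for that example. A non-zero diff means the kernels are not
    # batch-invariant, so greedy decoding can still diverge across batch sizes.
    alone = model(input_ids[:1]).logits      # hypothetical .logits attribute
    in_batch = model(input_ids).logits[:1]   # same example, larger batch
    print("max |logit diff|:", (alone - in_batch).abs().max().item())
```

If the printed diff is non-zero, is pinning max_batch_size to one fixed value (or always evaluating with batch size 1) the only way to get identical numbers?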