Hmm, thanks for raising a good point. I'm not sure either how best to handle this. Another option is to temporarily set the tokenizer's padding side to left for evaluation. Does lm_eval throw any more warnings when you set that collator? If anyone coming across this knows how to deal with it, do let us know!
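To make the "temporarily set the tokenizer padding" idea concrete, here is a minimal sketch. It assumes the tokenizer exposes a writable `padding_side` attribute, as Hugging Face tokenizers do. The helper name `temporary_padding_side` is made up for illustration and is not part of axolotl or transformers, and the `SimpleNamespace` object is only a stand-in for a real tokenizer.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_padding_side(tokenizer, side="left"):
    """Temporarily switch tokenizer.padding_side and restore it on exit.

    Hypothetical helper for illustration only.
    """
    original = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        # Restore even if evaluation raises, so training keeps its padding side.
        tokenizer.padding_side = original


# Stand-in for a real tokenizer; only the padding_side attribute matters here.
tokenizer = SimpleNamespace(padding_side="right")

with temporary_padding_side(tokenizer, "left"):
    side_during_eval = tokenizer.padding_side  # generation-based eval would run here

side_after_eval = tokenizer.padding_side
```

Whether this is enough depends on where the evaluation batches are actually padded: if a data collator pads them with its own logic, changing the tokenizer attribute alone may not remove the warning.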
When I set `do_causal_lm_eval: true`, `eval_causal_lm_metrics: ['chrf']`, and `eval_sample_packing: false` in the config file, I keep getting warnings about padding:

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.

Normally we train with right padding and run inference with left padding, so I wonder whether this is an issue.
I also tried to fix this by using a new data collator called "LeftCollator" that explicitly sets padding to the left when processing the evaluation dataset.
Specifically, in `/src/axolotl/core/trainer_builder.py`, I set the collator to "LeftCollator" when `is_eval` is true. I wonder if this is the correct practice.