Hi, thanks for lm-eval-harness.
Right now, if an example exceeds max_length, it gets truncated and still evaluated. Would it be possible to add an optional mode that skips/drops over-length samples instead of truncating them?
This is useful when truncation changes the question too much, or when comparing models with different context windows.
If enabled, I’d expect:
- over-length instances are not evaluated
- they’re excluded from metric denominators
- the skipped count and percentage are logged per task
Something like `--skip_overlength` on the CLI / `skip_overlength: true` in the task config would be great. A rough sketch of the behavior I have in mind is below.
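To make the intent concrete, here is a minimal sketch of the filtering step, assuming instances are already tokenized. The function name, signature, and log format are placeholders I made up for illustration, not anything from the harness's internals:

```python
from typing import List, Sequence, Tuple

def filter_overlength(
    encoded_prompts: Sequence[Sequence[int]],  # token ids per instance
    max_length: int,
    task_name: str = "task",
) -> Tuple[List[int], List[int]]:
    """Split instance indices into (kept, skipped) by prompt length."""
    kept: List[int] = []
    skipped: List[int] = []
    for i, toks in enumerate(encoded_prompts):
        (skipped if len(toks) > max_length else kept).append(i)

    total = len(encoded_prompts)
    if total:
        pct = 100.0 * len(skipped) / total
        # Per-task logging of how many instances were dropped
        print(f"[{task_name}] skipped {len(skipped)}/{total} ({pct:.1f}%) over-length instances")
    return kept, skipped
```

Metrics would then be computed only over the `kept` indices, so skipped instances never enter the denominators.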
Happy to help with a PR if this fits the project direction.
Thanks!