
can't match the win rate posted on the leaderboard #431

Open
lindseyfeng opened this issue Jan 2, 2025 · 0 comments

Hi team,

I'm observing a significant difference in win rates when using GPT-4o Mini with greedy decoding: I achieve only 30.33%, while the leaderboard sample reaches a 43% win rate using the same annotator and reference outputs (GPT Turbo). I'm using the default reference outputs and the standard procedure, but I'm unclear about the exact inference strategy and prompt used to generate the leaderboard file. Could you share the inference strategy, the specific prompts, and any additional parameters or settings that might account for the higher leaderboard win rate? Knowing these would help me align my setup with the leaderboard results.
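For reference, here is a minimal sketch of what I mean by "greedy decoding" in terms of OpenAI-style request parameters. The model name and `max_tokens` value are my assumptions, not the leaderboard's confirmed settings:

```python
# Hypothetical sketch of the greedy-decoding settings I used;
# model name and max_tokens are assumptions, not the leaderboard's
# confirmed configuration.

def greedy_request_params(model: str, prompt: str) -> dict:
    """Build OpenAI-style chat-completion parameters for greedy decoding."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # greedy: always pick the most likely token
        "top_p": 1.0,        # no nucleus-sampling truncation
        "max_tokens": 2048,  # generation length cap (assumed)
    }

params = greedy_request_params("gpt-4o-mini", "Write a haiku about the sea.")
print(params["temperature"])  # → 0.0
```

If the leaderboard outputs were generated with different sampling settings (a nonzero temperature, a system prompt, or a different length cap), that alone could explain part of the gap.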

Thank you for your assistance!
