Hi team,
I'm seeing a significant gap in win rates when using GPT-4o mini with greedy decoding: my run achieves only 30.33%, while the leaderboard sample reaches 43% with the same annotator and reference output (GPT-4 Turbo). I'm using the default reference outputs and the standard procedure, but I'm unclear about the exact inference strategy and prompt used to generate the leaderboard file. Could you share the inference strategy, the specific prompts, and any additional parameters or settings that might account for the higher win rate in the leaderboard sample? Knowing these would help me align my setup with the leaderboard results.
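For reference, here is roughly how I'm generating the model outputs. This is a minimal sketch assuming the OpenAI Python client; the decoding parameters shown reflect my own setup, not anything confirmed about the leaderboard's:

```python
# Minimal sketch of my generation setup (assumed, not the leaderboard's exact config).
from openai import OpenAI

client = OpenAI()

def generate(instruction: str) -> str:
    # "Greedy" decoding is approximated with temperature=0, since the
    # OpenAI API does not expose a true greedy-decoding flag.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": instruction}],
        temperature=0,
    )
    return resp.choices[0].message.content
```

If the leaderboard run used a system prompt, sampling (e.g. a nonzero temperature), or a different prompt template around the instruction, that alone might explain part of the gap, which is why I'd like to confirm the exact settings.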
Thank you for your assistance!