Commit

readme
yuchenlin committed Jun 18, 2024
1 parent e705a4f commit 6049a71
Showing 2 changed files with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -37,7 +37,6 @@ conda create -n wildbench python=3.10
conda activate wildbench
pip install vllm -U # pip install -e vllm
pip install openai datasets tenacity
# pip install google-cloud-aiplatform
pip install google-generativeai
pip install cohere mistralai
pip install anthropic==0.19.0
@@ -111,11 +110,11 @@ You should change the code to add these APIs, for example, gemini, cohere, claud
</ul>
We use three reference models (GPT-4-turbo-0429, Claude-3-Haiku, and Llama-2-70B-chat) to compute the rewards for each model. The final WB Reward-Mix is the average of the three rewards on 1024 examples.
<h4>Mitigating Length Bias</h4>
As many studies have shown, LLM judges tend to prefer longer responses. To mitigate this bias, we propose a simple and customizable length penalty method. <b>We convert Slightly Win/Lose to be a Tie if the winner is longer than the loser by a certain length threshold (K characters).</b> We set K=500 by default, but you can customize it on our leaderboard UI. Note that <b>K= ∞ will disable the length penalty.</b>
As many studies have shown, LLM judges tend to prefer longer responses. To mitigate this bias, we propose a simple and customizable length penalty method. <b>We convert Slightly Win/Lose to be a Tie if the winner is longer than the loser by a certain length threshold (K characters).</b> Note that <b>K= ∞ will disable the length penalty.</b>
</div>
</details>
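
The length penalty described above can be sketched in a few lines. This is only an illustration, not the repository's implementation: the function name `apply_length_penalty`, the label strings, and the strict "more than K characters" comparison are all assumptions made for the example.

```python
import math

# Hypothetical label names; the actual strings used by the judge may differ.
SLIGHT_LABELS = {"slightly_win", "slightly_lose"}


def apply_length_penalty(label: str, winner_len: int, loser_len: int, k: float) -> str:
    """Downgrade a slight win/loss to a tie when the winner is much longer.

    winner_len and loser_len are response lengths in characters.
    k is the length threshold K; passing math.inf disables the penalty.
    Assumption: "longer by K" is read as a strict difference greater than K.
    """
    if label in SLIGHT_LABELS and (winner_len - loser_len) > k:
        return "tie"
    return label


# A slight win where the winner is 800 characters longer becomes a tie at K=500.
adjusted = apply_length_penalty("slightly_win", winner_len=1800, loser_len=1000, k=500)

# With K = infinity the original label is always kept.
unchanged = apply_length_penalty("slightly_win", winner_len=1800, loser_len=1000, k=math.inf)
```

Clear wins and losses are left untouched in this sketch, since the text only mentions converting the "Slightly" outcomes.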

### Run evaluation scripts
### ‼️ Run evaluation scripts

We suggest to use OpenAI's [Batch Mode](https://platform.openai.com/docs/guides/batch) for evaluation, which is faster, cheaper and more reliable.

Empty file added update_gradio.sh