update README
hkiyomaru committed Dec 19, 2023
1 parent 38a18fa commit 1b6bc80
Showing 1 changed file with 2 additions and 0 deletions.
README.md
@@ -45,6 +45,7 @@ OPENAI_API_KEY=<YOUR-KEY> python llm_judge/gen_judgment.py \
--mode {single|pairwise-baseline|pairwise-all} \
[--baseline-model <BASELINE-MODEL-ID>] \
[--model-list <LIST-OF-MODEL-IDS>] \
[--yes] \
[--wandb]
```

@@ -55,6 +56,7 @@ Arguments & Options:
- `single`: run score-based single-model grading.
- `--baseline-model <BASELINE-MODEL-ID>` is the model ID of the baseline model. This option is only available in `pairwise-baseline` mode. If not specified, the baseline model is set to `text-davinci-003`.
- `--model-list <LIST-OF-MODEL-IDS>` is a list of model IDs to be evaluated. If not specified, all models in `data/jp_bench/model_answer` will be evaluated.
- `--yes` is a flag to skip the confirmation prompt.
- `--wandb` is a flag to enable logging to W&B. You can upload the results later to W&B by running `upload_result.py`, as described in the next section.

**Mode: `pairwise-baseline` (Default)**
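The options touched by this commit map naturally onto a standard `argparse` interface. This is a hypothetical sketch, not the code in `llm_judge/gen_judgment.py`: the helper names `parse_args` and `confirm`, the prompt text, and the choice to reject `--baseline-model` outside `pairwise-baseline` mode are all assumptions. It illustrates how a `--yes` flag can bypass an interactive confirmation prompt, alongside the documented defaults (`pairwise-baseline` mode, `text-davinci-003` baseline, all models when `--model-list` is omitted).

```python
import argparse
from typing import Callable, List, Optional

MODES = ["single", "pairwise-baseline", "pairwise-all"]
DEFAULT_BASELINE = "text-davinci-003"  # default baseline documented in the README


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the documented options (hypothetical sketch, not the repo's parser)."""
    parser = argparse.ArgumentParser(description="Generate LLM-as-a-judge judgments.")
    parser.add_argument("--mode", choices=MODES, default="pairwise-baseline")
    parser.add_argument("--baseline-model", default=None)
    # None stands for "evaluate every model under data/jp_bench/model_answer".
    parser.add_argument("--model-list", nargs="+", default=None)
    parser.add_argument("--yes", action="store_true", help="skip the confirmation prompt")
    parser.add_argument("--wandb", action="store_true", help="log results to W&B")
    args = parser.parse_args(argv)

    if args.mode == "pairwise-baseline":
        # Fall back to the documented default baseline.
        if args.baseline_model is None:
            args.baseline_model = DEFAULT_BASELINE
    elif args.baseline_model is not None:
        # Assumption: treat --baseline-model outside pairwise-baseline mode as an error.
        parser.error("--baseline-model is only available in pairwise-baseline mode")
    return args


def confirm(args: argparse.Namespace, input_fn: Callable[[str], str] = input) -> bool:
    """Return True if the run should proceed; --yes skips the prompt entirely."""
    if args.yes:
        return True
    # Hypothetical prompt text; the real script's wording may differ.
    answer = input_fn("Proceed with judgment generation? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

For example, `parse_args(["--mode", "single", "--yes"])` yields a namespace whose `yes` attribute is `True`, so `confirm` returns immediately without reading from stdin. That is what makes a flag like `--yes` useful for unattended or scripted runs. Injecting `input_fn` keeps the prompt testable without a terminal.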
