OpenAI version upgrade (latest version) #56

Open: wants to merge 6 commits into `main`.
8 changes: 6 additions & 2 deletions README.md
@@ -26,13 +26,16 @@ python llm_judge/gen_model_answer.py --config <CONFIG-PATH>

Arguments & Options:
- `<CONFIG-PATH>` is the path to a configuration file. Examples are in `configs/`.
- `num_answers_per_question` specifies the number of answers to generate per question (default: all).

For example:

```bash
python llm_judge/gen_model_answer.py --config configs/rinna--japanese-gpt-neox-3.6b-instruction-ppo.json
```

#### Step 2. Generate GPT-4 judgments

There are several options to use GPT-4 as a judge, such as pairwise win-rate and single-answer grading.
@@ -43,7 +46,8 @@ OPENAI_API_KEY=<YOUR-KEY> python llm_judge/gen_judgment.py \
[--baseline-model <BASELINE-MODEL-ID>] \
[--model-list <LIST-OF-MODEL-IDS>] \
[--yes] \
[--wandb]
[--wandb] \
    [--num_answers_per_question <N>]
```

Arguments & Options:
@@ -55,6 +59,7 @@ Arguments & Options:
- `--model-list <LIST-OF-MODEL-IDS>` is a list of model IDs to be evaluated. If not specified, all models in `data/jp_bench/model_answer` will be evaluated.
- `--yes` is a flag to skip the confirmation prompt.
- `--wandb` is a flag to enable logging to W&B. You can upload the results later to W&B by running `upload_result.py`, as described in the next section.
- `--num_answers_per_question <N>` specifies the number of answers to evaluate per question (default: all).

**Mode: `pairwise-baseline` (Default)**

@@ -157,4 +162,3 @@ If you use our code in your research, please cite our work:
year={2024}
}
```

Deleted files:
- `configs/README.md` (30 lines)
- `configs/cyberagent--calm2-7b-chat.json` (13 lines)
- two further config files (names not shown in this view)
- `configs/openai--text-davinci-003.json` (16 lines)
- `configs/rinna--japanese-gpt-neox-3.6b-instruction-ppo.json` (16 lines)
- `configs/rinna--japanese-gpt-neox-3.6b-instruction-sft-v2.json` (16 lines)
- `configs/tokyotech-llm--Swallow-70b-instruct-hf.json` (13 lines)
40 changes: 29 additions & 11 deletions llm_judge/common.py
@@ -9,17 +9,20 @@
from typing import Optional, Union

import openai
from openai import AzureOpenAI
Review thread on this import:

Member: Please make this an implementation that works with the OpenAI API as well, not just the Azure API.

Author: Currently only the Azure API is available to me, so I cannot verify the OpenAI path. Is that acceptable?

Member: In that case, we will implement and test this part on our side.

Author: Understood.

client = AzureOpenAI(api_key=os.getenv("OPENAI_API_KEY"),
                     api_version=os.getenv("OPENAI_API_VERSION"))
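The review thread above asks for an implementation that works with both the plain OpenAI API and Azure. One way to do that is to decide which client to build from environment variables. The sketch below is illustrative, not part of this PR; the function name and the exact selection rule are assumptions, and it deliberately returns the client kind plus constructor kwargs so it needs nothing beyond the standard library:

```python
import os


def select_backend(env=None):
    """Decide which OpenAI client to build from environment variables.

    Returns ("azure", kwargs) or ("openai", kwargs); the caller then does
    AzureOpenAI(**kwargs) or OpenAI(**kwargs) respectively.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_TYPE") == "azure":
        # AzureOpenAI additionally needs an API version and an endpoint.
        return "azure", {
            "api_key": env.get("OPENAI_API_KEY"),
            "api_version": env.get("OPENAI_API_VERSION"),
            "azure_endpoint": env.get("OPENAI_API_BASE"),
        }
    return "openai", {
        "api_key": env.get("OPENAI_API_KEY"),
        "organization": env.get("OPENAI_ORGANIZATION"),
    }
```

Keeping the decision in one small function means the rest of the module can hold a single `client` object regardless of backend.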
import tiktoken
from dotenv import load_dotenv

logger = logging.getLogger(__name__)

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.organization = os.getenv("OPENAI_ORGANIZATION")
openai.api_type = os.getenv("OPENAI_API_TYPE")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")
# TODO: The 'openai.organization' option isn't read in the client API. You will need to pass it when you instantiate the client, e.g. 'OpenAI(organization=os.getenv("OPENAI_ORGANIZATION"))'
# openai.organization = os.getenv("OPENAI_ORGANIZATION")
# TODO: The 'openai.api_base' option isn't read in the client API. You will need to pass it when you instantiate the client, e.g. 'OpenAI(base_url=os.getenv("OPENAI_API_BASE"))'
# openai.api_base = os.getenv("OPENAI_API_BASE")

# Data paths
JP_BENCH_DIR = Path(__file__).resolve().parent.parent / "data" / "jp_bench"
@@ -68,9 +71,9 @@ def judge(self, **kwargs):
params["engine"] = self.model
else:
params["model"] = self.model
response = openai.ChatCompletion.create(**params)
return response["choices"][0]["message"]["content"]
except openai.error.OpenAIError as e:
response = client.chat.completions.create(**params)
return response.choices[0].message.content
except openai.OpenAIError as e:
logger.warning(f"OpenAI API error: {e}")
time.sleep(API_RETRY_SLEEP)
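The hunk above makes the two changes the 1.x SDK requires: dict-style access (`response["choices"][0]...`) becomes attribute access, and `openai.error.OpenAIError` becomes `openai.OpenAIError`. The retry loop itself is API-agnostic; a stdlib-only sketch of the same shape, with the exception type and the API call injected so nothing here depends on the `openai` package (constants are illustrative, not the repo's values):

```python
import time

API_RETRY_SLEEP = 1   # seconds between retries (illustrative)
API_MAX_RETRY = 3     # illustrative


def call_with_retry(call, retry_on, max_retry=API_MAX_RETRY, sleep=API_RETRY_SLEEP):
    """Invoke call(), retrying up to max_retry times on retry_on exceptions.

    In the PR, call would be lambda: client.chat.completions.create(**params)
    and retry_on would be openai.OpenAIError.
    """
    for attempt in range(max_retry):
        try:
            return call()
        except retry_on as e:
            print(f"API error (attempt {attempt + 1}): {e}")
            time.sleep(sleep)
    return None  # all retries exhausted
```

Injecting the exception class keeps the retry logic testable without network access or the SDK installed.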

@@ -147,7 +150,6 @@ def get_score(judgment: str) -> int:
return ast.literal_eval(match.groups()[0])
return -1
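`get_score` above evaluates the first regex capture as a Python literal and falls back to -1 when nothing matches. A self-contained version of that pattern follows; the rating regex is an assumption for illustration, since the repo's actual pattern sits outside this hunk:

```python
import ast
import re

# Illustrative pattern: single-answer judgments often end with "Rating: [[7]]".
# The actual regex used by the repository is not shown in this diff.
RATING_RE = re.compile(r"\[\[(\d+(?:\.\d+)?)\]\]")


def get_score(judgment: str):
    """Extract a numeric rating from a judgment string, or -1 if none is found."""
    match = RATING_RE.search(judgment)
    if match:
        # literal_eval turns "7" into 7 and "8.5" into 8.5.
        return ast.literal_eval(match.groups()[0])
    return -1
```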


@dataclasses.dataclass
class MatchPair:
question: dict
@@ -256,6 +258,19 @@ def get_model_list(answer_dir: Union[str, Path]):
return [path.name for path in Path(answer_dir).iterdir()]


# def load_model_answers(answer_dir: Union[str, Path]):
# """Load model answers.

# Args:
# answer_dir (Union[str, Path]): The answer directory.
# """
# answers = {}
# with open(Path(answer_dir) / "results.jsonl", "r") as fin:
# for line in fin:
# answer = json.loads(line)
# answers[answer["question_id"]] = answer
# return answers

def load_model_answers(answer_dir: Union[str, Path]):
"""Load model answers.

@@ -266,7 +281,10 @@ def load_model_answers(answer_dir: Union[str, Path]):
with open(Path(answer_dir) / "results.jsonl", "r") as fin:
for line in fin:
answer = json.loads(line)
answers[answer["question_id"]] = answer
qid = answer["question_id"]
if qid not in answers:
answers[qid] = []
answers[qid].append(answer)
return answers
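The rewritten `load_model_answers` groups multiple answers under each question ID instead of keeping only the last one, which is presumably what the new `--num_answers_per_question` option relies on. The same grouping reads a little more idiomatically with `collections.defaultdict`; a standalone sketch:

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Union


def load_model_answers(answer_dir: Union[str, Path]) -> dict:
    """Map each question_id in results.jsonl to the list of its answers."""
    answers = defaultdict(list)
    with open(Path(answer_dir) / "results.jsonl") as fin:
        for line in fin:
            answer = json.loads(line)
            answers[answer["question_id"]].append(answer)
    # Return a plain dict so missing keys raise KeyError for callers.
    return dict(answers)
```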


@@ -362,4 +380,4 @@ def filter_pairwise_judgements(
filtered_result_id_results_map[result_id] = results
else:
filtered_result_id_results_map[result_id] = results
return filtered_result_id_results_map
return filtered_result_id_results_map
9 changes: 6 additions & 3 deletions llm_judge/gen_gpt3.5_answer.py
@@ -5,6 +5,9 @@
import time

import openai
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
import shortuuid
from common import PREDICTION_DIR, QUESTION_FILE, load_questions
from dotenv import load_dotenv
@@ -13,8 +16,8 @@
logger = logging.getLogger(__name__)

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.organization = os.getenv("OPENAI_ORGANIZATION")
# TODO: The 'openai.organization' option isn't read in the client API. You will need to pass it when you instantiate the client, e.g. 'OpenAI(organization=os.getenv("OPENAI_ORGANIZATION"))'
# openai.organization = os.getenv("OPENAI_ORGANIZATION")


def generate_response(input_text, generation_config) -> str:
Expand All @@ -24,7 +27,7 @@ def generate_response(input_text, generation_config) -> str:
input_text: The input text.
generation_config: The config for the generation.
"""
response = openai.Completion.create(prompt=input_text, **generation_config)
response = client.completions.create(prompt=input_text, **generation_config)
return response.choices[0].text

