
added system prompt for openai #2145

Merged (5 commits, Jan 11, 2025)

Conversation

@Leo-LiHao (Contributor) commented Sep 12, 2024

What does this PR do?

Add system prompt for OpenAI Representation model, see Issue #2146

Before submitting

  • This PR fixes a typo or improves the docs (if yes, ignore all other checks!).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes (if applicable)?
  • Did you write any new necessary tests?

@MaartenGr (Owner) left a comment:
Thanks for updating these two models! I left a small comment.

Other than that, I believe we will need to look at the following:

With HuggingFace, we might need to change the formatting such that we use messages instead:

messages = [
    {
        "role": "system",
        "content": MY_SYSTEM_PROMPT,
    },
    {"role": "user", "content": prompt},
]

For LangChain, I'm okay with skipping this for now since it uses a more complex structure and the API here might need to be updated (see https://python.langchain.com/v0.1/docs/modules/model_io/chat/quick_start/#messages-in---message-out).

For llama-cpp-python, it seems straightforward and we should use .create_chat_completion instead of directly calling the model.

(review thread on bertopic/representation/_cohere.py, outdated; resolved)
@Leo-LiHao (Contributor, Author) commented:

@MaartenGr I updated the code for llama-cpp

@MaartenGr (Owner) left a comment:

Apologies for the late reply, I was off for the last couple of weeks.

I left some comments here and there to make sure we are on the same page for all changes.

(review thread on bertopic/representation/_cohere.py, outdated; resolved)
[DOCUMENTS]
Keywords: [KEYWORDS]
Provide the extracted topic name directly without any explanation."""
@MaartenGr (Owner) commented:
Let's structure this the same way as was originally done in _cohere.py

Topic name:"""
Provide the topic name directly without any explanation."""
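For context, BERTopic prompts use `[DOCUMENTS]` and `[KEYWORDS]` placeholders as shown in the hunk above; a minimal sketch of how such a template gets filled (the `fill_prompt` helper is hypothetical, not the library's actual implementation) might be:

```python
prompt_template = """[DOCUMENTS]
Keywords: [KEYWORDS]
Provide the topic name directly without any explanation."""

def fill_prompt(template, documents, keywords):
    """Substitute BERTopic-style placeholders with a topic's documents and keywords."""
    joined_docs = "\n".join(f"- {doc}" for doc in documents)
    return template.replace("[DOCUMENTS]", joined_docs).replace("[KEYWORDS]", ", ".join(keywords))

prompt = fill_prompt(
    prompt_template,
    ["Meat is bad for the environment.", "Eating less beef helps."],
    ["meat", "beef", "eating"],
)
```

Ending the template with a bare instruction (rather than a `Topic name:` stub) means the model's reply needs no prefix stripping.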

DEFAULT_SYSTEM_PROMPT = "You are designated as an assistant that identify and extract high-level topics from texts."
@MaartenGr (Owner) commented:

Suggested change:
- DEFAULT_SYSTEM_PROMPT = "You are designated as an assistant that identify and extract high-level topics from texts."
+ DEFAULT_SYSTEM_PROMPT = "You are an assistant that extracts high-level topics from texts."


Based on the above information, can you give a short label of the topic?
A: """
DEFAULT_SYSTEM_PROMPT = "You are designated as an assistant that identify and extract high-level topics from texts."
@MaartenGr (Owner) commented:

Suggested change:
- DEFAULT_SYSTEM_PROMPT = "You are designated as an assistant that identify and extract high-level topics from texts."
+ DEFAULT_SYSTEM_PROMPT = "You are an assistant that extracts high-level topics from texts."

  pipeline_kwargs: Mapping[str, Any] = {},
  nr_docs: int = 4,
  diversity: float = None,
- doc_length: int = None,
+ doc_length: int = 100,
@MaartenGr (Owner) commented:

Let's keep the value as it was, otherwise we have to do this for all instances of doc_length across all models.

(review thread on bertopic/representation/_llamacpp.py; resolved)
}
], **self.pipeline_kwargs
)
label = topic_description["choices"][0]["message"]["content"].strip().replace("Topic name: ", "")
@MaartenGr (Owner) commented:

I think we can remove .replace("Topic name: ", "") if the prompt is updated (see comment above).
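To illustrate the point with a mocked response (the dict below is invented for illustration; real responses contain more fields), once the prompt no longer elicits a `Topic name:` prefix, a plain `.strip()` suffices:

```python
# Mocked chat-completion response in the OpenAI / llama-cpp-python shape (illustrative only).
topic_description = {
    "choices": [
        {"message": {"role": "assistant", "content": "  Environmental impact of eating meat\n"}}
    ]
}

# With the updated prompt the model returns the bare label, so no
# .replace("Topic name: ", "") is needed; stripping whitespace is enough.
label = topic_description["choices"][0]["message"]["content"].strip()
```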

(review thread on bertopic/representation/_llamacpp.py, outdated; resolved)
(review thread on bertopic/representation/_openai.py, outdated; resolved)
(review thread on bertopic/representation/_utils.py, outdated; resolved)
@Leo-LiHao (Contributor, Author) commented:

> Apologies for the late reply, I was off for the last couple of weeks.
>
> I left some comments here and there to make sure we are on the same page for all changes.

@MaartenGr Thanks for your comments, I incorporated them and updated the code.

@MaartenGr (Owner) commented:

@Leo-LiHao Thanks! I noticed a couple of changes to creating the prompt that we might want in all other models but it's alright to keep that as is for now. At some point, I might make a utility function for creating the prompt that is used across all models to make sure everything remains stable.

For now, let's merge!

@MaartenGr (Owner) commented Jan 8, 2025:

Ah, I see the linting has an error. Could you check that?

EDIT: I updated the settings for the repo so that the checks will automatically run.

@Leo-LiHao (Contributor, Author) commented:

@MaartenGr Thank you! I fixed the lint error and it is ready for merge now.

@MaartenGr MaartenGr merged commit 5cad563 into MaartenGr:master Jan 11, 2025
5 checks passed
@MaartenGr (Owner) commented:

Awesome, thanks for taking the time! I just merged it 🥳
