
server : add Hermes-3 tool call support (WIP) #9254

Draft
wants to merge 3 commits into master
Conversation

@ngxson (Collaborator) commented Aug 30, 2024

Related to #5695
Closes #9031

This is still WIP.

What is working:

  • support for the Hermes-3 model
  • ❌ support for the official Meta-Llama-3.1: doesn't work because the model is too sensitive to the input prompt. Hopefully Meta will fix this soon.
  • auto-detect which template to use
  • support for tools via /chat/completions
  • detect whether the model wants to use a tool or not
  • streaming support ==> currently works only for non-tool responses
  • dynamic temperature and/or grammar when generating the tool call response
  • add a demo
  • add tests

Special thanks to @Rocketknight1 for his very detailed blog post: Tool Use, Unified


@qnixsynapse (Contributor) commented Aug 30, 2024

I have a suggestion: you could detect the tool call action by tool token ids instead of by token strings.

In my own Python implementation, when the start-of-tool-call token id is generated (<|python_tag|> in the case of Llama 3.1), streaming is paused until the stop token / end-of-tool-call token is generated. Then a recursive call is made with the output of the tool, and streaming is resumed.

@ngxson (Collaborator, Author) commented Aug 30, 2024

@qnixsynapse Yes, it's possible to do so with the Hermes-3 format, but that will not be possible with either Llama 3.1 JSON tool calls or Llama 3.1 custom functions. The goal here is to stay compatible with the OAI specs, so relying on <|python_tag|> is not an option here (it may make more sense for llama-cpp-python).

Anyway, I'll consider doing this later on, when tool call templates are more mainstream and patterns start to emerge.

@qnixsynapse (Contributor) commented Aug 31, 2024

@ngxson The <|python_tag|> was just an example. You could also add support for the newer Mistral models (7B v0.3, Nemo), which have [TOOL_CALLS] tokens, for example.

Here, for instance, we could expand this:

else if (has_token("[/INST]") && has_token("[TOOL_CALLS]")) {
    return LLAMA_TOOL_FORMAT_MISTRAL;
}

Regarding streaming, I think this is sufficient:

(video attachment: upload.mp4)

@mario7421 commented Sep 19, 2024

@ngxson, I appreciate your work; the new feature is great.
I've encountered an issue, though. When the conversation includes previous model messages with null content (which can happen with tool-call-only messages), I receive an "Invalid 'content' type" error on subsequent requests. I believe a minor adjustment to the parse_chat_messages function in /examples/server/utils.hpp should solve this.
For now, I've worked around the problem by manually assigning an empty string to "content" in the code that makes the calls to the server.

EDIT: Also, the cases in which the "tools" parameter in the /completion request is null or an empty array should be accepted and treated the same as the case in which the parameter does not exist at all.

ss << "<|im_start|>system\n\n";
ss << "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools>\n\n";
for (auto tool : tools) {
    ss << tool.dump(1, '\t') << "\n\n";


Why the tabulations? They increase the number of tokens, but I don't think they provide useful information.

Suggested change:
-    ss << tool.dump(1, '\t') << "\n\n";
+    ss << tool.dump() << "\n\n";


Successfully merging this pull request may close these issues.

Feature Request: introduce Tool Call API in server mode
3 participants