
Please help me test tool-use (function calling) #514
Open
karthink opened this issue Dec 22, 2024 · 33 comments
Labels
help wanted (Extra attention is needed)

Comments

@karthink
Owner

karthink commented Dec 22, 2024

Note

Current status of tool-use:

Backend           | With streaming | Without streaming | Parallel tool calls | Notes
------------------+----------------+-------------------+---------------------+---------------------------
Anthropic         | ✓              | ✓                 | ✓                   |
OpenAI-compatible | ✓              | ✓                 | ✓                   |
Gemini            | ✓              | ✓                 | ✓                   | Tools must take arguments
Ollama            | ✗              | ✓                 | ✓                   | Turn off streaming first

I've added tool-use/function calling support to all major backends in gptel -- OpenAI-compatible, Claude, Ollama and Gemini.

Demos

screencast_20241222T075329.mp4
gptel-tool-use-filesystem-demo.mp4

Call to action

Please help me test it! It's on the feature-tool-use branch. There are multiple ways in which you can help. Ranked from least to most intensive:

  1. Switch to the feature-tool-use branch and just use gptel as normal -- no messing around with tool use. Adding tool use required a significant amount of reworking in gptel's core, so it will help to catch any regressions first. (Remember to reinstall/re-byte-compile the package after switching branches!)

  2. Switch to the branch, define a tool or two, and try using gptel (instructions below). Let me know if something breaks.

  3. Same as 2, but suggest ways that the feature can be improved, especially in the UI department.


What is "tool use"?

"Tool use" or "function calling" is LLM usage where

  • You include a function specification along with your task/question to the LLM.
  • The LLM optionally decides to call the function, and supplies the function call arguments.
  • You run the function call, and (optionally) feed the results back to the LLM. gptel handles this automatically.
  • The LLM completes the task based on the information received.

You can use this to give the LLM awareness of the world, by providing access to APIs, your filesystem, web search, Emacs etc. You can get it to control your Emacs frame, for instance.

How do I enable it in gptel?

There are three steps:

  1. Use a model that supports tool use. Most of the big OpenAI/Anthropic/Google models do, as do llama3.1 and the newer mistral models if you're using Ollama.

  2. (setq gptel-use-tools t)

  3. Write tool definitions. See the documentation of gptel-make-tool. Here is an example of a tool definition:

Tool definition example
(setq gptel-tools         ;; <-- Holds a list of tools
      (list        
       (gptel-make-tool   ;; <-- This is a tool definition
        :function (lambda (location unit)
                    (url-retrieve-synchronously
                     (format "api.weather.com/...?location=%s&unit=%s"
                             location unit)))
        :name "get_weather" ;; <-- Javascript style, snake_case name
        :description "Get the current weather in a given location"
        :args (list '(:name "location"
                      :type "string"
                      :description "The city and state, e.g. San Francisco, CA")
                    '(:name "unit"
                      :type "string"
                      :enum ("celsius" "farenheit")  ;; <-- enum types help reduce hallucinations, optional
                      :description
                      "The unit of temperature, either 'celsius' or 'fahrenheit'"
                      :optional t)))))

And here are a few simple tools for Filesystem/Emacs/Web access. You can copy and evaluate them in your Emacs session:

Code:

Some tool definitions, copy to your Emacs
(gptel-make-tool
 :function (lambda (url)
             (with-current-buffer (url-retrieve-synchronously url)
               (goto-char (point-min)) (forward-paragraph) ;; skip the HTTP response headers
               (let ((dom (libxml-parse-html-region (point) (point-max))))
                 (run-at-time 0 nil #'kill-buffer (current-buffer))
                 (with-temp-buffer
                   (shr-insert-document dom)
                   (buffer-substring-no-properties (point-min) (point-max))))))
 :name "read_url"
 :description "Fetch and read the contents of a URL"
 :args (list '(:name "url"
               :type "string"
               :description "The URL to read"))
 :category "web")

(gptel-make-tool
 :function (lambda (buffer text)
             (with-current-buffer (get-buffer-create buffer)
               (save-excursion
                 (goto-char (point-max))
                 (insert text)))
             (format "Appended text to buffer %s" buffer))
 :name "append_to_buffer"
 :description "Append text to the an Emacs buffer.  If the buffer does not exist, it will be created."
 :args (list '(:name "buffer"
               :type "string"
               :description "The name of the buffer to append text to.")
             '(:name "text"
               :type "string"
               :description "The text to append to the buffer."))
 :category "emacs")

;; Message buffer logging tool
(gptel-make-tool
 :function (lambda (text)
             (message "%s" text)
             (format "Message sent: %s" text))
 :name "echo_message"
 :description "Send a message to the *Messages* buffer"
 :args (list '(:name "text"
               :type "string"
               :description "The text to send to the messages buffer"))
 :category "emacs")

;; buffer retrieval tool
(gptel-make-tool
 :function (lambda (buffer)
             (unless (buffer-live-p (get-buffer buffer))
               (error "Error: buffer %s is not live." buffer))
             (with-current-buffer  buffer
               (buffer-substring-no-properties (point-min) (point-max))))
 :name "read_buffer"
 :description "Return the contents of an Emacs buffer"
 :args (list '(:name "buffer"
               :type "string"
               :description "The name of the buffer whose contents are to be retrieved"))
 :category "emacs")


(gptel-make-tool
 :function (lambda (directory)
	     (mapconcat #'identity
                        (directory-files directory)
                        "\n"))
 :name "list_directory"
 :description "List the contents of a given directory"
 :args (list '(:name "directory"
	       :type "string"
	       :description "The path to the directory to list"))
 :category "filesystem")

(gptel-make-tool
 :function (lambda (parent name)
             (condition-case nil
                 (progn
                   (make-directory (expand-file-name name parent) t)
                   (format "Directory %s created/verified in %s" name parent))
               (error (format "Error creating directory %s in %s" name parent))))
 :name "make_directory"
 :description "Create a new directory with the given name in the specified parent directory"
 :args (list '(:name "parent"
	       :type "string"
	       :description "The parent directory where the new directory should be created, e.g. /tmp")
             '(:name "name"
	       :type "string"
	       :description "The name of the new directory to create, e.g. testdir"))
 :category "filesystem")

(gptel-make-tool
 :function (lambda (path filename content)
             (let ((full-path (expand-file-name filename path)))
               (with-temp-buffer
                 (insert content)
                 (write-file full-path))
               (format "Created file %s in %s" filename path)))
 :name "create_file"
 :description "Create a new file with the specified content"
 :args (list '(:name "path"
	       :type "string"
	       :description "The directory where to create the file")
             '(:name "filename"
	       :type "string"
	       :description "The name of the file to create")
             '(:name "content"
	       :type "string"
	       :description "The content to write to the file"))
 :category "filesystem")

(gptel-make-tool
 :function (lambda (filepath)
	     (with-temp-buffer
	       (insert-file-contents (expand-file-name filepath))
	       (buffer-string)))
 :name "read_file"
 :description "Read and display the contents of a file"
 :args (list '(:name "filepath"
	       :type "string"
	       :description "Path to the file to read.  Supports relative paths and ~."))
 :category "filesystem")
An async tool to fetch youtube metadata using yt-dlp
(defun my/gptel-youtube-metadata (callback url)
  (let* ((video-id
          (and (string-match
                (concat
                 "^\\(?:http\\(?:s?://\\)\\)?\\(?:www\\.\\)?\\(?:youtu\\(?:\\(?:\\.be\\|be\\.com\\)/\\)\\)"
                 "\\(?:watch\\?v=\\)?" "\\([^?&]+\\)")
                url)
               (match-string 1 url)))
         (dir (file-name-concat temporary-file-directory "yt-dlp" video-id)))
    (if (file-directory-p dir) (delete-directory dir t))
    (make-directory dir t)
    (let ((default-directory dir) (idx 0)
          (data (list :description nil :transcript nil)))
      (make-process :name "yt-dlp"
                    :command `("yt-dlp" "--write-description" "--skip-download" "--output" "video" ,url)
                    :sentinel (lambda (proc status)
                                (cl-incf idx)
                                (let ((default-directory dir))
                                  (when (file-readable-p "video.description")
                                    (plist-put data :description
                                               (with-temp-buffer
                                                 (insert-file-contents "video.description")
                                                 (buffer-string)))))
                                (when (= idx 2)
                                    (funcall callback (gptel--json-encode data))
                                    (delete-directory dir t))))
      (make-process :name "yt-dlp"
                    :command `("yt-dlp" "--skip-download" "--write-auto-subs" "--sub-langs"
                               "en,-live_chat" "--convert-subs" "srt" "--output" "video" ,url)
                    :sentinel (lambda (proc status)
                                (cl-incf idx)
                                (let ((default-directory dir))
                                  (when (file-readable-p "video.en.srt")
                                    (plist-put data :transcript
                                               (with-temp-buffer
                                                 (insert-file-contents "video.en.srt")
                                                 (buffer-string)))))
                                (when (= idx 2)
                                    (funcall callback (gptel--json-encode data))
                                    (delete-directory dir t)))))))

(gptel-make-tool
 :name "youtube_video_metadata"
 :function #'my/gptel-youtube-metadata
 :description "Find the description and video transcript for a youtube video.  Return a JSON object containing two fields:

\"description\": The video description added by the uploader
\"transcript\": The video transcript in SRT format"
 :args '((:name "url"
          :description "The youtube video URL, for example \"https://www.youtube.com/watch?v=H2qJRnV8ZGA\""
          :type "string"))
 :category "web"
 :async t
 :include t)

As seen in gptel's menu:

screenshot_20241231T062011

See the documentation for gptel-make-tool for details on the keyword arguments.

Tip

@jester7 points out that you can get the LLM to write these tool definitions for you, and eval the Org Babel blocks to use them right away.

Important

Please share tools you write below so I can use them to test for issues.

In this case, the LLM may choose to ask for a call to get_weather if your question is related to the weather, as in the above demo video. You can help it along by saying something like:

Use the provided tools to accomplish this task: ...

Notes

  • Tools can be asynchronous, see the documentation of gptel-make-tool for an example.
  • You can use the tool definition schema to get the LLM to generate JSON for you. This is one of the ways to get LLM APIs to generate JSON output.
  • Right now tool call results are automatically sent back to the LLM. We'll add a way to make this optional, for when the tool needs to run for side-effects only. For now you can make your tool return nil if you want to run it for side-effects only (see the sketch below this list).
  • It is possible to force the LLM to use provided tools -- this switch has not yet been implemented.
  • LLMs may use tools in parallel if multiple tools are specified -- this is fully supported in gptel.
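
To illustrate the side-effects-only note above, here is a minimal sketch of such a tool (the name and behavior are illustrative, not part of gptel):

(gptel-make-tool
 :function (lambda (name)
             ;; Display the buffer for the user, then return nil so
             ;; nothing is sent back to the LLM (side effects only).
             (display-buffer (get-buffer-create name))
             nil)
 :name "show_buffer"
 :description "Display the buffer with the given name, creating it if needed"
 :args (list '(:name "name"
               :type "string"
               :description "The name of the buffer to display"))
 :category "emacs")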
@karthink
Owner Author

@agzam -- in case you're interested.

@agzam

agzam commented Dec 22, 2024

Oh wow, this is very cool. So many interesting ideas to try. I'm excited, will give it a go. Thank you!

@metachip

Excellent. Will give it a go.

@jester7

jester7 commented Dec 23, 2024

Wow, this works great. I just created tools to cat files, ls directories, and create new files. I had trouble with OpenAI calling the tools, but Claude Sonnet works fine. I even had Claude write a couple of tools, then eval the Emacs Lisp blocks inside the Org mode buffer, and have Claude immediately start using them.

I'll help you test these tools step by step:

  1. First, let's create the subdirectories "1" and "2" inside claude-scripts:

Now let's create hello world files in each directory:

Let's verify the structure using the tree_view tool:

Let's verify the content of both files:

Great! I've tested all the tools:

  1. Created directories "1" and "2"
  2. Created hello.txt files in each directory with different content
  3. Used tree_view to show the structure
  4. Read both files to verify their contents

All tools are working as expected! The tree view shows the correct directory structure, and the file contents are exactly as we wrote them. Would you like to try something else with these tools?

@karthink
Owner Author

I even had Claude write a couple of tools, then eval the Emacs Lisp blocks inside the Org mode buffer, and have Claude immediately start using them.

Holy bootstrap, Batman!

I had trouble with OpenAI calling the tools

Did it throw an error or just ignore the tools? If it was silent failure you can check the *gptel-log* buffer after running (setq gptel-log-level 'info).

@karthink karthink pinned this issue Dec 23, 2024
@jester7

jester7 commented Dec 23, 2024

It seems to be calling the tools but fails at the end. It also failed with Gemini, Llama, and Qwen for me, but I have to double-check my tools, because for simpler use cases I think it was working a while ago. This same prompt works fine with the Claude models Sonnet and Haiku.

Here is the Messages buffer and attached is the log
gptel-tool-use-log-openai.txt
Querying OpenAI...
gptel: moving from INIT to WAIT
gptel: moving from WAIT to TYPE
gptel: moving from TYPE to TOOL
error in process sentinel: let: Wrong type argument: stringp, nil
error in process sentinel: Wrong type argument: stringp, nil

@karthink
Owner Author

karthink commented Dec 23, 2024 via email

@jester7

jester7 commented Dec 23, 2024

Update: it seems Gemini, Llama, and Qwen models work only if I make a request that requires a single tool call. For example, I made requests to each to summarize a URL and to do a directory listing on my local machine, and those types of interactions work.

@karthink
Owner Author

karthink commented Dec 23, 2024

it seems Gemini, Llama, and Qwen models work only if I make a request that requires a single tool call

Could you try it with this OpenAI backend?

(gptel-make-openai "openai-with-parallel-tool-calls"
  :key YOUR_OPENAI_API_KEY
  :stream t
  :models gptel--openai-models
  :request-params '(:parallel_tool_calls t))

Parallel tool calls are supposed to be enabled by default, so I'm not expecting that this will work, but it would be wise to verify.

@karthink
Owner Author

Here is the Messages buffer and attached is the log
gptel-tool-use-log-openai.txt

Could you also share the tool definitions you used in this failed request? I'd like to try reproducing the error here.

@ProjectMoon

For ollama, I see the tool being sent to ollama in the gptel log buffer, but none of the models ever actually seem to use the tools. Have tried with Mistral Nemo, Qwen 2.5, Mistral Small, Llama 3.2 vision.

@karthink
Owner Author

karthink commented Dec 24, 2024

@ProjectMoon Could you share the tools you wrote so I can try to reproduce these issues?

@ProjectMoon

ProjectMoon commented Dec 24, 2024

Just a copy and paste of the example one.

I will try again at some point in the coming days with ollama debugging mode turned on to see what reaches the server.

Edit: also I need to test with a direct ollama connection. This might be (and probably is) a bug in Open WebUI's proxied ollama API.

@jester7

jester7 commented Dec 25, 2024

I get these types of errors:
Querying openai-with-parallel-tool-calls...
error in process sentinel: let: Wrong type argument: stringp, nil
error in process sentinel: Wrong type argument: stringp, nil

Querying Claude... This is a test message
Claude error: ((HTTP/2 400) invalid_request_error) messages.4: Did not find 1 `tool_result` block(s) at the beginning of this message. Messages following `tool_use` blocks must begin with a matching number of `tool_result` blocks.

Attached are my gptel config files, the regular one I use and a minimal version I made for testing tools using your suggestion "openai-with-parallel-tool-calls".
gptel-minimal.el.txt
gptel-config.el.txt

@karthink
Owner Author

@jester7 Thanks for the tool definitions.

I've fixed parallel tool calls for the Claude and OpenAI-compatible APIs. Please update and test both the streaming and non-streaming cases. You can turn off streaming with (setq gptel-stream nil). If something fails please provide the log.

Parallel tool calls with the Gemini and Ollama APIs are still broken. All these APIs validate their inputs differently, and the docs don't contain the validation schema so adding tool calls is a long crapshoot. Still, we truck on.

@karthink
Owner Author

karthink commented Dec 28, 2024

Update: Parallel tool calls with Gemini works too, but only as long as all function calls involve arguments. Zero-arity functions like get_scratch_buffer cause the Gemini API to complain.
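
Until that is sorted out, a hypothetical workaround (an untested sketch) is to give such tools a throwaway argument, purely to satisfy Gemini's validation:

;; Sketch: a zero-arity tool given a dummy argument so the Gemini API
;; accepts it.  The argument is ignored by the function.
(gptel-make-tool
 :function (lambda (_reason)
             (with-current-buffer (get-buffer-create "*scratch*")
               (buffer-substring-no-properties (point-min) (point-max))))
 :name "get_scratch_buffer"
 :description "Return the contents of the *scratch* buffer"
 :args (list '(:name "reason"
               :type "string"
               :description "Why the buffer is needed (ignored)"))
 :category "emacs")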

@ProjectMoon

ProjectMoon commented Dec 28, 2024

@ProjectMoon Could you share the tools you wrote so I can try to reproduce these issues?

OK, definitely seems to be more a problem with OpenWebUI's proxied Ollama API... although it was supposedly resolved to be able to pass in structured inputs. I will have to dig into the source code to see if it even does anything with the tools parameter.

I was able to make a tool call when connecting directly to the ollama instance using Mistral Nemo.

Edit: Yep doesn't have the tools param in the API, so it's discarded silently.

@karthink
Owner Author

Edit: Yep doesn't have the tools param in the API, so it's discarded silently.

Thanks. When we merge this we should add a note to the README about the tool-use incompatibility with OpenWebUI.

@karthink
Owner Author

Parallel tool calls now work with Ollama too, but you have to disable streaming. Ollama does not support tool calls with streaming responses.

@karthink
Owner Author

I've updated the opening post with a status table I'll keep up to date.

@ProjectMoon

So I added the tools parameter to OpenWebUI (it was just adding a single line to the chat completions form class, it seems). Then I get a response back from the proxied ollama API containing the tool call to use. But unlike when connecting directly, gptel seems to do nothing. Looking at the elisp code, the only thing that makes sense is the content from the OWUI response being non-empty, but both OWUI response and the direct connection response have "content": "" o_O

@karthink
Owner Author

karthink commented Dec 28, 2024 via email

@prdnr

prdnr commented Dec 29, 2024

  1. Same as 2, but suggest ways that the feature can be improved, especially in the UI department.

I'm only recently picking Emacs back up; the last time I regularly used it was before tools like GPT existed. But if I understand correctly: since the tool results aren't echoed to the chat buffer created by M-x gptel, and gptel-send sends the buffer contents before the point, then won't any added context that results from a tool call get dropped from the conversation in the next message round?

If that is the case, perhaps it would be nice to provide a way to help people capture tool results to the context tooling provided by gptel? Or maybe have the tool results echoed to the chat buffer (perhaps in a folded block?)

@karthink
Owner Author

karthink commented Dec 29, 2024 via email

@karthink
Owner Author

But unlike when connecting directly, gptel seems to do nothing. Looking at the
elisp code, the only thing that makes sense is the content from the OWUI
response being non-empty, but both OWUI response and the direct connection
response have "content": "" o_O

@ProjectMoon This can happen if you have streaming turned on, since Ollama
doesn't support streaming + tool use. Can you ensure that gptel-stream is set
to nil before testing Ollama + OWUI?

(I will eventually handle this internally, where streaming is automatically
turned off if an Ollama request includes tools. Right now it's not clear where
in the code to put this check.)
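
In the meantime, a user-side stopgap might look something like the sketch
below. It assumes that backends created with gptel-make-ollama are of the
struct type gptel-ollama:

(require 'cl-lib)

;; Stopgap sketch: disable streaming buffer-locally in chat buffers
;; whose backend is an Ollama backend, since Ollama doesn't support
;; streaming + tool use.
(defun my/gptel-no-stream-for-ollama ()
  (when (cl-typep gptel-backend 'gptel-ollama)
    (setq-local gptel-stream nil)))

(add-hook 'gptel-mode-hook #'my/gptel-no-stream-for-ollama)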

@karthink
Owner Author

I've added tool selection support to gptel's transient interface:

image

Pressing t to select a tool opens up:

screenshot_20241231T062011

Selecting a category (like "filesystem" or "emacs" here) will toggle all the tools in that category.

Tool selection can be done globally, buffer-locally or for the next request only using the Scope (=) option.

This makes it much more convenient to select the right set of tools for the task at hand. (LLMs get confused if you include a whole bunch of irrelevant tools.)

I've also updated the opening post above with the tool definitions you see in the above image. You can grab them from there and evaluate them. I'm not sure yet if gptel should include any tools by default.

@karthink
Owner Author

karthink commented Dec 31, 2024

And here is a demo of using the filesystem toolset to make the LLM do something that's otherwise annoying to do:

gptel-tool-use-filesystem-demo.mp4

@ProjectMoon

@karthink finally got around to testing a bunch of this. Wasn't able to replicate the feature to see the internal state via clicking the header line, but the gptel log at info level shows that it responds with a tool call.

The trick was forcing non-streaming on the client side, in my gptel configuration. Forcing streaming on the server didn't help. But setting stream to nil when running against a modified version of Open-WebUI allowed it to work. A regular version of Open-WebUI does not work at all because of the missing tools option in the API payload (though it seems to be a one-line PR, so I may submit a change to them).

@karthink
Owner Author

karthink commented Jan 2, 2025

@karthink finally got around to testing a bunch of this. Wasn't able to
replicate the feature to see the internal state via clicking the header line,
but the gptel log at info level shows that it responds with a tool call.

You might need to update gptel to get the introspection feature, I've been
pushing to the branch over the week.

@meain

meain commented Jan 2, 2025

Only just got around to testing this. Couple of thoughts I had:

  • Might be worth looking into MCP. Looks like the "industry" has somewhat adopted it and we would easily have a "huge" tools library.
  • Adding the tool call and output to the chat context could be useful. The context within the tool call output might be helpful for future messages to the LLM. I kinda like how Cline represents tool call information. Then again, I don't know if it will introduce additional noise.
  • Option to be able to approve running of certain tools. Some tools might be performing destructive operations (like a file delete) which I might want to guard with a y-or-n-p. I guess this could go into the tool function definition, but it might be worth providing a generic interface (especially if we want to provide an option to override asking for confirmation). A rough sketch of such a guard follows below.
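
A minimal sketch of such a confirmation guard, done inside the tool definition itself for now (the delete_file tool here is hypothetical, not part of gptel):

(gptel-make-tool
 :function (lambda (filepath)
             ;; Ask the user before performing the destructive operation.
             (if (y-or-n-p (format "LLM wants to delete %s.  Allow? " filepath))
                 (progn (delete-file (expand-file-name filepath))
                        (format "Deleted %s" filepath))
               "User declined the deletion."))
 :name "delete_file"
 :description "Delete the file at the given path"
 :args (list '(:name "filepath"
               :type "string"
               :description "Path to the file to delete"))
 :category "filesystem")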

@karthink
Owner Author

karthink commented Jan 2, 2025

Only just got around to testing this. Couple of thoughts I had:

  • Might be worth looking into MCP. Looks like the "industry" has somewhat adopted it and we would easily have a "huge" tools library.

See #484.

  • Adding the tool call and output to the chat context could be useful. The context within the tool call output might be helpful for future messages to the LLM. I kinda like how Cline represents tool call information. Then again, I don't know if it will introduce additional noise.

  • Option to be able to approve running of certain tools. Some tools might be performing destructive operations (like a file delete) which I might want to guard with a y-or-n-p. I guess this could go into the tool function definition, but it might be worth providing a generic interface (especially if we want to provide an option to override asking for confirmation).

Yes, both of these were planned at the start, and have been implemented locally. They can be specified both per tool (in the definition) and per call (in the transient menu). I'll push them to this branch eventually.

That said, updates will be slow again as I'm now out of time to work on gptel.

@karthink
Owner Author

karthink commented Jan 2, 2025

@meain Do you know how getting LLMs to edit files works? Is the LLM given the file and asked to generate a diff, or generate the new version of the file, or generate only a changed region? The one tool that seems very useful that I don't know how to write is the edit-file action.

@meain

meain commented Jan 2, 2025

Is the LLM given the file and asked to generate a diff, or generate the new version of the file, or generate only a changed region?

TLDR: Depends on the model

I've mostly seen aider's diff-format-like approaches get used, but for some models we might have to ask the model to generate the full file. The edit formats page in aider's docs might be worth looking into. Aider mostly uses the whole or diff format, IIUC. Some models like 4o-mini do not work well with the diff format, so aider automatically uses the whole format for them by default. For most bigger models, diff would generally be better, since it is far fewer tokens to generate. This has details about what format aider uses for each model.


As for a generic option for edit, I've seen packages (Cline, for example) try asking the model to produce a diff-format response, and if the piece mentioned in the search section can't be found even after a retry, ask it to generate the whole file. I don't know if that behavior is the LLM automatically retrying the tool call on failure, rather than something done by the package; I have seen the LLM automatically retry at times in my packages where I have not mentioned anything about retrying. A very rough sketch of such an edit tool follows below.
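
For what it's worth, here is a very rough sketch of a search/replace style edit tool in that spirit. It is entirely illustrative: made-up name, literal string matching, and none of the retry or whole-file fallback logic mentioned above:

(gptel-make-tool
 :function (lambda (filepath search replace)
             (with-current-buffer (find-file-noselect (expand-file-name filepath))
               (save-excursion
                 (goto-char (point-min))
                 ;; Literal search for the exact text the LLM quoted.
                 (if (search-forward search nil t)
                     (progn (replace-match replace t t)
                            (save-buffer)
                            (format "Edited %s" filepath))
                   (format "Could not find the search text in %s" filepath)))))
 :name "edit_file"
 :description "Edit a file by replacing an exact SEARCH string with a REPLACE string"
 :args (list '(:name "filepath"
               :type "string"
               :description "Path to the file to edit")
             '(:name "search"
               :type "string"
               :description "The exact text to find in the file")
             '(:name "replace"
               :type "string"
               :description "The text to replace the match with"))
 :category "filesystem")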
