
Long delay when using streaming + tools #529

Open
holdenmatt opened this issue Sep 13, 2024 · 10 comments

Comments

@holdenmatt

(Sorry if this isn't the right place to report this; I wasn't sure.)

I'm trying to switch from gpt-4o to claude-3.5-sonnet in an app I'm building, but high streaming tool latency is preventing me from doing so. It looks like this was discussed in #454, but I'm wondering how I should proceed.

The total latency of Claude vs gpt-4o is pretty similar, and I think fine.

The issue is that Claude waits a long time before any content is streamed (I often see ~5s delays vs ~500ms for gpt-4o). This is a poor user experience in my app, because users get no feedback that any generation is happening. This will prevent me from switching, even though I much prefer Claude's output quality!

Do you have any plans to fix this? Or do you recommend not using tools + streaming with Claude?

Example timing and test code below, if helpful.

Timing comparison

claude-3-5-sonnet:
Stream created at 0ms
First content received: 4645ms
Streaming time: 46ms
Total time: 4691ms

gpt-4o:
Stream created at 343ms
First content received: 368ms
Streaming time: 2100ms
Total time: 2468ms

Test code:

import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";

const openai = new OpenAI();
const anthropic = new Anthropic();

const provider: "anthropic" | "openai" = "anthropic";

export async function POST() {
  const startTime = performance.now();
  let streamCreated: number | undefined = undefined;
  let firstContentReceived: number | undefined = undefined;

  const messages = [
    {
      role: "user" as const,
      content: `Write a poem about pirates.`,
    },
  ];

  const schema = {
    type: "object" as const,
    properties: {
      poem: { type: "string", description: "The poem" },
    },
    required: ["poem"],
  };

  if (provider === "openai") {
    const stream = await openai.chat.completions.create({
      model: "gpt-4o",
      stream: true,
      messages,
      tools: [
        {
          type: "function",
          function: {
            name: "poem",
            description: "Generate a poem",
            parameters: schema,
          },
        },
      ],
    });

    streamCreated = performance.now();

    for await (const chunk of stream) {
      console.log(JSON.stringify(chunk.choices[0]?.delta?.tool_calls, null, 2));
      if (firstContentReceived === undefined) {
        firstContentReceived = performance.now();
      }
    }
  } else if (provider === "anthropic") {
    const stream = anthropic.messages
      .stream({
        model: "claude-3-5-sonnet-20240620",
        max_tokens: 2000,
        messages,
        tools: [
          {
            name: "poem",
            description: "Generate a poem",
            input_schema: schema,
          },
        ],
      })
      // When a JSON content block delta is encountered this
      // event will be fired with the delta and the currently accumulated object
      .on("inputJson", (delta, snapshot) => {
        console.log(`delta: ${delta}`);
        if (firstContentReceived === undefined) {
          firstContentReceived = performance.now();
        }
      });

    streamCreated = performance.now();
    await stream.done();
  }

  const endTime = performance.now();
  if (streamCreated) {
    console.log(`Stream created at ${Math.round(streamCreated - startTime)}ms`);
  }
  if (firstContentReceived) {
    console.log(
      `First content received: ${Math.round(firstContentReceived - startTime)}ms`,
    );
    console.log(`Streaming time: ${Math.round(endTime - firstContentReceived)}ms`);
  }
  console.log(`Total time: ${Math.round(endTime - startTime)}ms`);
}
@samj-anthropic

Hi @holdenmatt, unfortunately this is a model limitation (same issue noted in #454 (comment)). We're planning on improving this with future models.

@holdenmatt
Author

I see, thanks. If I want faster streaming, would you recommend I move away from tools and try to coax a JSON schema via the system prompt instead?

@samj-anthropic

samj-anthropic commented Sep 20, 2024

Hi @holdenmatt -- one clarification to the above: we stream out each key/value pair together, so long values will result in buffering (the delays you're seeing). In the example you provided, Claude is producing a poem (a long string) as a value, which is why you're seeing the delay. However, a large object with many smaller keys/values wouldn't have this issue.

If I want faster streaming, would you recommend I move away from tools and try to coax a JSON schema via the system prompt instead?

That could work; the delay you're seeing should only happen with that specific kind of tool use (where Claude is producing long values).
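To make the buffering behavior concrete, here is a minimal sketch contrasting two tool schemas, assuming the key/value streaming described above. The field names (poem, stanza1, etc.) are illustrative, not from the original report:

// Illustrative only: two schemas under the key/value buffering behavior
// described in the comment above.

// A single long value: the "inputJson" event can't fire until the whole
// poem string is complete, so nothing streams for several seconds.
const bufferedSchema = {
  type: "object" as const,
  properties: {
    poem: { type: "string", description: "The full poem" },
  },
  required: ["poem"],
};

// Many smaller values: each completed key/value pair can be emitted as it
// finishes, so deltas arrive progressively instead of all at the end.
const incrementalSchema = {
  type: "object" as const,
  properties: {
    stanza1: { type: "string", description: "First stanza" },
    stanza2: { type: "string", description: "Second stanza" },
    stanza3: { type: "string", description: "Third stanza" },
  },
  required: ["stanza1", "stanza2", "stanza3"],
};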

@holdenmatt
Author

Ah, that would explain why I run into this but other folks I talk to haven't seen it.

The specific use case for me is generating LaTeX code from text prompts for https://texsandbox.com/

The LaTeX output could be long, depending on the prompt. The reason I use function calling instead of text completion is that I want to allow the model to "branch" between the good "latex" case and an "error" case if it doesn't know what to do, or if, e.g., the input prompt doesn't make sense.


I could avoid tools here if that would improve streaming, but I'd need some other way to signal "this is valid code" vs "this is an error message".

@holdenmatt
Author

FYI - I fixed this by moving away from tool calling, and streaming now feels fast again.

I hacked together my own poor man's function calling on top of text generation, by prompting the model to write "latex" or "error" on the first line, followed by the code or an error message.

This works fine (so you can close this if you like), but it was the biggest issue I ran into switching from gpt-4o to claude-3.5-sonnet. I quite often use functions/tools with long JSON values, so consider this a feature request to improve this in the future. Thanks!
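A rough sketch of that kind of "first line is the branch" workaround, for anyone landing here. The prompt wording, function name, and parsing are illustrative assumptions, not the app's actual code:

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

// Sketch of the "poor man's function calling" approach described above:
// the model writes "latex" or "error" on the first line, then the body.
const SYSTEM_PROMPT = `Reply with either "latex" or "error" on the first line.
If "latex", put only the LaTeX source on the following lines.
If "error", put a short error message on the following lines.`;

export async function generateLatex(prompt: string) {
  let buffer = "";
  let kind: "latex" | "error" | undefined;

  const stream = anthropic.messages
    .stream({
      model: "claude-3-5-sonnet-20240620",
      max_tokens: 2000,
      system: SYSTEM_PROMPT,
      messages: [{ role: "user", content: prompt }],
    })
    .on("text", (textDelta) => {
      buffer += textDelta;
      // Once the first newline arrives we know which branch we're in, and
      // everything after it can be streamed to the UI immediately.
      if (kind === undefined && buffer.includes("\n")) {
        const firstLine = buffer.slice(0, buffer.indexOf("\n")).trim();
        kind = firstLine === "error" ? "error" : "latex";
      }
    });

  const message = await stream.finalMessage();
  const body = buffer.includes("\n") ? buffer.slice(buffer.indexOf("\n") + 1) : buffer;
  return { kind, text: body, message };
}

Because text deltas arrive immediately (unlike buffered tool-input JSON), the UI gets feedback within the first few hundred milliseconds instead of waiting for the whole value.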

@ZECTBynmo

Is there an issue we can track for improvements to streaming + tool use, or do you plan to post updates here?

@Kitenite

Kitenite commented Oct 31, 2024

Hey team, is there a planned date for fixing this? This is a big limiter on the user experience of our code-gen.
Since the result is returned as a stream anyway, is there a way to get those deltas earlier?

@darylsew

darylsew commented Nov 7, 2024

+1, I think this basically makes tool use not viable for our use case. It's not limited to the TypeScript API; it's also a problem in Python.

@Kitenite

Kitenite commented Nov 7, 2024

+1, I think this basically makes tool use not viable for our use case. It's not limited to the TypeScript API; it's also a problem in Python.

If it helps, there's a hacky workaround, similar to the solution mentioned above, that's currently working for me and someone else: stream raw text while forcing a JSON format, then progressively resolve the text into a partial object. It's surprisingly reliable so far.

vercel/ai#3422 (comment)
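A minimal sketch of that progressive-resolution idea, independent of any library. It assumes the model is prompted to reply with an object like {"code": "..."}; the field name and regex are illustrative, not taken from the linked comment:

// Extract a known string field out of a partial JSON response as it streams.
// Works even before the closing quote of the value has arrived.
function extractPartialCode(partialJson: string): string | undefined {
  // Find the opening quote of the "code" value, then capture everything up
  // to the next unescaped quote (or the end of what has streamed so far).
  const match = partialJson.match(/"code"\s*:\s*"((?:[^"\\]|\\.)*)/);
  if (!match) return undefined;
  // Unescape the common JSON escapes so the partial string is displayable.
  return match[1]
    .replace(/\\n/g, "\n")
    .replace(/\\t/g, "\t")
    .replace(/\\"/g, '"')
    .replace(/\\\\/g, "\\");
}

// Usage: call this on the accumulated text after every delta and render
// whatever it returns, so the value appears in the UI as it is generated.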

@ItayElgazar

Any news on this? This is super limiting and ruins the user experience.
