The Problem
I'm building an agent using Vercel Workflow where each turn of the agent is a separate workflow step. The steps themselves execute quickly—typically 0.4-2 seconds—but the serverless functions hosting them don't terminate. Instead, they hang for the full 300 seconds before timing out.
The strange part is that the workflow actually works correctly. Each step completes its task, returns its state, and the workflow continues to the next step without issue. But in the background, the function that ran that step is still alive, doing nothing, just waiting to time out. This results in ~295 seconds of wasted compute per step.
Environment
- Package: workflow version 4.0.1-beta.50
- Platform: Vercel Pro plan
- Runtime: Node.js
What I'm Seeing
Here's a snippet from my logs showing the pattern—the step finishes in under a second, but timeout errors pile up from functions that have been hanging:
Feb 04 17:41:07.26 - [CleanupSandbox] ✅ Sandbox stopped successfully
Feb 04 17:41:06.21 - [Workflow] Turn 39: 0.41s
Feb 04 17:41:03.07 - Vercel Runtime Timeout Error: Task timed out after 300 seconds
Feb 04 17:41:01.05 - Vercel Runtime Timeout Error: Task timed out after 300 seconds
Feb 04 17:40:55.94 - Vercel Runtime Timeout Error: Task timed out after 300 seconds
... (dozens more 300s timeout errors, all within seconds of each other)
Turn 39 completed in 0.41 seconds. But the functions from earlier turns are still hanging around, finally timing out 300 seconds after they started.
Streaming Pattern Used
Every step uses getWritable() with multiple incremental writes followed by releaseLock():
export async function runToolTurn(state: WorkflowState, ...): Promise<WorkflowState> {
  "use step";
  const writable = getWritable<StreamUpdate>();
  const writer = writable.getWriter();
  try {
    // Multiple writes during step execution
    await writer.write({ type: 'data-tool_call', ... });
    // ... step logic (completes in < 5 seconds) ...
    await writer.write({ type: 'data-tool_result', ... });
    return newState;
  } finally {
    writer.releaseLock();
  }
}

This pattern is used across all 5 step functions, each doing 2-3+ writes per invocation.
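In case it helps with reproduction, here is a stripped-down, self-contained version of the same pattern with my app logic removed. StreamUpdate and the payload fields are placeholders standing in for my real types, and the import path is the one I use in my project:

import { getWritable } from 'workflow';

// Placeholder type standing in for my real stream-update union.
type StreamUpdate = { type: string; [key: string]: unknown };

export async function minimalToolTurn(state: { turn: number }): Promise<{ turn: number }> {
  "use step";
  const writable = getWritable<StreamUpdate>();
  const writer = writable.getWriter();
  try {
    await writer.write({ type: 'data-tool_call', turn: state.turn });
    // Fast work happens here; each turn finishes in well under 5 seconds.
    await writer.write({ type: 'data-tool_result', turn: state.turn });
    return { turn: state.turn + 1 };
  } finally {
    // Release the lock so the next step can acquire its own writer on the stream.
    writer.releaseLock();
  }
}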
Relevance to PR #678
I noticed that PR #678 (included in 4.0.1-beta.45) specifically addressed this:
"Fix stream serialization to resolve when user releases lock instead of waiting for stream to close. This prevents Vercel functions from hanging when users incrementally write to streams within steps."
I'm on 4.0.1-beta.50, which should include this fix, yet I'm still experiencing hangs. I'm planning to test 4.1.0-beta.52 to see if additional fixes were made.
What I Think Might Be Happening
I'm not certain what's causing the functions to hang, but I have two theories:
Theory 1: Streaming
Despite PR #678, there might be edge cases with multiple writes that still cause hangs. Or perhaps something about my specific pattern isn't being handled correctly.
Theory 2: Sandbox SDK
Some of my steps connect to a persistent Vercel Sandbox using Sandbox.get() to execute code. The sandbox stays alive across steps so I can maintain state. There doesn't seem to be a disconnect() method to cleanly release the connection—only sandbox.stop(), which would kill the VM entirely. This could be keeping the event loop active.
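For context, this is roughly how those steps attach to the sandbox (simplified; the exact options I pass to Sandbox.get() are trimmed here):

import { Sandbox } from '@vercel/sandbox';

export async function runInSandbox(sandboxId: string): Promise<void> {
  "use step";
  // Re-attach to the long-lived sandbox created earlier in the workflow.
  const sandbox = await Sandbox.get({ sandboxId });
  // ... execute code inside the sandbox here ...
  // There's no disconnect()/close() I can call at this point; the only
  // teardown I see is sandbox.stop(), which would destroy the VM I need
  // to keep alive for later turns.
}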
It could also be something else entirely that I'm not aware of.
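To try to narrow this down myself, I'm planning to log what's keeping the event loop alive right before each step returns, along the lines of this sketch (process.getActiveResourcesInfo() is available in Node.js 17.3+ and reports resource types only, not where they came from):

// Purely diagnostic helper, not part of the workflow code.
function logActiveResources(label: string): void {
  // Prints something like: ["TCPSocketWrap", "Timeout", ...]
  console.log(`[${label}] active resources:`, process.getActiveResourcesInfo());
}

// Called at the end of a step, e.g. logActiveResources('Turn 39');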
Questions
- Is the fix from PR #678 ("Fix stream serialization to resolve when user releases lock instead of waiting for stream to close") supposed to fully resolve the streaming pattern I'm using?
- Are there edge cases with multiple writes that might still cause hangs?
- Could the Sandbox SDK connection be keeping the event loop active?
- Is there something else I should be doing to ensure functions terminate cleanly?
The Impact
This is a significant problem because I'm being charged for all that idle compute time. A workflow with 40 steps that each take 1 second is costing me 40 × 300 = 12,000 seconds of compute instead of 40 seconds.
Any insight into what might be causing the functions to hang—or how to make them terminate cleanly—would be hugely appreciated.
Thanks!