[core] Handle 429 and 500 errors from worlds in runtime #966
VaguelySerious wants to merge 5 commits into main
🦋 Changeset detected. Latest commit: 9f9c30a. The changes in this PR will be included in the next version bump. This PR includes changesets to release 19 packages.
🧪 E2E Test Results: ❌ Some tests failed
❌ Failed tests: 🌍 Community Worlds (157 failed): mongodb (39 failed), redis (39 failed), starter (40 failed), turso (39 failed)
Details by category:
✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
📊 Benchmark Results
Benchmarks ran on 💻 Local Development and ▲ Production (Vercel), with observability links for Express, Next.js (Turbopack), and Nitro. Scenarios: workflow with no steps; workflow with 1 step; workflow with 10, 25, and 50 sequential steps; Promise.all with 10, 25, and 50 concurrent steps; Promise.race with 10, 25, and 50 concurrent steps; stream benchmark (workflow with stream, includes TTFB metrics). Summary winners (fastest framework by world, fastest world by framework) are determined by most benchmark wins.
- Propagate Retry-After header as WorkflowAPIError.retryAfter on 429 responses
- Add withThrottleRetry wrapper for both workflow and step handlers
- Re-enqueue workflows on 5xx errors with exponential backoff (5s, 30s, 120min)
- Track serverErrorRetryCount in queue payload for retry budgeting
- Expose delaySeconds on QueueOptions interface

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
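To make the first bullet concrete, here is a minimal sketch of how a Retry-After header could be turned into a `retryAfter` value in seconds. It is not taken from the PR's diff: `ApiErrorSketch`, `parseRetryAfterSeconds`, and `errorFromResponse` are hypothetical names, and the class is only a simplified stand-in for WorkflowAPIError. The sketch assumes the header may carry either a number of seconds or an HTTP-date, which are the two forms HTTP allows.

```typescript
// Simplified stand-in for the runtime's error type (hypothetical, not the real
// WorkflowAPIError, which may have a different constructor and more fields).
class ApiErrorSketch extends Error {
  readonly status: number;
  readonly retryAfter: number | undefined; // seconds to wait, if known

  constructor(message: string, status: number, retryAfter?: number) {
    super(message);
    this.name = "ApiErrorSketch";
    this.status = status;
    this.retryAfter = retryAfter;
  }
}

// Retry-After is either a non-negative integer (seconds) or an HTTP-date.
// Returns undefined when the header is missing or unparseable.
function parseRetryAfterSeconds(
  value: string | null,
  nowMs: number = Date.now()
): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means "retry now", so clamp at zero.
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

// Build an error from a response-like object, attaching retryAfter only on 429.
function errorFromResponse(
  res: { status: number; headers: { get(name: string): string | null } },
  nowMs?: number
): ApiErrorSketch {
  const retryAfter =
    res.status === 429
      ? parseRetryAfterSeconds(res.headers.get("Retry-After"), nowMs)
      : undefined;
  return new ApiErrorSketch(
    `World responded with ${res.status}`,
    res.status,
    retryAfter
  );
}
```

Passing `nowMs` explicitly keeps the date branch deterministic in tests; callers would normally omit it.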
);
// Short wait: sleep in-process, then retry once
await new Promise((resolve) =>
  setTimeout(resolve, retryAfterSeconds * 1000)
Should we account for function execution time limits, specifically in the Vercel world? If the serverless function is already close to its limit and the workflow server returns a 429, adding a 10-second sleep could exceed the function execution limit, and the function could get SIGKILLed midway.
The workflow layer should never take much more than a few seconds, so I think it's highly unlikely that we'd run into timeouts. I'm not too worried about this, but it is technically a concern.
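For reference, one way the concern in this thread could be addressed is to check the remaining execution time before sleeping in-process. The following is only a sketch under assumptions, not what the PR does: `decideThrottleAction`, `retryOnceIfTimeAllows`, `getRemainingMs`, and the budget fields are all hypothetical, and it assumes the caller has some way to estimate how much execution time is left.

```typescript
// Hypothetical knobs: neither value comes from the PR.
interface ThrottleBudget {
  maxInProcessWaitSeconds: number; // longest Retry-After we are willing to sleep through
  safetyMarginMs: number; // time reserved for the retry itself after waking up
}

type ThrottleDecision =
  | { action: "sleep-and-retry"; sleepMs: number }
  | { action: "defer" };

// Pure decision: sleep in-process only if the wait is short AND it fits in the
// remaining execution time; otherwise defer to the caller.
function decideThrottleAction(
  retryAfterSeconds: number,
  remainingMs: number,
  budget: ThrottleBudget
): ThrottleDecision {
  const sleepMs = retryAfterSeconds * 1000;
  const isShortWait = retryAfterSeconds <= budget.maxInProcessWaitSeconds;
  const fitsInDeadline = sleepMs + budget.safetyMarginMs <= remainingMs;
  return isShortWait && fitsInDeadline
    ? { action: "sleep-and-retry", sleepMs }
    : { action: "defer" };
}

// Wrapper in the spirit of "sleep in-process, then retry once".
// Deferring here just rethrows; a real runtime might re-enqueue instead.
async function retryOnceIfTimeAllows<T>(
  fn: () => Promise<T>,
  getRemainingMs: () => number,
  budget: ThrottleBudget
): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    const { status, retryAfter } = (err ?? {}) as {
      status?: number;
      retryAfter?: number;
    };
    if (status !== 429 || typeof retryAfter !== "number") throw err;
    const decision = decideThrottleAction(retryAfter, getRemainingMs(), budget);
    if (decision.action === "defer") throw err;
    await new Promise<void>((resolve) => setTimeout(resolve, decision.sleepMs));
    return fn(); // single retry; a second failure propagates to the caller
  }
}
```

Keeping the decision in a pure function makes the deadline logic easy to unit-test without real timers.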
The rest of the code looks good, except for the one concern I have about function execution limits in the Vercel world.
Uses the Retry-After header for 429 responses when provided.
500s are limited to 3 retries with exponential backoff.
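A rough sketch of how the 5xx retry budget could work, assuming the retry count travels in the queue payload and the delay is passed through queue options. This is illustrative only: `planServerErrorRetry`, the `*Sketch` interfaces, and the `runId` field are hypothetical, and the delay table simply copies the values as written in the commit message.

```typescript
// Delays in seconds, copied as written from the commit message (5s, 30s, 120min).
const SERVER_ERROR_DELAYS_SECONDS: readonly number[] = [5, 30, 120 * 60];

// Hypothetical shapes; the real payload and QueueOptions have more fields.
interface QueuePayloadSketch {
  runId: string; // illustrative field, not from the PR
  serverErrorRetryCount?: number;
}
interface QueueOptionsSketch {
  delaySeconds?: number;
}

type ServerErrorPlan =
  | { action: "re-enqueue"; payload: QueuePayloadSketch; options: QueueOptionsSketch }
  | { action: "give-up" };

// Returns undefined for non-5xx statuses, which this sketch does not handle.
function planServerErrorRetry(
  status: number,
  payload: QueuePayloadSketch,
  delays: readonly number[] = SERVER_ERROR_DELAYS_SECONDS
): ServerErrorPlan | undefined {
  if (status < 500 || status > 599) return undefined;
  const attempt = payload.serverErrorRetryCount ?? 0;
  // Budget exhausted: one retry per entry in the delay table (3 by default).
  if (attempt >= delays.length) return { action: "give-up" };
  return {
    action: "re-enqueue",
    // Carry the incremented count so the next delivery knows its attempt number.
    payload: { ...payload, serverErrorRetryCount: attempt + 1 },
    options: { delaySeconds: delays[attempt] },
  };
}
```

Because the count lives in the payload rather than in process memory, the budget survives across separate function invocations.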
Review with whitespace changes hidden, for sanity.
Claude's suggestion for e2e tests, which I think we might do separately. I think we should make some sort of testbench for world errors and how the runtime reacts.