Activities and orchestrations stuck scheduled but not running for hours #3018

Closed
JakeStanger opened this issue Jan 28, 2025 · 1 comment
Labels: P1 Priority 1

Description

We are observing that approximately 50% of the time, our sub-orchestrations or activities become stuck in the Pending or TaskScheduled states, taking many hours to become unstuck and progress again.

Possibly a red herring, but the Application Insights logging within the Azure portal seems to suggest that the sub-orchestrator job "forks" itself, with one copy using the parent ID and the other using its own unique ID.

These two orchestrator run entries show what started as the same job with the same timestamp.
[Screenshot: the two Application Insights orchestrator run entries]

This behaviour has been observed both with a dedicated (newly provisioned) storage account and with the Netherite storage provider.

Expected behavior

All activities and (sub)orchestrators should complete successfully in a reasonable timeframe.

Actual behavior

Oftentimes the jobs appear to get stuck around the same point, taking a very long time to continue.

There is no other activity within the Functions app or the storage/event resources during this period.

Relevant source code snippets

Proprietary codebase; however, relevant snippets can be prepared on request.

Known workarounds

None

App Details

  • Durable Functions extension version: v3.0.2 (In-Process/WebJobs)
  • Azure Functions runtime version: ~4
  • Programming language used: C# (.NET 6)

All related dependencies:

<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Storage.Queues" Version="5.2.0" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.DurableTask" Version="3.0.2" />
<PackageReference Include="Microsoft.Azure.DurableTask.Netherite.AzureFunctions" Version="3.0.0" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Storage.Blobs" Version="5.2.1" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="4.3.0" />
<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0" />
<PackageReference Include="Microsoft.Extensions.DependencyInjection" Version="6.0.1" />
<PackageReference Include="AzureFunctions.MvcModelBinding" Version="4.2.1" />

Screenshots


Gantt chart showing issue from Durable Functions Monitor:

[Screenshot: Gantt chart from Durable Functions Monitor]

If deployed to Azure

  • Timeframe issue observed: 2025-01-27T17:15:00 - 2025-01-28T10:31:00 (UTC)

  • Azure region: UK South

  • Orchestration instance ID(s): c11225bfd6ed45c9b73e147a5e204626, 2786e9f0fadd479fa3975b99638443fa:11

  • Function names:

    • Migration (top-level orchestrator)
    • Migration_contentTypes (sub-orchestrator, runs once)
    • Migration_contentTypes_sync (activity, runs in loop sequentially)
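
The codebase is proprietary, but the orchestration is shaped roughly as follows. This is a minimal sketch only: the function names come from the list above, while the bodies and input types are simplified placeholders rather than the real code.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class MigrationOrchestration
{
    [FunctionName("Migration")]
    public static async Task RunParent(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The sub-orchestrator is invoked once; this is the child instance
        // observed sitting in the Pending state.
        await context.CallSubOrchestratorAsync("Migration_contentTypes", null);
    }

    [FunctionName("Migration_contentTypes")]
    public static async Task RunSubOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Hypothetical input shape; the real implementation differs.
        var contentTypes = context.GetInput<List<string>>() ?? new List<string>();

        foreach (var contentType in contentTypes)
        {
            // Activities run sequentially in a loop; these are the TaskScheduled
            // entries that appeared stuck for hours.
            await context.CallActivityAsync("Migration_contentTypes_sync", contentType);
        }
    }
}
```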

The Functions app is on the Consumption tier, running on Windows 64-bit (although this was also observed on 32-bit). Scaling settings are left as default.

Resource names can be provided privately if required.

JakeStanger commented Jan 29, 2025

Looks like the issue is with another department. It turns out the jobs have been getting "stuck" because they block on a long-running operation from Graph, which I didn't spot due to a somewhat lazy implementation and the fact that it was handled by another library. I refactored to avoid blocking and use a more appropriate polling pattern, which revealed that these jobs sometimes take 17 hours instead of the usual few seconds. Apologies for wasting your time, closing :)

It is interesting that these jobs showed as TaskScheduled rather than running, though, and that I was unable to get any meaningful logs out until the run had completed.
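
For anyone landing here with similar symptoms, the shape of the fix was to stop blocking inside the activity and instead poll the long-running Graph operation from the orchestrator using durable timers. A minimal sketch of that kind of polling pattern; the function names, the status-check activity, and the polling interval here are illustrative, not the actual implementation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class GraphOperationPolling
{
    // Hypothetical orchestrator: polls the long-running Graph operation instead of
    // blocking a single activity invocation until it finishes.
    [FunctionName("PollGraphOperation")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string operationId = context.GetInput<string>();

        while (true)
        {
            // The activity only checks the operation's status and returns immediately.
            bool completed = await context.CallActivityAsync<bool>(
                "CheckGraphOperationStatus", operationId);

            if (completed)
            {
                break;
            }

            // Durable timer: the orchestrator is unloaded between polls rather than
            // holding a worker for hours.
            DateTime nextCheck = context.CurrentUtcDateTime.AddMinutes(1);
            await context.CreateTimer(nextCheck, CancellationToken.None);
        }
    }
}
```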
