Activities and orchestrations stuck scheduled but not running for hours #3018

Closed
JakeStanger opened this issue Jan 28, 2025 · 1 comment
Labels: P1 Priority 1

Description

We are observing that approximately 50% of the time, our sub-orchestrations or activities become stuck in the Pending or TaskScheduled states, taking many hours to become unstuck and progress again.

Possibly a red herring, but the Application Insights logging within the Azure portal seems to suggest that the sub-orchestrator job "forks" itself, with one copy using the parent ID and the other using its own unique ID.

These two orchestrator run entries show what started as the same job with the same timestamp.
[Screenshot: the two Application Insights orchestrator run entries]

This behaviour has been observed both with a dedicated (newly provisioned) storage account and with the Netherite storage provider.

Expected behavior

All activities and (sub)orchestrators should complete successfully in a reasonable timeframe.

Actual behavior

Oftentimes the jobs appear to get stuck around the same point, taking a very long time to continue.

There is no other activity within the Functions app or the storage/event resources during this period.

Relevant source code snippets

Proprietary codebase; however, relevant snippets can be prepared on request.

Known workarounds

None

App Details

  • Durable Functions extension version: v3.0.2 (In-Process/WebJobs)
  • Azure Functions runtime version: ~4
  • Programming language used: C# (.NET 6)

All related dependencies:

<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Storage.Queues" Version="5.2.0" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.DurableTask" Version="3.0.2" />
<PackageReference Include="Microsoft.Azure.DurableTask.Netherite.AzureFunctions" Version="3.0.0" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Storage.Blobs" Version="5.2.1" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="4.3.0" />
<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0" />
<PackageReference Include="Microsoft.Extensions.DependencyInjection" Version="6.0.1" />
<PackageReference Include="AzureFunctions.MvcModelBinding" Version="4.2.1" />

Screenshots


Gantt chart showing issue from Durable Functions Monitor:

[Screenshot: Gantt chart from Durable Functions Monitor]

If deployed to Azure

  • Timeframe issue observed: 2025-01-27T17:15:00 - 2025-01-28T10:31:00 (UTC)

  • Azure region: UK South

  • Orchestration instance ID(s): c11225bfd6ed45c9b73e147a5e204626, 2786e9f0fadd479fa3975b99638443fa:11

  • Function names:

    • Migration (top-level orchestrator)
    • Migration_contentTypes (sub-orchestrator, runs once)
    • Migration_contentTypes_sync (activity, runs in loop sequentially)
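
The codebase is proprietary, but the orchestration is shaped roughly as follows. This is a minimal sketch only: the function names come from the list above, while the bodies and input types are simplified placeholders rather than the real code.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class MigrationOrchestration
{
    [FunctionName("Migration")]
    public static async Task RunParent(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The sub-orchestrator is invoked once; this is the child instance
        // observed sitting in the Pending state.
        await context.CallSubOrchestratorAsync("Migration_contentTypes", null);
    }

    [FunctionName("Migration_contentTypes")]
    public static async Task RunSubOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Hypothetical input shape; the real implementation differs.
        var contentTypes = context.GetInput<List<string>>() ?? new List<string>();

        foreach (var contentType in contentTypes)
        {
            // Activities run sequentially in a loop; these are the TaskScheduled
            // entries that appeared stuck for hours.
            await context.CallActivityAsync("Migration_contentTypes_sync", contentType);
        }
    }
}
```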

The Functions app is on the Consumption tier, running on Windows 64-bit (although this was also observed on 32-bit). Scaling settings are left as default.

Resource names can be provided privately if required.

JakeStanger commented Jan 29, 2025

Looks like the issue is with another department. It turns out the jobs have been getting "stuck" because they block on a long-running operation from Graph, which I didn't spot due to a somewhat lazy implementation and the fact that it was handled by another library. I refactored to avoid blocking and use a more appropriate polling pattern, which revealed that these jobs sometimes take 17 hours instead of the usual few seconds. Apologies for wasting your time, closing :)

It is interesting that these jobs showed as TaskScheduled rather than running, though, and that I was unable to get any meaningful logs out until the run had completed.
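
For anyone landing here with similar symptoms, the shape of the fix was to stop blocking inside the activity and instead poll the long-running Graph operation from the orchestrator using durable timers. A minimal sketch of that kind of polling pattern; the function names, the status-check activity, and the polling interval here are illustrative, not the actual implementation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class GraphOperationPolling
{
    // Hypothetical orchestrator: polls the long-running Graph operation instead of
    // blocking a single activity invocation until it finishes.
    [FunctionName("PollGraphOperation")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string operationId = context.GetInput<string>();

        while (true)
        {
            // The activity only checks the operation's status and returns immediately.
            bool completed = await context.CallActivityAsync<bool>(
                "CheckGraphOperationStatus", operationId);

            if (completed)
            {
                break;
            }

            // Durable timer: the orchestrator is unloaded between polls rather than
            // holding a worker for hours.
            DateTime nextCheck = context.CurrentUtcDateTime.AddMinutes(1);
            await context.CreateTimer(nextCheck, CancellationToken.None);
        }
    }
}
```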
