
Conversation

style95 (Member) commented Aug 29, 2025

This is to make the system more stable. A scheduler generally runs a large number of queues.
Each queue spawns multiple actors, so the total number of actors can become huge.
Some actors, such as the akka-cluster heartbeat, are critical to keeping the system healthy.
This change isolates the performance impact of the memory queues and guarantees that the akka heartbeat is never starved.

Description

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@@ -41,6 +41,26 @@ akka.http {
preview.enable-http2 = on
parsing.illegal-header-warnings = off
}

cluster {
use-dispatcher = "dispatchers.heartbeat-dispatcher"
style95 (Member, Author) commented:

I assigned a separate dispatcher for the akka-cluster heartbeat.

@@ -72,7 +92,7 @@ kamon {
service = "openwhisk-statsd"
}
metric {
tick-interval = 1 second
tick-interval = 10 second
style95 (Member, Author) commented:

This is one of the main changes. According to my analysis, there seems to be a leak in Kamon's metric handling:
the longer a scheduler runs, the more MetricSnapshot instances accumulate.

Below is a heap dump from a scheduler that hit thread starvation.

[screenshot: heap histogram]

There are 97 million instances of scala.collection.immutable.$colon$colon and 96 million instances of kamon.metric.Instrument$Snapshot.

The dominator of most scala.collection.immutable.$colon$colon instances is MetricSnapshot.

[screenshot: dominator tree for scala.collection.immutable.$colon$colon]

Likewise, kamon.metric.Instrument$Snapshot is mostly referenced by scala.collection.immutable.$colon$colon, which in turn leads back to MetricSnapshot.

[screenshot: dominator tree for kamon.metric.Instrument$Snapshot]

All components other than MemoryQueue already emit metrics at 10-second intervals, so I updated the MemoryQueue emission interval to 10 seconds as well.
Since all metrics are now emitted every 10 seconds, there is no need for a tick interval as small as 1 second: Kamon would create a snapshot every second even though the underlying values only change every 10 seconds, because nothing is emitted in the middle of that interval.
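To make the relationship concrete, here is a minimal sketch, assuming Kamon 2.x's gauge API; the metric name and value are hypothetical, and OpenWhisk actually goes through its own MetricEmitter wrapper rather than calling Kamon directly:

import com.typesafe.config.ConfigFactory
import kamon.Kamon

object KamonTickSketch extends App {
  // Kamon snapshots every instrument once per kamon.metric.tick-interval and hands the
  // snapshot to the configured reporter (statsd in our case). If the gauge values are
  // only refreshed every 10 seconds, a 1-second tick just produces ten near-identical
  // MetricSnapshots per real update, which matches the garbage seen in the heap dump above.
  Kamon.init(
    ConfigFactory
      .parseString("kamon.metric.tick-interval = 10 seconds")
      .withFallback(ConfigFactory.load()))

  // Hypothetical gauge standing in for one of the MemoryQueue metrics.
  val activationCount = Kamon.gauge("memory_queue_activation_count").withoutTags()
  activationCount.update(42) // refreshed every 10 s by the MemoryQueue's log scheduler
}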

@@ -181,7 +182,7 @@ class MemoryQueue(private val etcdClient: EtcdClient,
private[queue] var limit: Option[Int] = None
private[queue] var initialized = false

private val logScheduler: Cancellable = context.system.scheduler.scheduleWithFixedDelay(0.seconds, 1.seconds) { () =>
private val logScheduler: Cancellable = context.system.scheduler.scheduleWithFixedDelay(0.seconds, 10.seconds) { () =>
style95 (Member, Author) commented:

This was emitting 5 metrics every second. With 400 queues running, that is around 2,000 metrics per second. Combined with the fact that each memory queue spawns multiple sub-actors, and with the use of a CachedThreadPool, which creates an unbounded number of threads on demand, this caused thread starvation.
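For illustration, a rough sketch of the cadence change; the emit helper, gauge names, and queue name are hypothetical stand-ins for the five metrics a MemoryQueue actually reports:

import akka.actor.ActorSystem
import scala.concurrent.duration._

object EmissionCadenceSketch extends App {
  val system = ActorSystem("sketch")
  import system.dispatcher

  // Stand-in for the five gauges each MemoryQueue emits per run of its log scheduler.
  def emitGauges(queue: String): Unit =
    (1 to 5).foreach(i => println(s"emit $queue metric-$i"))

  // Before: 400 queues x 5 gauges every 1 s  ~= 2,000 metrics per second.
  // After:  400 queues x 5 gauges every 10 s ~=   200 metrics per second.
  system.scheduler.scheduleWithFixedDelay(0.seconds, 10.seconds) { () =>
    emitGauges("namespace/action-queue") // hypothetical queue name
  }
}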

@@ -0,0 +1,37 @@
dispatchers {
style95 (Member, Author) commented:

I introduced separate dispatchers to guarantee performance for critical actors and to minimize the performance impact of the other components on them.
I used the fork-join-executor since their jobs are mostly CPU-bound work.
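A minimal, self-contained sketch of how such a dispatcher is defined and attached to an actor. The executor settings and the Echo actor are illustrative only; the actual values in this file are elided from the diff, and only the id dispatchers.heartbeat-dispatcher appears in the earlier hunk:

import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

object DispatcherSketch extends App {
  // Illustrative fork-join dispatcher; the parallelism values are guesses, not the PR's.
  val config = ConfigFactory.parseString("""
    dispatchers {
      heartbeat-dispatcher {
        type = Dispatcher
        executor = "fork-join-executor"
        fork-join-executor {
          parallelism-min = 2
          parallelism-factor = 1.0
          parallelism-max = 4
        }
        throughput = 5
      }
    }
  """).withFallback(ConfigFactory.load())

  class Echo extends Actor {
    def receive = { case msg => sender() ! msg }
  }

  val system = ActorSystem("sketch", config)

  // Pinning an actor to its own dispatcher keeps it responsive even when the default
  // dispatcher is saturated; akka.cluster.use-dispatcher (earlier hunk) achieves the
  // same isolation for the cluster heartbeat.
  val echo = system.actorOf(Props(new Echo).withDispatcher("dispatchers.heartbeat-dispatcher"))
}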

dgrove-oss (Member) left a comment:

LGTM. Thanks for the detailed comments explaining the PR.
