[FM-751] Add system task offset evaluation strategy (#179)

* FM-751 Add system task offset evaluation strategy

  - Added an option to customize, per task type, the strategy used to compute the offset of a postponed system task:
      conductor.app.system-task-offset-evaluation.[task-type]=[strategy]
    [task-type] - type of the task, e.g. join, simple, ...
    [strategy] - strategy used for computation of the system task offset; currently supported options are:
      a. 'constant_default_offset'
      b. 'backoff_to_default_offset'
      c. 'scaled_by_queue_size'
  - 'constant_default_offset' - uses the constant value of the 'systemTaskWorkerCallbackDuration' configuration property; by default, it is used by all but 'join' system tasks
  - 'backoff_to_default_offset' - scales the offset exponentially (2^n) with the task poll-count, up to the value of the 'systemTaskWorkerCallbackDuration' configuration property; by default, it is used by the 'join' system task
  - 'scaled_by_queue_size' - scales the offset exponentially (2^n) with the task poll-count, up to a cap that depends on the actual queue size:
      a. 'systemTaskWorkerCallbackDuration', if queue size == 0
      b. 'systemTaskWorkerCallbackDuration' * 'queue_size' otherwise
    This strategy is not used in the default configuration.
  - The new 'scaled_by_queue_size' strategy is appropriate for relatively big queues (hundreds to thousands of tasks) that contain long-running tasks (days to weeks) with a high number of poll-counts.

  Reasoning:
  - The new strategy was implemented primarily to solve performance issues on join queues that contain a large number of join tasks blocked by wait/human actions in some forks for several days/weeks.
  - Implemented strategies can easily be extended in the future while preserving backwards compatibility.
  - Improved configurability of the task offset evaluation.

* FM-751 Change default offset strategy of join task

  - from BACKOFF_TO_DEFAULT_OFFSET
  - to SCALED_BY_QUEUE_SIZE

* FM-751 Split OffsetEvaluationStrategy and implementations

  - goal: cleaner goals, separated configuration and implementation aspects
  - we can directly inject ConductorProperties into implementations of strategies that are represented by Spring components
  - introduction of TaskOffsetEvaluationSelector that allows other components to load the implementation of a specific strategy

* FM-751 Add config property for SCALED_BY_TASK_DURATION strategy

* FM-751 Implement ScaledByTaskDurationOffsetEvaluation

  - Computes the evaluation offset for a postponed task based on the task's duration and settings that define the offset for different levels of task durations.
  - In this strategy the offset increases by steps based on settings that define the offset for different levels of task durations. Task duration is derived from {@link TaskModel#getScheduledTime()} and the current time.
  - This strategy is appropriate for tasks that have a wide range of durations and whose offset should be scaled based on the task's duration.
  - The defined keys in the settings compose the duration intervals for which the offset will be set to the corresponding value: <0, d1) = 0, <d1, d2) = d1, <d2, d3) = d2.
  - The order of the keys is not important as the map is sorted by the key before the evaluation.

* FM-751 Revert offset settings to default value
FRINXio · Jan 8, 2025 · 54a66d3
1 parent bf6753a
commit 54a66d3
Showing 16 changed files with 625 additions and 26 deletions.
diff --git a/core/src/main/java/com/netflix/conductor/core/config/ConductorProperties.java b/core/src/main/java/com/netflix/conductor/core/config/ConductorProperties.java
@@ -14,6 +14,7 @@
 
 import java.time.Duration;
 import java.time.temporal.ChronoUnit;
+import java.util.Collections;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.Properties;
@@ -24,6 +25,8 @@
 import org.springframework.util.unit.DataSize;
 import org.springframework.util.unit.DataUnit;
 
+import com.netflix.conductor.common.metadata.tasks.TaskType;
+
 @ConfigurationProperties("conductor.app")
 public class ConductorProperties {
 
@@ -97,6 +100,33 @@ public class ConductorProperties {
     @DurationUnit(ChronoUnit.SECONDS)
     private Duration systemTaskWorkerCallbackDuration = Duration.ofSeconds(30);
 
+    /**
+     * The strategy to be used for evaluation of the offset for a postponed system task of certain
+     * type.<br>
+     * Tasks that are not listed here use {@link
+     * ConductorProperties#systemTaskWorkerCallbackDuration} value.
+     */
+    private Map<TaskType, OffsetEvaluationStrategy> systemTaskOffsetEvaluation =
+            Map.of(TaskType.JOIN, OffsetEvaluationStrategy.BACKOFF_TO_DEFAULT_OFFSET);
+
+    /**
+     * The duration of the task execution mapped to the calculated offset of the postponed task
+     * [seconds].<br>
+     * This setting is used only by the {@link OffsetEvaluationStrategy#SCALED_BY_TASK_DURATION}
+     * offset evaluation strategy.<br>
+     * Example: If settings contain two entries (10, 30) and (20, 60), then the evaluation offsets
+     * for the postponed tasks in the queue will be calculated according to the following intervals:
+     *
+     * <ul>
+     *   <li>[0, 10) seconds: offset = 0 seconds
+     *   <li>[10, 20) seconds: offset = 30 seconds
+     *   <li>20 seconds or more: offset = 60 seconds
+     * </ul>
+     *
+     * By default, the offset is always set to 0 seconds.
+     */
+    private Map<Long, Long> taskDurationToOffsetSteps = Collections.emptyMap();
+
     /**
      * The interval (in milliseconds) at which system task queues will be polled by the system task
      * workers.
@@ -353,6 +383,23 @@ public Duration getSystemTaskWorkerCallbackDuration() {
         return systemTaskWorkerCallbackDuration;
     }
 
+    public void setSystemTaskOffsetEvaluation(
+            final Map<TaskType, OffsetEvaluationStrategy> systemTaskOffsetEvaluation) {
+        this.systemTaskOffsetEvaluation = systemTaskOffsetEvaluation;
+    }
+
+    public Map<TaskType, OffsetEvaluationStrategy> getSystemTaskOffsetEvaluation() {
+        return systemTaskOffsetEvaluation;
+    }
+
+    public Map<Long, Long> getTaskDurationToOffsetSteps() {
+        return taskDurationToOffsetSteps;
+    }
+
+    public void setTaskDurationToOffsetSteps(Map<Long, Long> taskDurationToOffsetSteps) {
+        this.taskDurationToOffsetSteps = taskDurationToOffsetSteps;
+    }
+
     public void setSystemTaskWorkerCallbackDuration(Duration systemTaskWorkerCallbackDuration) {
         this.systemTaskWorkerCallbackDuration = systemTaskWorkerCallbackDuration;
     }
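
The interval lookup described in the `taskDurationToOffsetSteps` Javadoc above can be sketched as a floor lookup in a sorted map. This is a minimal illustration and not the `ScaledByTaskDurationOffsetEvaluation` class from this commit: the class name `DurationStepLookup` and the method `offsetFor` are hypothetical, and the sketch assumes that each key is the lower bound (in seconds) of a duration interval and each value is the offset used inside that interval.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of the commit. */
final class DurationStepLookup {

    private DurationStepLookup() {}

    /**
     * Returns the offset in seconds for a task that has been running for durationSeconds.
     * Assumption: keys are interval lower bounds, values are the offsets for those intervals.
     */
    static long offsetFor(final Map<Long, Long> steps, final long durationSeconds) {
        // Sort by key so that the order of the configured entries does not matter.
        final TreeMap<Long, Long> sorted = new TreeMap<>(steps);
        // Greatest configured lower bound <= duration; null when below the first step
        // or when no steps are configured at all.
        final Map.Entry<Long, Long> step = sorted.floorEntry(durationSeconds);
        return step == null ? 0L : step.getValue();
    }

    public static void main(final String[] args) {
        // The two entries from the Javadoc example: (10, 30) and (20, 60).
        final Map<Long, Long> steps = Map.of(20L, 60L, 10L, 30L);
        for (final long duration : new long[] {0, 9, 10, 19, 20, 3600}) {
            System.out.println(duration + "s -> " + offsetFor(steps, duration) + "s");
        }
    }
}
```

With the two entries from the Javadoc example, durations of 9, 10 and 20 seconds map to offsets of 0, 30 and 60 seconds, and an empty map always yields 0, which matches the documented default.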

diff --git a/core/src/main/java/com/netflix/conductor/core/config/OffsetEvaluationStrategy.java b/core/src/main/java/com/netflix/conductor/core/config/OffsetEvaluationStrategy.java
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2024 Netflix, Inc.
+ * <p>
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations under the License.
+ */
+package com.netflix.conductor.core.config;
+
+/**
+ * Strategies used for computation of the task offset. The offset is used to postpone the task
+ * execution in the queue.
+ */
+public enum OffsetEvaluationStrategy {
+    /** Constant offset evaluation strategy - using default offset value. */
+    CONSTANT_DEFAULT_OFFSET,
+    /**
+     * Computes the evaluation offset for a postponed task based on the task's poll count and a
+     * default offset. In this strategy offset increases exponentially until it reaches the default
+     * offset.
+     */
+    BACKOFF_TO_DEFAULT_OFFSET,
+    /**
+     * Computes the evaluation offset for a postponed task based on the queue size and the task's
+     * poll count. In this strategy offset increases exponentially until it reaches the (default
+     * offset * queue size) value.
+     */
+    SCALED_BY_QUEUE_SIZE,
+    /**
+     * Computes the evaluation offset for a postponed task based on the task's duration. In this
+     * strategy offset increases by steps that are proportional to the task's duration and defined
+     * by the user settings.
+     *
+     * @see ConductorProperties#getTaskDurationToOffsetSteps() setting used to define the steps
+     */
+    SCALED_BY_TASK_DURATION
+}
diff --git a/core/src/main/java/com/netflix/conductor/core/execution/AsyncSystemTaskExecutor.java b/core/src/main/java/com/netflix/conductor/core/execution/AsyncSystemTaskExecutor.java
@@ -12,12 +12,17 @@
  */
 package com.netflix.conductor.core.execution;
 
+import java.util.Map;
+
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.stereotype.Component;
 
+import com.netflix.conductor.common.metadata.tasks.TaskType;
 import com.netflix.conductor.core.config.ConductorProperties;
+import com.netflix.conductor.core.config.OffsetEvaluationStrategy;
 import com.netflix.conductor.core.dal.ExecutionDAOFacade;
+import com.netflix.conductor.core.execution.offset.TaskOffsetEvaluationSelector;
 import com.netflix.conductor.core.execution.tasks.WorkflowSystemTask;
 import com.netflix.conductor.core.utils.QueueUtils;
 import com.netflix.conductor.dao.MetadataDAO;
@@ -33,25 +38,27 @@ public class AsyncSystemTaskExecutor {
     private final QueueDAO queueDAO;
     private final MetadataDAO metadataDAO;
     private final long queueTaskMessagePostponeSecs;
-    private final long systemTaskCallbackTime;
+    private final TaskOffsetEvaluationSelector taskOffsetEvaluationSelector;
     private final WorkflowExecutor workflowExecutor;
+    private final Map<TaskType, OffsetEvaluationStrategy> systemTaskOffsetEvaluation;
 
     private static final Logger LOGGER = LoggerFactory.getLogger(AsyncSystemTaskExecutor.class);
 
     public AsyncSystemTaskExecutor(
             ExecutionDAOFacade executionDAOFacade,
             QueueDAO queueDAO,
             MetadataDAO metadataDAO,
+            TaskOffsetEvaluationSelector taskOffsetEvaluationSelector,
             ConductorProperties conductorProperties,
             WorkflowExecutor workflowExecutor) {
         this.executionDAOFacade = executionDAOFacade;
         this.queueDAO = queueDAO;
         this.metadataDAO = metadataDAO;
+        this.taskOffsetEvaluationSelector = taskOffsetEvaluationSelector;
         this.workflowExecutor = workflowExecutor;
-        this.systemTaskCallbackTime =
-                conductorProperties.getSystemTaskWorkerCallbackDuration().getSeconds();
         this.queueTaskMessagePostponeSecs =
                 conductorProperties.getTaskExecutionPostponeDuration().getSeconds();
+        this.systemTaskOffsetEvaluation = conductorProperties.getSystemTaskOffsetEvaluation();
     }
 
     /**
@@ -164,12 +171,15 @@ public void execute(WorkflowSystemTask systemTask, String taskId) {
                 hasTaskExecutionCompleted = true;
                 LOGGER.debug("{} removed from queue: {}", task, queueName);
             } else {
-                task.setCallbackAfterSeconds(systemTaskCallbackTime);
-                systemTask
-                        .getEvaluationOffset(task, systemTaskCallbackTime)
-                        .ifPresentOrElse(
-                                task::setCallbackAfterSeconds,
-                                () -> task.setCallbackAfterSeconds(systemTaskCallbackTime));
+                final var evaluationStrategy =
+                        systemTaskOffsetEvaluation.getOrDefault(
+                                TaskType.of(task.getTaskType()),
+                                OffsetEvaluationStrategy.CONSTANT_DEFAULT_OFFSET);
+                final var callbackAfterSeconds =
+                        taskOffsetEvaluationSelector
+                                .taskOffsetEvaluation(evaluationStrategy)
+                                .computeEvaluationOffset(task, queueDAO.getSize(queueName));
+                task.setCallbackAfterSeconds(callbackAfterSeconds);
                 queueDAO.postpone(
                         queueName,
                         task.getTaskId(),
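
The executor above asks `TaskOffsetEvaluationSelector` for the implementation that matches a strategy. The selector itself is not part of the hunks shown here, so the following is only a sketch of one way such a lookup could work: implementations are indexed by the value they return from `type()`. All names in the block (`SelectorSketch`, `Strategy`, `Evaluation`, `Selector`, `constant`) are stand-ins invented for this example, and the stand-in interface takes a poll count instead of a `TaskModel` to stay self-contained.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; the real TaskOffsetEvaluationSelector may be implemented differently. */
final class SelectorSketch {

    /** Stand-in for OffsetEvaluationStrategy, reduced to two constants. */
    enum Strategy { CONSTANT_DEFAULT_OFFSET, BACKOFF_TO_DEFAULT_OFFSET }

    /** Stand-in for TaskOffsetEvaluation. */
    interface Evaluation {
        Strategy type();

        long computeEvaluationOffset(int pollCount, int queueSize);
    }

    /** Indexes the given implementations by the strategy they declare. */
    static final class Selector {
        private final Map<Strategy, Evaluation> byType = new EnumMap<>(Strategy.class);

        Selector(final List<Evaluation> evaluations) {
            for (final Evaluation evaluation : evaluations) {
                // Two implementations for one strategy would make the lookup ambiguous.
                if (byType.put(evaluation.type(), evaluation) != null) {
                    throw new IllegalStateException("Duplicate strategy: " + evaluation.type());
                }
            }
        }

        Evaluation taskOffsetEvaluation(final Strategy strategy) {
            final Evaluation evaluation = byType.get(strategy);
            if (evaluation == null) {
                throw new IllegalStateException("No implementation for " + strategy);
            }
            return evaluation;
        }
    }

    /** Constant strategy used only to exercise the selector. */
    static Evaluation constant(final long defaultOffset) {
        return new Evaluation() {
            @Override
            public Strategy type() {
                return Strategy.CONSTANT_DEFAULT_OFFSET;
            }

            @Override
            public long computeEvaluationOffset(final int pollCount, final int queueSize) {
                return defaultOffset;
            }
        };
    }

    public static void main(final String[] args) {
        final Selector selector = new Selector(List.of(constant(30L)));
        final Evaluation evaluation =
                selector.taskOffsetEvaluation(Strategy.CONSTANT_DEFAULT_OFFSET);
        System.out.println(evaluation.computeEvaluationOffset(7, 100));
    }
}
```

In a Spring context the list passed to the constructor could be filled by injecting all beans of the interface type, which would fit the `@Component` annotations on the strategy classes in this commit; whether the real selector does this is an assumption.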

diff --git a/...in/java/com/netflix/conductor/core/execution/offset/BackoffToDefaultOffsetEvaluation.java b/...in/java/com/netflix/conductor/core/execution/offset/BackoffToDefaultOffsetEvaluation.java
@@ -0,0 +1,60 @@
+/*
+ * Copyright 2024 Netflix, Inc.
+ * <p>
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations under the License.
+ */
+package com.netflix.conductor.core.execution.offset;
+
+import org.springframework.stereotype.Component;
+
+import com.netflix.conductor.core.config.ConductorProperties;
+import com.netflix.conductor.core.config.OffsetEvaluationStrategy;
+import com.netflix.conductor.model.TaskModel;
+
+/**
+ * Computes the evaluation offset for a postponed task based on the task's poll count and a default
+ * offset. In this strategy offset increases exponentially until it reaches the default offset.<br>
+ * This strategy is appropriate for queues that require low latency of all tasks.<br>
+ * Sample evaluationOffset for different pollCounts and defaultOffset (queueSize is ignored):
+ *
+ * <table>
+ * <tr><th>pollCount</th><th>defaultOffset</th><th>evaluationOffset</th></tr>
+ * <tr><td>0</td><td>5</td><td>0</td></tr>
+ * <tr><td>1</td><td>5</td><td>0</td></tr>
+ * <tr><td>2</td><td>5</td><td>2</td></tr>
+ * <tr><td>3</td><td>5</td><td>4</td></tr>
+ * <tr><td>4</td><td>5</td><td>5</td></tr>
+ * <tr><td>4</td><td>10</td><td>8</td></tr>
+ * <tr><td>5</td><td>10</td><td>10</td></tr>
+ * </table>
+ */
+@Component
+final class BackoffToDefaultOffsetEvaluation implements TaskOffsetEvaluation {
+
+    private final long defaultOffset;
+
+    BackoffToDefaultOffsetEvaluation(final ConductorProperties conductorProperties) {
+        defaultOffset = conductorProperties.getSystemTaskWorkerCallbackDuration().toSeconds();
+    }
+
+    @Override
+    public OffsetEvaluationStrategy type() {
+        return OffsetEvaluationStrategy.BACKOFF_TO_DEFAULT_OFFSET;
+    }
+
+    @Override
+    public long computeEvaluationOffset(final TaskModel taskModel, final int queueSize) {
+        final int index = taskModel.getPollCount() > 0 ? taskModel.getPollCount() - 1 : 0;
+        if (index == 0) {
+            return 0L;
+        }
+        return Math.min((long) Math.pow(2, index), defaultOffset);
+    }
+}
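
The sample table in the Javadoc of `BackoffToDefaultOffsetEvaluation` can be reproduced with a standalone copy of the arithmetic. The sketch below mirrors `computeEvaluationOffset` from the diff but takes the poll count and default offset as plain parameters instead of a `TaskModel` and `ConductorProperties`; the class name `BackoffTable` and the method name `backoffToDefault` are hypothetical.

```java
/** Standalone copy of the backoff arithmetic, for illustration only. */
final class BackoffTable {

    private BackoffTable() {}

    static long backoffToDefault(final int pollCount, final long defaultOffset) {
        // The first poll (and a poll count of 0) is re-evaluated without any delay.
        final int index = pollCount > 0 ? pollCount - 1 : 0;
        if (index == 0) {
            return 0L;
        }
        // 2^index seconds, capped at the default offset.
        return Math.min((long) Math.pow(2, index), defaultOffset);
    }

    public static void main(final String[] args) {
        // Rows of the Javadoc table as {pollCount, defaultOffset}.
        final long[][] rows = {{0, 5}, {1, 5}, {2, 5}, {3, 5}, {4, 5}, {4, 10}, {5, 10}};
        for (final long[] row : rows) {
            final long offset = backoffToDefault((int) row[0], row[1]);
            System.out.println(
                    "pollCount=" + row[0] + " defaultOffset=" + row[1] + " -> " + offset);
        }
    }
}
```

The offsets grow as 0, 0, 2, 4, 8, ... seconds until they reach the default offset, so a task that is polled repeatedly settles at the configured `systemTaskWorkerCallbackDuration`.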
diff --git a/...ain/java/com/netflix/conductor/core/execution/offset/ConstantDefaultOffsetEvaluation.java b/...ain/java/com/netflix/conductor/core/execution/offset/ConstantDefaultOffsetEvaluation.java
@@ -0,0 +1,40 @@
+/*
+ * Copyright 2024 Netflix, Inc.
+ * <p>
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations under the License.
+ */
+package com.netflix.conductor.core.execution.offset;
+
+import org.springframework.stereotype.Component;
+
+import com.netflix.conductor.core.config.ConductorProperties;
+import com.netflix.conductor.core.config.OffsetEvaluationStrategy;
+import com.netflix.conductor.model.TaskModel;
+
+/** Dummy implementation of {@link TaskOffsetEvaluation} that always returns the default offset. */
+@Component
+final class ConstantDefaultOffsetEvaluation implements TaskOffsetEvaluation {
+
+    private final long defaultOffset;
+
+    ConstantDefaultOffsetEvaluation(final ConductorProperties conductorProperties) {
+        defaultOffset = conductorProperties.getSystemTaskWorkerCallbackDuration().toSeconds();
+    }
+
+    @Override
+    public OffsetEvaluationStrategy type() {
+        return OffsetEvaluationStrategy.CONSTANT_DEFAULT_OFFSET;
+    }
+
+    @Override
+    public long computeEvaluationOffset(final TaskModel taskModel, final int queueSize) {
+        return defaultOffset;
+    }
+}
diff --git a/...n/java/com/netflix/conductor/core/execution/offset/ScaledByQueueSizeOffsetEvaluation.java b/...n/java/com/netflix/conductor/core/execution/offset/ScaledByQueueSizeOffsetEvaluation.java
@@ -0,0 +1,63 @@
+/*
+ * Copyright 2024 Netflix, Inc.
+ * <p>
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations under the License.
+ */
+package com.netflix.conductor.core.execution.offset;
+
+import org.springframework.stereotype.Component;
+
+import com.netflix.conductor.core.config.ConductorProperties;
+import com.netflix.conductor.core.config.OffsetEvaluationStrategy;
+import com.netflix.conductor.model.TaskModel;
+
+/**
+ * Computes the evaluation offset for a postponed task based on the queue size and the task's poll
+ * count. In this strategy offset increases exponentially until it reaches the (default offset *
+ * queue size) value.<br>
+ * This strategy is appropriate for relatively big queues (hundreds to thousands of tasks) that
+ * contain long-running tasks (days to weeks) with high poll counts.<br>
+ * Sample evaluationOffset for different pollCounts, defaultOffset and queueSize:
+ *
+ * <table>
+ * <tr><th>pollCount</th><th>defaultOffset</th><th>queueSize</th><th>evaluationOffset</th></tr>
+ * <tr><td>0</td><td>-</td><td>-</td><td>0</td></tr>
+ * <tr><td>1</td><td>-</td><td>-</td><td>0</td></tr>
+ * <tr><td>2</td><td>5</td><td>1</td><td>2</td></tr>
+ * <tr><td>3</td><td>5</td><td>1</td><td>4</td></tr>
+ * <tr><td>4</td><td>5</td><td>1</td><td>5</td></tr>
+ * <tr><td>4</td><td>5</td><td>0</td><td>5</td></tr>
+ * <tr><td>4</td><td>5</td><td>2</td><td>8</td></tr>
+ * </table>
+ */
+@Component
+final class ScaledByQueueSizeOffsetEvaluation implements TaskOffsetEvaluation {
+
+    private final long defaultOffset;
+
+    ScaledByQueueSizeOffsetEvaluation(final ConductorProperties conductorProperties) {
+        defaultOffset = conductorProperties.getSystemTaskWorkerCallbackDuration().toSeconds();
+    }
+
+    @Override
+    public OffsetEvaluationStrategy type() {
+        return OffsetEvaluationStrategy.SCALED_BY_QUEUE_SIZE;
+    }
+
+    @Override
+    public long computeEvaluationOffset(final TaskModel taskModel, final int queueSize) {
+        final int index = taskModel.getPollCount() > 0 ? taskModel.getPollCount() - 1 : 0;
+        if (index == 0) {
+            return 0L;
+        }
+        final long scaledOffset = queueSize > 0 ? queueSize * defaultOffset : defaultOffset;
+        return Math.min((long) Math.pow(2, index), scaledOffset);
+    }
+}
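
To see how the cap in `ScaledByQueueSizeOffsetEvaluation` moves with the queue size, the arithmetic can be run outside Conductor. The sketch below copies the logic of `computeEvaluationOffset` from the diff with plain parameters; the class name `QueueScaledTable` and the method name `scaledByQueueSize` are hypothetical.

```java
/** Standalone copy of the queue-size-scaled arithmetic, for illustration only. */
final class QueueScaledTable {

    private QueueScaledTable() {}

    static long scaledByQueueSize(
            final int pollCount, final long defaultOffset, final int queueSize) {
        final int index = pollCount > 0 ? pollCount - 1 : 0;
        if (index == 0) {
            return 0L;
        }
        // An empty queue falls back to the plain default offset as the cap.
        final long cap = queueSize > 0 ? queueSize * defaultOffset : defaultOffset;
        return Math.min((long) Math.pow(2, index), cap);
    }

    public static void main(final String[] args) {
        // Rows of the Javadoc table as {pollCount, defaultOffset, queueSize}.
        final long[][] rows = {{2, 5, 1}, {3, 5, 1}, {4, 5, 1}, {4, 5, 0}, {4, 5, 2}};
        for (final long[] row : rows) {
            final long offset = scaledByQueueSize((int) row[0], row[1], (int) row[2]);
            System.out.println(
                    "pollCount=" + row[0] + " defaultOffset=" + row[1]
                            + " queueSize=" + row[2] + " -> " + offset);
        }
    }
}
```

With the default `systemTaskWorkerCallbackDuration` of 30 seconds and a queue of 1000 tasks, this formula gives a cap of 30000 seconds (8 hours 20 minutes), first reached at poll count 16. That may be worth keeping in mind when latency matters for the tasks in such a queue.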