-
Hi @npk1994 - We would love to help. This might require an analysis of your configuration. Technically, Conductor can schedule and run the next task in under a millisecond. Here is a screen grab of the same flow running in one of our test environments. One setting you can change is the polling interval for tasks. You can also increase the number of system workers handling the HTTP tasks.
-
Hi @boney9, we are using the default out-of-the-box configuration as of now. We would like to understand which configuration settings can be added to reduce this time delay.
-
Consider this use case. But when we set this up as a workflow in Conductor, the total time to complete the flow is around 1 second. A lot of time is lost inside Conductor. I am unsure whether Conductor is suited to real-time applications (for use cases like the one mentioned above). I may be wrong, but at this point I am unable to proceed with Conductor. If there are any references on implementation patterns or do's and don'ts for building a workflow and setting up Conductor with HTTP tasks so that Conductor adds the minimum possible extra time (not more than 5%-10% over the time taken by the HTTP tasks themselves), please share them.
-
@npk1994 @krishnapyde In your workflow, both You can also reduce the poll interval further using
-
I would like to shed some light on this discussion. I am facing the same issue and I believe I have found something that might explain the root cause, but I would like to discuss it here and confirm it. I am using Conductor 2.30.3, but looking at the code, this should happen regardless of the version. I created a simple workflow very similar to the one that @npk1994 created. The only difference is that mine has six HTTP tasks in sequence, while the one from @npk1994 has only two. Here is my workflow:
{
"createTime": 1670456718043,
"name": "test-parent-flat",
"description": "test-parent-flat",
"version": 1,
"tasks": [
{
"name": "test-parent-flat-task1",
"taskReferenceName": "test-parent-flat-task1",
"inputParameters": {
"http_request": {
"uri": "http://localhost:8443/api/health",
"method": "GET",
"accept": "application/json"
}
},
"type": "HTTP",
"decisionCases": {},
"defaultCase": [],
"forkTasks": [],
"startDelay": 0,
"joinOn": [],
"optional": false,
"taskDefinition": {
"name": "test-parent-flat-task1",
"retryCount": 0,
"timeoutSeconds": 1200,
"inputKeys": [],
"outputKeys": [],
"timeoutPolicy": "TIME_OUT_WF",
"retryLogic": "FIXED",
"retryDelaySeconds": 5,
"responseTimeoutSeconds": 1200,
"inputTemplate": {},
"rateLimitPerFrequency": 0,
"rateLimitFrequencyInSeconds": 1
},
"defaultExclusiveJoinTask": [],
"asyncComplete": false,
"loopOver": []
},
{
"name": "test-parent-flat-task2",
"taskReferenceName": "test-parent-flat-task2",
"inputParameters": {
"http_request": {
"uri": "http://localhost:8443/api/health",
"method": "GET",
"accept": "application/json"
}
},
"type": "HTTP",
"decisionCases": {},
"defaultCase": [],
"forkTasks": [],
"startDelay": 0,
"joinOn": [],
"optional": false,
"taskDefinition": {
"name": "test-parent-flat-task2",
"retryCount": 0,
"timeoutSeconds": 1200,
"inputKeys": [],
"outputKeys": [],
"timeoutPolicy": "TIME_OUT_WF",
"retryLogic": "FIXED",
"retryDelaySeconds": 5,
"responseTimeoutSeconds": 1200,
"inputTemplate": {},
"rateLimitPerFrequency": 0,
"rateLimitFrequencyInSeconds": 1
},
"defaultExclusiveJoinTask": [],
"asyncComplete": false,
"loopOver": []
},
{
"name": "test-parent-flat-task3",
"taskReferenceName": "test-parent-flat-task3",
"inputParameters": {
"http_request": {
"uri": "http://localhost:8443/api/health",
"method": "GET",
"accept": "application/json"
}
},
"type": "HTTP",
"decisionCases": {},
"defaultCase": [],
"forkTasks": [],
"startDelay": 0,
"joinOn": [],
"optional": false,
"taskDefinition": {
"name": "test-parent-flat-task3",
"retryCount": 0,
"timeoutSeconds": 1200,
"inputKeys": [],
"outputKeys": [],
"timeoutPolicy": "TIME_OUT_WF",
"retryLogic": "FIXED",
"retryDelaySeconds": 5,
"responseTimeoutSeconds": 1200,
"inputTemplate": {},
"rateLimitPerFrequency": 0,
"rateLimitFrequencyInSeconds": 1
},
"defaultExclusiveJoinTask": [],
"asyncComplete": false,
"loopOver": []
},
{
"name": "test-parent-flat-task4",
"taskReferenceName": "test-parent-flat-task4",
"inputParameters": {
"http_request": {
"uri": "http://localhost:8443/api/health",
"method": "GET",
"accept": "application/json"
}
},
"type": "HTTP",
"decisionCases": {},
"defaultCase": [],
"forkTasks": [],
"startDelay": 0,
"joinOn": [],
"optional": false,
"taskDefinition": {
"name": "test-parent-flat-task4",
"retryCount": 0,
"timeoutSeconds": 1200,
"inputKeys": [],
"outputKeys": [],
"timeoutPolicy": "TIME_OUT_WF",
"retryLogic": "FIXED",
"retryDelaySeconds": 5,
"responseTimeoutSeconds": 1200,
"inputTemplate": {},
"rateLimitPerFrequency": 0,
"rateLimitFrequencyInSeconds": 1
},
"defaultExclusiveJoinTask": [],
"asyncComplete": false,
"loopOver": []
},
{
"name": "test-parent-flat-task5",
"taskReferenceName": "test-parent-flat-task5",
"inputParameters": {
"http_request": {
"uri": "http://localhost:8443/api/health",
"method": "GET",
"accept": "application/json"
}
},
"type": "HTTP",
"decisionCases": {},
"defaultCase": [],
"forkTasks": [],
"startDelay": 0,
"joinOn": [],
"optional": false,
"taskDefinition": {
"name": "test-parent-flat-task5",
"retryCount": 0,
"timeoutSeconds": 1200,
"inputKeys": [],
"outputKeys": [],
"timeoutPolicy": "TIME_OUT_WF",
"retryLogic": "FIXED",
"retryDelaySeconds": 5,
"responseTimeoutSeconds": 1200,
"inputTemplate": {},
"rateLimitPerFrequency": 0,
"rateLimitFrequencyInSeconds": 1
},
"defaultExclusiveJoinTask": [],
"asyncComplete": false,
"loopOver": []
},
{
"name": "test-parent-flat-task6",
"taskReferenceName": "test-parent-flat-task6",
"inputParameters": {
"http_request": {
"uri": "http://localhost:8443/api/health",
"method": "GET",
"accept": "application/json"
}
},
"type": "HTTP",
"decisionCases": {},
"defaultCase": [],
"forkTasks": [],
"startDelay": 0,
"joinOn": [],
"optional": false,
"taskDefinition": {
"name": "test-parent-flat-task6",
"retryCount": 0,
"timeoutSeconds": 1200,
"inputKeys": [],
"outputKeys": [],
"timeoutPolicy": "TIME_OUT_WF",
"retryLogic": "FIXED",
"retryDelaySeconds": 5,
"responseTimeoutSeconds": 1200,
"inputTemplate": {},
"rateLimitPerFrequency": 0,
"rateLimitFrequencyInSeconds": 1
},
"defaultExclusiveJoinTask": [],
"asyncComplete": false,
"loopOver": []
}
],
"inputParameters": [],
"outputParameters": {},
"schemaVersion": 2,
"restartable": true,
"workflowStatusListenerEnabled": true,
"ownerEmail": "[email protected]",
"timeoutPolicy": "ALERT_ONLY",
"timeoutSeconds": 0,
"variables": {}
}
This workflow has six repeated tasks, each calling the health check endpoint of Conductor itself, i.e., localhost. Each call shouldn't take more than 100ms (a very generous upper bound), so in the very worst case the total workflow execution time should be at most 600ms. I am executing this multiple times and seeing times like 2 seconds or 2.5 seconds, which is unacceptable. From the Conductor code, here is the sequence that happens:
With this basic understanding, I enabled debug logs in Conductor and observed that the tasks are being scheduled, but there is a considerable delay before each task actually starts. As an example, look at this one:
This is 230ms just to start executing the task. What is causing this is a 200ms static blocking sleep call in the
I created a sequence diagram (based on Conductor 2.X) to try to explain it:
See what happens with Task 2 (T2): it is scheduled just 3ms after the start of polling cycle 3 (P3), and the cycle is now blocked by the above-mentioned code block. It only unblocks after 200ms (Conductor 2.X) or 100ms (Conductor 3.X). Only once it is unblocked will T2 be fetched from the queue. This delays T2. Note that there are properties for system task workers that could be configured. I already tried:
Does the above make sense? If yes, how can this be avoided? It is impacting important use cases, and any help is appreciated.
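To make the effect concrete, here is a minimal sketch of the timing model described above. It is not Conductor code: the function names are hypothetical, and it assumes a poller that checks the queue only at fixed multiples of the sleep interval, with every task scheduled 3ms after a poll begins (the T2 case). Real behavior will vary with thread counts, queue implementation, and timing jitter.

```python
import math


def pickup_delay_ms(enqueue_ms, poll_interval_ms):
    """Time a task waits until a fixed-interval poller next checks the queue.

    Assumption: polls happen at t = 0, interval, 2 * interval, ... and a task
    enqueued exactly at a poll instant is picked up by that same poll.
    """
    next_poll_ms = math.ceil(enqueue_ms / poll_interval_ms) * poll_interval_ms
    return next_poll_ms - enqueue_ms


def sequential_total_ms(num_tasks, task_ms, poll_interval_ms, offset_ms):
    """Rough workflow duration if every task just misses a poll by offset_ms."""
    per_task_ms = pickup_delay_ms(offset_ms, poll_interval_ms) + task_ms
    return num_tasks * per_task_ms


# T2 from the description: scheduled 3ms after a polling cycle starts.
delay_200 = pickup_delay_ms(3, 200)  # 200ms sleep, as described for Conductor 2.X
delay_100 = pickup_delay_ms(3, 100)  # 100ms sleep, as described for Conductor 3.X
print(delay_200, delay_100)

# Six sequential 100ms HTTP tasks under the same unlucky timing.
print(sequential_total_ms(6, 100, 200, 3))
print(sequential_total_ms(6, 100, 100, 3))
```

Under these assumptions the wait per task is almost the full sleep interval, so the polling sleep alone can add more time than the HTTP calls themselves.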
-
Hi. We're also considering Conductor for a real-time use case. Has this issue been solved?
-
Hi Team, we have deployed the Conductor server in a Kubernetes setup and configured a sample workflow with 2 HTTP tasks. (Both tasks hit a service deployed in the same environment.) We are not using a persistence layer (in-memory only), with Conductor 3.8.0.
Below is the configuration of our workflow:
Overall workflow execution takes 591ms. The first and second tasks took 107ms and 86ms. Conductor itself accounts for about 400ms. The time gap between the end of task 1 and the start of task 2 is around 200ms.
I have attached the server log file, starting from the beginning of the workflow.
Is this time delay between tasks expected? Is there any way to bring it down? We would like to hear your suggestions.
server-wf-execution-log-timestmp.txt
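For reference, the arithmetic behind the "about 400ms" figure can be written out as a short sketch. The helper name is hypothetical; the numbers are the ones reported above, and the 5%-10% figure is the overhead budget mentioned elsewhere in this thread.

```python
def orchestration_overhead(total_ms, task_durations_ms):
    """Return (overhead in ms, overhead as a fraction of the time spent in tasks)."""
    task_time_ms = sum(task_durations_ms)
    overhead_ms = total_ms - task_time_ms
    return overhead_ms, overhead_ms / task_time_ms


# Reported numbers: 591ms total, tasks took 107ms and 86ms.
overhead_ms, ratio = orchestration_overhead(591, [107, 86])
print(overhead_ms)       # 398
print(f"{ratio:.0%}")    # 206%
```

So the time spent outside the two HTTP calls is roughly twice the time spent inside them, far above a 5%-10% budget.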