Distributed locking for conductor #2600
-
Hi, guys. Sometimes I face "weird" problems with Conductor, especially when I am stressing it a lot (thousands of workflow executions in parallel): workflows stuck with a task IN_PROGRESS even though the task was already completed, workflows terminating abruptly, etc. I can't say for sure, but a quick search of the open issues shows that many of them lead to questions like "What distributed locking are you using?". I would understand this kind of situation if I had two Conductor instances running in parallel, but I have just one. In any case, I would like to understand:
I could only find this section in the docs, but it is very high level, so I would like some guidance here if possible. In case it matters, my current configuration is a single Conductor instance with Postgres as the database, with no Redis and no Dynomite. Thank you in advance!
-
@flavioschuindt The problems you mentioned above could be caused by concurrency issues, which is exactly what the distributed locking interface was introduced into Conductor to prevent.
In your case (a single instance without a distributed lock), two or more separate threads on the same instance could be evaluating the state of a given workflow at the same time.
Whether the competing evaluations come from separate instances or from threads on the same instance, there is a race over which thread updates the workflow state in the persistence layer. This can leave the workflow in an inconsistent state that it cannot automatically recover from.
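To make the race concrete, here is a minimal Java sketch, assuming a toy in-memory map in place of the persistence layer (the class `RaceDemo` and its methods are hypothetical, not Conductor code). Several threads perform a read-modify-write on the same workflow's state; without a lock some writes can overwrite each other, while a per-workflow lock serializes them. Note that a `ReentrantLock` only excludes threads inside one JVM; redis-lock or zookeeper-lock provide the same mutual exclusion across Conductor instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class RaceDemo {
    // Hypothetical stand-in for the persistence layer: workflowId -> number of applied updates.
    static final Map<String, Integer> store = new ConcurrentHashMap<>();
    // One lock per workflow ID; a ReentrantLock only guards threads inside this JVM.
    static final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Read-modify-write of the workflow state, as an evaluation that persists a new state would do.
    static void evaluate(String workflowId, boolean useLock) {
        ReentrantLock lock = locks.computeIfAbsent(workflowId, id -> new ReentrantLock());
        if (useLock) {
            lock.lock();
        }
        try {
            int applied = store.getOrDefault(workflowId, 0); // read current state
            Thread.yield();                                   // widen the race window
            store.put(workflowId, applied + 1);               // persist the new state
        } finally {
            if (useLock) {
                lock.unlock();
            }
        }
    }

    // Submits `updates` concurrent evaluations of one workflow and returns how many survived.
    static int run(boolean useLock, int updates) {
        store.clear();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < updates; i++) {
            pool.submit(() -> evaluate("wf-1", useLock));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return store.getOrDefault("wf-1", 0);
    }

    public static void main(String[] args) {
        System.out.println("without lock: " + run(false, 10_000)); // may be < 10000: lost updates
        System.out.println("with lock:    " + run(true, 10_000));  // equals 10000
    }
}
```

The unlocked count may differ from run to run, while the locked count always matches the number of submitted updates.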
There are two options available for this: redis-lock or zookeeper-lock. The steps to set up either system should be available on the respective product pages. The integration with Conductor is entirely configuration-driven, by setting the properties for Redis or ZooKeeper. Additionally, you need to enable locking using the property:
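Because the backend is chosen through configuration, the code that evaluates a workflow only has to depend on a lock abstraction. The sketch below assumes a hypothetical `WorkflowLock` interface and `WorkflowEvaluator` class (not Conductor's actual API), with an in-JVM stand-in implementation; a redis-lock or zookeeper-lock backend would simply be another implementation of the same interface, picked by configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock abstraction: a redis-lock or zookeeper-lock backend would each implement it.
interface WorkflowLock {
    boolean acquire(String lockId, long timeToTry, TimeUnit unit);
    void release(String lockId);
}

// In-JVM stand-in for this sketch only; it does NOT coordinate across Conductor instances.
class InMemoryWorkflowLock implements WorkflowLock {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    @Override
    public boolean acquire(String lockId, long timeToTry, TimeUnit unit) {
        ReentrantLock lock = locks.computeIfAbsent(lockId, id -> new ReentrantLock());
        try {
            return lock.tryLock(timeToTry, unit); // wait up to timeToTry for the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    @Override
    public void release(String lockId) {
        ReentrantLock lock = locks.get(lockId);
        if (lock != null && lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }
}

// Hypothetical evaluator that depends only on the interface, so the backend is a wiring choice.
class WorkflowEvaluator {
    private final WorkflowLock lock;

    WorkflowEvaluator(WorkflowLock lock) {
        this.lock = lock;
    }

    // Returns false without touching state if the workflow's lock is not acquired in time.
    boolean evaluate(String workflowId, Runnable stateUpdate) {
        if (!lock.acquire(workflowId, 500, TimeUnit.MILLISECONDS)) {
            return false; // caller can retry later instead of racing the current holder
        }
        try {
            stateUpdate.run();
            return true;
        } finally {
            lock.release(workflowId);
        }
    }
}

class PluggableLockDemo {
    public static void main(String[] args) {
        WorkflowEvaluator evaluator = new WorkflowEvaluator(new InMemoryWorkflowLock());
        boolean done = evaluator.evaluate("wf-1", () -> System.out.println("updating wf-1"));
        System.out.println("evaluated: " + done);
    }
}
```

Running `PluggableLockDemo` prints `updating wf-1` followed by `evaluated: true`. Returning `false` when the lock cannot be acquired in time lets the caller retry later rather than evaluate concurrently with whoever holds the lock.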
-
Hey @apanicker-nflx, very good intro, thanks for that. So, is there any recommendation from Conductor between redis-lock and zookeeper-lock? What are the pros and cons of each, specifically in the context of Conductor? I will start exploring both, but if you already have any insight, it would be good to share. By the way, when you say "there could be two or more separate threads", are you referring to a single worker instance with multiple threads configured through the parameter
-
In the context of Conductor, if you are already using redis-persistence, redis-lock will be easier to set up. Other than that, both implementations work well functionally in our testing. We noticed that the Redis implementation was more performant at higher loads, but the difference between the two is not significant.
No, that setting is on the client side. What I am referring to is: