Adding max_retries for ECSOperator #13725
-
Would it be valuable to add a max_retries param to the ECSOperator? In my use case, I have a DAG that uses the ECSOperator to start a Task, but sometimes my ECS cluster is at capacity and the ECSOperator fails. Instead, if the ECSOperator retried a couple of times (say 3 times over the period of 5 mins), it would give my ECS cluster time to scale out and accommodate the Task being created by the ECSOperator. What does everyone think? BTW I'm a first-time poster here, and am looking forward to contributing to this project.
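For illustration only, here is a rough sketch of what the proposal could look like at the call site. `max_retries` and `retry_wait` are not existing ECSOperator parameters; they are placeholder names assumed here for the proposed behaviour.

```python
from datetime import timedelta

# Hypothetical call-site sketch: `max_retries` and `retry_wait` do NOT exist on
# ECSOperator today; they stand in for the proposed behaviour of retrying
# RunTask ~3 times over ~5 minutes while the cluster scales out.
proposed_ecs_task_kwargs = dict(
    task_id="run_my_ecs_task",
    cluster="my-ecs-cluster",
    task_definition="my-task-def",
    launch_type="FARGATE",
    overrides={"containerOverrides": []},
    max_retries=3,                      # proposed: retry RunTask a few times
    retry_wait=timedelta(seconds=100),  # proposed: wait between attempts
)
# e.g. ECSOperator(**proposed_ecs_task_kwargs) once such parameters exist
```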
-
Sure. Go for it. We have similar retry mechanisms implemented in the "google" provider. This is especially useful if you have a way to distinguish a "capacity" (transient) error from any other "permanent" one. See `airflow/providers/google/common/hooks/base_google.py` (line 359 at 2f79fb9) for an example (this one is `quota_retry`, but there are a few others). We used tenacity to provide exponential back-off for such retries, and we recommend the same approach; this way you are even better at handling "big" spikes. We implemented it as decorators, so it could be applied to a wide range of Google operators (and some Google APIs have a built-in Retry capability, in which case we used the built-in ones). Maybe you could also work out a similar pattern for many Amazon operators?
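Below is a minimal sketch of how a tenacity-based retry decorator could look for the ECS case, loosely following the Google provider pattern. The predicate, the back-off values, and names such as `retry_on_ecs_capacity_error` are assumptions for illustration, not existing Airflow APIs.

```python
import tenacity


def _is_capacity_error(exc: BaseException) -> bool:
    # Assumption for this sketch: ECS RunTask capacity failures surface with a
    # "RESOURCE:*" reason (e.g. RESOURCE:MEMORY); treat those as transient and
    # everything else as permanent.
    return "RESOURCE" in str(exc)


def retry_on_ecs_capacity_error(func):
    """Retry the wrapped callable with exponential back-off on capacity errors."""
    return tenacity.retry(
        retry=tenacity.retry_if_exception(_is_capacity_error),
        # roughly spreads 3 attempts over a few minutes
        wait=tenacity.wait_exponential(multiplier=30, max=120),
        stop=tenacity.stop_after_attempt(3),
        reraise=True,
    )(func)


class EcsTaskStarter:
    """Stand-in for the part of the ECSOperator that calls RunTask."""

    @retry_on_ecs_capacity_error
    def start_task(self):
        # The real operator would call the boto3 ECS client's run_task() here
        # and raise if the response reports a capacity failure.
        ...
```

Pulling the back-off policy out into a decorator like this keeps it reusable across other Amazon operators, which is essentially the same design choice the Google provider made.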
-
Thanks for the quick (and helpful) response. I'll take a look at the existing pattern in the Google operators. If I have any questions, where should I post them? Thanks again!