Job heartbeat retry with backoff #37378
-
We do have retries in specific DB calls when the call is easily recoverable. You can look at the Airflow code, usually with
or simply methods decorated with:
There is even a configuration parameter in Airflow ( This is done on specific transactions: if the call is in a DB transaction, where processing the transaction leaves some other in-memory side-effects, we need to be able to safely assume we can re-do such a transaction. So absolutely, no problem: if you find a specific place and transaction that errored out and could be safely retried, and you can argue for it and make a PR changing it, it's OK to have it. But it needs to be a specific transaction, and the PR has to justify that yes, it is safe to retry it. So you need to find the particular place, review the code to make sure it can be retried, and submit a PR.
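To illustrate the "specific transaction" point, here is a minimal sketch that uses the standard-library sqlite3 module instead of Airflow's own helpers. The names retry_transaction and record_heartbeat and the job table are hypothetical, made up for this example. The idea is only that the retry wraps one whole transaction that has been reviewed as safe to re-run.

```python
import functools
import sqlite3


def retry_transaction(retries: int = 3):
    """Hypothetical decorator: re-run one whole transaction on OperationalError."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(conn: sqlite3.Connection, *args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    result = fn(conn, *args, **kwargs)
                    conn.commit()
                    return result
                except sqlite3.OperationalError:
                    conn.rollback()  # discard partial work before retrying
                    if attempt == retries:
                        raise  # out of attempts: let the caller see the error
        return wrapper
    return decorator


@retry_transaction(retries=3)
def record_heartbeat(conn: sqlite3.Connection, job_id: int, ts: float) -> None:
    # Assumed safe to retry: it overwrites a single row and changes no
    # in-memory state, so running it twice gives the same result as once.
    conn.execute("UPDATE job SET latest_heartbeat = ? WHERE id = ?", (ts, job_id))
```

A function that also mutates in-memory objects would need that state reset before each retry, which is why the decision has to be made per transaction.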
-
Hi, we are currently running Airflow 2.7.3 with an externally managed Postgres instance and recently saw a job terminated due to
This issue seemed to be transient, as rerunning the job was successful. It may be a good idea to be able to configure a retry with backoff in Job.heartbeat, ensuring the total backoff time is less than the Job.heartrate. Would anyone be against this functionality? If you are, it would be good to understand why. I am happy to contribute a change.
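As a rough sketch of the proposal (not Airflow's actual code), the backoff schedule could be derived from the heartrate so that the total time spent sleeping always stays below it. TransientError, backoff_delays and retry_within_budget are hypothetical names, and the base delay and factor are arbitrary defaults.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable DB error."""


def backoff_delays(heartrate: float, base: float = 0.5, factor: float = 2.0) -> List[float]:
    """Exponential delays whose sum stays strictly below `heartrate`."""
    delays: List[float] = []
    total, delay = 0.0, base
    while total + delay < heartrate:
        delays.append(delay)
        total += delay
        delay *= factor
    return delays


def retry_within_budget(
    fn: Callable[[], T],
    heartrate: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on TransientError until the backoff budget is spent."""
    for delay in backoff_delays(heartrate):
        try:
            return fn()
        except TransientError:
            sleep(delay)
    return fn()  # final attempt; any error propagates to the caller
```

With a heartrate of 5 seconds and these defaults, the delays are 0.5, 1.0 and 2.0 seconds, 3.5 seconds in total. The budget only covers sleep time; the time taken by the calls themselves is not counted, so a real change would probably need to track elapsed wall-clock time as well.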
Thanks,
Alfie