Is this happening for all tasks that were running when the scheduler was restarted, or only some of them? Do you have any more information or guesses about which tasks might be affected? Also, what kind of deployment do you have: Kubernetes Executor? Kubernetes Celery Executor? Can you please elaborate on that?

BTW, why do you have such frequent restarts caused by the PGSQL/Kubernetes API connections? This seems like something you should address in general. It is very likely that the problem is not in Airflow itself: if you have flakiness in the DB and Kubernetes APIs, there is no way the Airflow implementation can cover it up. Airflow uses PRECISELY the DB and Kubernetes APIs to store and query the information that lets it recover from any kind of crash. If either of those is flaky, Airflow may simply be unable to recover, because the flakiness prevented it from storing or querying the necessary information. This is a rather unsolvable conundrum: we actually rely on a stable deployment underneath, and I believe that should be handled first.
Converted this into a discussion until the user provides more information and investigates the reasons for the flakiness, so that we can get a better understanding of what is happening.
Apache Airflow version
2.3.2
What happened
Scheduler was restarted.
After this it starts resetting some running tasks as orphaned.
I have seen #20982, which lists this as a known issue for manually started tasks, but we also see it occasionally for scheduled tasks.
This issue appears to be a regression after the 2.3 upgrade. I don't recall ever seeing it in 2.2, but now we experience it almost every time the scheduler restarts, which happens almost once per day due to crashes caused by flaky connections to the Kubernetes API or PGSQL.
What you think should happen instead
Tasks should be adopted
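To make the difference between "adopted" and "reset as orphaned" concrete, here is a minimal sketch of the decision as I understand it. It is a simplified model, not Airflow's actual code: the `TaskInstance` class, the `owner_job_alive` flag, the `adoptable_ids` set, and the `adopt_or_reset` function are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a task instance row, not Airflow's model."""
    task_id: str
    state: Optional[str]    # "queued" or "running" in this sketch
    owner_job_alive: bool   # assumption: is the scheduler job that queued it still alive?


def adopt_or_reset(
    task_instances: Iterable[TaskInstance],
    adoptable_ids: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split orphaned task instances into adopted and reset ones.

    adoptable_ids models the tasks the executor can still locate after
    the restart (an assumption made to keep the sketch self-contained).
    """
    adopted: List[str] = []
    reset: List[str] = []
    for ti in task_instances:
        if ti.owner_job_alive:
            continue  # not orphaned, leave it untouched
        if ti.task_id in adoptable_ids:
            adopted.append(ti.task_id)  # the new scheduler takes ownership
        else:
            ti.state = None  # state cleared so the task gets scheduled again
            reset.append(ti.task_id)
    return adopted, reset
```

In this model, what we observe corresponds to running tasks ending up in the `reset` list even though their workers are still alive, while the expected outcome is that they land in `adopted`.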
How to reproduce
Start some tasks running on Kubernetes with KubernetesCeleryExecutor.
Restart scheduler
Scheduler logs show the following:
I don't know what other logs might be relevant.
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
apache-airflow==2.3.2
apache-airflow-client==2.1.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.0.2
apache-airflow-providers-docker==3.0.0
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-postgres==5.0.0
apache-airflow-providers-sqlite==2.1.3
Deployment
Other Docker-based deployment
Deployment details
Pgsql as database
Anything else
Almost every time scheduler restarts