Scheduler throws an error "Couldn't find dag in DagBag/DB" on each startup. #27711
Replies: 3 comments
-
You need to provide more information about the exact logs and more details about the stack trace you get. I believe this is not what you think it is: you have not explained in which log and in which context the error is printed, and most likely you have some misconfiguration or your own code that triggers the error (and you should correct it there). Without details on which component prints the error, and where and in what context it is displayed, it is difficult to help you find the place you should correct (that is at least my understanding of what happens in your case, but it should be confirmed by more context). It's also likely that for some reason your S3 sync triggers some re-serialization of the DAGs to the database. Anyway, those are wild guesses; only the exact context of where the error is generated, with more logs, stack traces, and the surrounding lines, is needed to help you investigate the cause.
-
Please let me know if this is enough or you need more. So the errors stop being printed once the dag processor finishes parsing all of the DAGs.
-
I believe you have duplicated dag_ids in different files. The problem seems to be that some of your dag_ids are duplicated: the scheduler thinks they live in one of the files while they are actually in the other, and that is why it gets confused. Note that due to the dynamic nature of Python files and generated DAGs, we cannot check this upfront, and we cannot distinguish a dag_id moving from one file to another from a plain deletion of the dag_id. Since we cannot automatically determine whether a dag_id has been moved or duplicated, you have to make sure yourself that dag_ids are unique across all of your DAG files. See the sketch below for one way to check.
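For example, here is a minimal sketch (not an official Airflow tool, and the `DAGS_FOLDER` path is an assumption you should adjust) that parses each DAG file individually and reports any dag_id defined in more than one file:

```python
# Sketch: detect dag_ids that are defined in more than one DAG file.
# DAGS_FOLDER below is an assumption; point it at your dags folder.
from collections import defaultdict
from pathlib import Path

from airflow.models import DagBag

DAGS_FOLDER = "/opt/airflow/dags"

files_per_dag_id = defaultdict(list)  # dag_id -> files that define it
for py_file in Path(DAGS_FOLDER).rglob("*.py"):
    # Parse each file in isolation so duplicates are not silently merged.
    bag = DagBag(dag_folder=str(py_file), include_examples=False)
    for dag_id in bag.dag_ids:
        files_per_dag_id[dag_id].append(str(py_file))

for dag_id, files in sorted(files_per_dag_id.items()):
    if len(files) > 1:
        print(f"Duplicate dag_id {dag_id!r} defined in: {files}")
```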
-
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
Hi!
Airflow version 2.4.2
On every start of the airflow scheduler I am getting the error "Couldn't find dag %s in DagBag/DB!". The error disappears once all DAGs have been processed by the DagFileProcessorAgent. I noticed that the agent runs in the background, updating the `dag` and `serialized_dag` tables. However, when I go through the code I can't understand why DAGs are not fetched from the database first instead of waiting for the files to be processed.
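For reference, serialized DAGs can be read back from the metadata DB without parsing any files; a minimal sketch (assuming Airflow 2.x, run inside the Airflow environment with the metadata DB configured):

```python
# Sketch: read DAGs from the metadata DB instead of from files.
from airflow.models.dagbag import DagBag

db_bag = DagBag(read_dags_from_db=True)
db_bag.collect_dags_from_db()  # populates from the serialized_dag table
print(sorted(db_bag.dag_ids))  # dag_ids currently serialized in the DB
```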
What you think should happen instead
If the scheduler requires a DagFileProcessorAgent run on each start-up, then maybe it should wait for the first iteration of the agent to complete before reporting itself healthy.
How to reproduce
Helm chart 1.7.0 (the `dag`, `dag_code`, and `serialized_dag` tables already contain DAG data).
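One way to confirm that these tables hold DAG data is to query them through the Airflow ORM; a minimal sketch (assuming Airflow 2.x and the standard metadata schema):

```python
# Sketch: list serialized DAG rows and when they were last written.
from airflow.models.serialized_dag import SerializedDagModel
from airflow.utils.session import create_session

with create_session() as session:
    rows = session.query(
        SerializedDagModel.dag_id, SerializedDagModel.last_updated
    ).all()

for dag_id, last_updated in rows:
    print(dag_id, last_updated)
```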
Operating System
Linux - official Airflow image from Docker Hub: apache/airflow:slim-2.4.2
Versions of Apache Airflow Providers
apache-airflow[statsd]
apache-airflow-providers-amazon~=6.0.0
apache-airflow-providers-cncf-kubernetes~=4.4.0
apache-airflow-providers-databricks~=3.3.0
apache-airflow-providers-hashicorp~=3.1.0
apache-airflow-providers-postgres~=5.2.2
Deployment
Official Apache Airflow Helm Chart
Deployment details
k8s version - 1.23
Anything else
No response
Are you willing to submit PR?
Code of Conduct