Tasks fail without logs under heavy load #45078

Open
team-hawking-stam opened this issue Dec 19, 2024 · 1 comment
Labels
area:core, area:logging, kind:bug, needs-triage, provider:cncf-kubernetes

Comments


team-hawking-stam commented Dec 19, 2024

Apache Airflow version

2.10.4

If "Other Airflow 2 version" selected, which one?

No response

What happened?

I have multiple dag_runs of one DAG running in parallel on a Kubernetes cluster with a single worker pod.
I use a parallelism of 16 and a retry count of 4.
The DAG is composed of mapped tasks; the biggest one spawns 36 mapped task instances.
Every day 100 dag_runs are spawned together, and the dag_run with the most tasks fails, with 3/4 mapped tasks failed.
Those tasks fail after 4 retries, but most of the time I see only 1 or 2 execution logs.
Most of the time the log is:

[2024-12-19T11:06:21.433+0000] {local_task_job_runner.py:123} INFO - ::group::Pre task execution logs
[2024-12-19T11:06:21.792+0000] {taskinstance.py:2603} INFO - Dependencies not met for <TaskInstance: ExportSii.XMLGeneration manual__2024-12-19T11:05:23.599025+00:00 map_index=9 [up_for_retry]>, dependency 'Not In Retry Period' FAILED: Task is not ready for retry yet but will be retried automatically. Current date is 2024-12-19T11:06:21.791874+00:00 and task will be retried at 2024-12-19T11:06:44.472566+00:00.
[2024-12-19T11:06:21.805+0000] {local_task_job_runner.py:166} INFO - Task is not able to be run

This, for example, is attempt=2.log, and I don't have attempts 1, 3, or 4, neither in the logs nor in the UI.

When I then clear the state of the failed tasks, they run correctly without errors.
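
For illustration, here is a minimal sketch of the kind of DAG described above (all names and the mapping-input helper are made up; the real DAG is not shown here):

```python
# Minimal sketch approximating the setup described above (hypothetical names).
# One mapped task expands into ~36 task instances, each with retries=4,
# and many dag_runs of this DAG are triggered in parallel.
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def export_sii():
    @task
    def list_documents() -> list[dict]:
        # Hypothetical stand-in for whatever produces the ~36 mapping inputs.
        return [{"doc_id": i} for i in range(36)]

    @task(retries=4, retry_delay=timedelta(seconds=20))
    def xml_generation(doc: dict) -> str:
        # Hypothetical work; the failing mapped task in the report is XMLGeneration.
        return f"generated-{doc['doc_id']}.xml"

    xml_generation.expand(doc=list_documents())


export_sii()
```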

What you think should happen instead?

I would like to see all the attempts and a clearer trace of what happened, so I can debug the problem.

How to reproduce

It's mostly dependent on the workload. On another instance with the same code but less stress, it doesn't happen.
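
For what it's worth, a rough sketch of how comparable load can be generated (assuming the stock `airflow dags trigger` CLI is available and using the hypothetical dag_id from the sketch above):

```python
# Trigger ~100 runs of the DAG back to back so they queue up together.
# Assumes the standard Airflow CLI is on PATH (e.g. inside the scheduler pod).
import subprocess

for i in range(100):
    subprocess.run(
        ["airflow", "dags", "trigger", "export_sii", "--run-id", f"load_test_{i}"],
        check=True,
    )
```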

Operating System

Helm chart on Kubernetes

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

Kubernetes on GKE

Anything else?

I don't have any errors at the Kubernetes level or in the worker logs.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

team-hawking-stam added the area:core, kind:bug, and needs-triage labels on Dec 19, 2024

boring-cyborg bot commented Dec 19, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

dosubot added the area:logging and provider:cncf-kubernetes labels on Dec 19, 2024