Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support concurrent logs collection for containers in KubernetesPodOperator #45061

Open
2 tasks done
mrk-andreev opened this issue Dec 18, 2024 · 5 comments
Open
2 tasks done
Labels
kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet provider:cncf-kubernetes Kubernetes provider related issues

Comments

@mrk-andreev
Copy link
Contributor

Description

During the discussion in PR #43853 with @dstandish, we noticed that there are two separate implementation flows for collecting logs from init_containers and containers that do similar things.

The current implementation processes logs sequentially, which is correct for init_containers because they are executed one after another—each starting only after the previous one has successfully completed.

However, for regular containers (not init_containers), this logic isn’t fully accurate since they can run in parallel.

Related code are https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L617

I suggest to implement concurrent logs collection for containers.

Use case/motivation

This might cause issues during long-running executions, as we won’t see logs from other containers until one of them has completed.

Related issues

#43853

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@mrk-andreev mrk-andreev added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet labels Dec 18, 2024
Copy link

boring-cyborg bot commented Dec 18, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@mrk-andreev
Copy link
Contributor Author

@potiuk , @eladkal , @dstandish

What do you think about that?

@dosubot dosubot bot added the provider:cncf-kubernetes Kubernetes provider related issues label Dec 18, 2024
@potiuk
Copy link
Member

potiuk commented Dec 20, 2024

I think it would be a great idea to review and make it "common" and reused accross the components. Also we have another, related discussion about logging in #45079 where a solutions is proposed to handle both - streaming logs and serving logs after task is completed - with regards to OOM issues.

Currently we merge and store logs in memory which - in some situations might in some situations lead to OOMs. And some refactoring and restructuring of the logging mechanism might be necessary to handle things better.

I think both of you @mrk-andreev and @jason810496 are very well skilled and prepared to handle that - under @dstandish @eladkal (and a little of mine) guidande and mentoring, so if you both work on this together to get both sides of logging improved.

I know it's not directly the same issue, but having a small team of skilled people to focus on both and collaborate, might be a great idea.

@dstandish - WDYT? Having the three of maintainers here to overlook it while Mark and LIU ZHE YOU work on those, might be a good idea to distract from other Airlfow 3 streams?

@jason810496
Copy link
Contributor

Hi @potiuk, I’m happy to get involved in the refactoring of the stream logs issue!

I might start by breaking down #45079 into some TODO tasks and proposing PRs. Do we need to collaborate on Slack, or is discussing on GitHub sufficient?

I think it would be a great idea to review and make it "common" and reused across the components.

From my perspective, it’s quite challenging to define a common interface for both use cases at this stage.
For #45079 , there are some additional logic to deal with metadata for logs.
Perhaps Mark and I could work in parallel for now, and later consolidate the streaming logs part into common modules, which might make it easier to identify shared utilities.

@potiuk
Copy link
Member

potiuk commented Dec 20, 2024

I might start by breaking down #45079 into some TODO tasks and proposing PRs. Do we need to collaborate on Slack, or is discussing on GitHub sufficient?

GitHub is enough.

From my perspective, it’s quite challenging to define a common interface for both use cases at this stage.
For #45079 , there are some additional logic to deal with metadata for logs.
Perhaps Mark and I could work in parallel for now, and later consolidate the streaming logs part into common modules, which might make it easier to identify shared utilities.

Yeah. I was more thinking about both of you starting to work in parallel and review and comment on each-other's PRs / issues with us overlooking it - since the subjects are very close (even if not the same) - having extra pairs of eyes on the work done while it is being done is always helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet provider:cncf-kubernetes Kubernetes provider related issues
Projects
None yet
Development

No branches or pull requests

3 participants