
[Discuss] Ease debugging of Elastic Agent providers #5324

Open
jlind23 opened this issue Aug 20, 2024 · 12 comments
Labels
Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team


@jlind23
Contributor

jlind23 commented Aug 20, 2024

While working on some Kubernetes issues, we were stuck trying to figure out who the leaders were.
As of today, the only option is to run the following command:

kubectl get leases.coordination.k8s.io -n kube-system | grep elastic-agent

To ease debugging, it would be great to bubble this information up somewhere in the Kibana UI, so that we know:

  • Who the leader(s) is(are).
  • When the lease was acquired.
  • ..
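
Both pieces of information are already present on the Lease object itself. As a sketch, the lease returned by the command above should look roughly like this (the lease name and all values below are placeholders; the spec fields are the standard coordination.k8s.io/v1 Lease fields):

# kubectl get lease elastic-agent-cluster-leader -n kube-system -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: elastic-agent-cluster-leader   # placeholder; use the name returned by the grep above
  namespace: kube-system
spec:
  holderIdentity: elastic-agent-abcde            # which agent instance currently leads
  acquireTime: "2024-08-20T09:00:00.000000Z"     # when the current holder acquired the lease
  renewTime: "2024-08-20T09:05:00.000000Z"       # last renewal heartbeat from the holder
  leaseDurationSeconds: 15
  leaseTransitions: 3                            # how many times leadership has changed hands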

This sparked a broader discussion about what information each provider should return and make available:

  • Status: Enabled/Disabled
  • ...

@nimarezainia @strawgate happy to get your thoughts on this.

cc @ycombinator @blakerouse as you recently worked on similar cases.

@jlind23 jlind23 added the Team:Elastic-Agent-Control-Plane label Aug 20, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@jlind23 jlind23 changed the title [Discuss] Kubernetes Leader Election provider [Discuss] Ease debugging of Kubernetes Leader Election process Aug 20, 2024
@nimarezainia
Contributor

@jlind23 in your debugging, what was the workflow? As in, once you found which agent is the nominated leader, what were the next steps you were trying to perform? I'm trying to figure out where this information should reside. Perhaps it can just be part of the local metadata and not something that the UI needs to show.

@jlind23
Contributor Author

jlind23 commented Aug 21, 2024

We wanted to understand why some of the data that is collected exclusively by the leader wasn't being sent, hence why we were looking for the leader.

@jlind23
Contributor Author

jlind23 commented Aug 21, 2024

@blakerouse @ycombinator we would need to report it through the Elastic Agent metadata; would you happen to know how complex that would be? Once reported, we can do whatever we want with it in the UI.

@strawgate
Contributor

With the Helm Chart, do we actually use leader election?

@jlind23
Contributor Author

jlind23 commented Aug 21, 2024

According to @pkoutsovasilis' demo I don't think we do.

@blakerouse
Contributor

@jlind23 The leader election is its own provider and has no connection to Fleet or to updating the overall core state of the Elastic Agent. It will be difficult to connect the two, so it's not a quick change.

It should be possible to see that the state_* units are only running on the Elastic Agent that currently holds the leader election lease.

@pkoutsovasilis
Contributor

Hi 👋 So the Helm chart for the built-in Kubernetes integration in standalone mode disables leader election, because it deploys multiple agents: a DaemonSet for node-scope metrics and container logs, a Deployment for cluster-scope metrics, and a StatefulSet that runs a kube-state-metrics container alongside the agent to monitor kube-state-metrics. With that topology there is no need for leader election. On the contrary, the same topology isn't possible for agents managed through Fleet, since the config is then controlled by Fleet, so in that scenario the Helm chart doesn't disable it.

Thinking out loud: since an agent instance knows whether it is the leader or not, and when it won/lost the election, can't this be propagated to Kibana?!
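
(For reference, leadership is already exposed inside the agent policy: the kubernetes_leaderelection provider sets a boolean variable that the standalone Kubernetes manifests use to gate leader-only datasets. A minimal sketch, with placeholder input and dataset names:)

# sketch of a leader-gated stream in a standalone agent policy
inputs:
  - id: kubernetes-metrics-state        # placeholder id
    type: kubernetes/metrics
    streams:
      - data_stream:
          dataset: kubernetes.state_pod
        condition: ${kubernetes_leaderelection.leader} == true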

@blakerouse
Contributor

blakerouse commented Aug 21, 2024

Hi 👋 So the Helm chart for the built-in Kubernetes integration in standalone mode disables leader election, because it deploys multiple agents: a DaemonSet for node-scope metrics and container logs, a Deployment for cluster-scope metrics, and a StatefulSet that runs a kube-state-metrics container alongside the agent to monitor kube-state-metrics. With that topology there is no need for leader election. On the contrary, the same topology isn't possible for agents managed through Fleet, since the config is then controlled by Fleet, so in that scenario the Helm chart doesn't disable it.

It is actually possible to have the Elastic Agent deployed as a Deployment with kube-state-metrics and enrolled into Fleet, if that Elastic Agent is enrolled into a custom policy that only enables state_* metrics. Another option would be to set an ENV variable on the container and then add a condition on the integration for that ENV, so that only the container with that ENV variable set would run the state_* metrics.
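
(A minimal sketch of that second option, assuming a hypothetical STATE_METRICS variable surfaced through the agent's env provider; only the container started with STATE_METRICS=true would run these streams:)

# hypothetical: gate the state_* streams on an ENV variable via the env provider
streams:
  - data_stream:
      dataset: kubernetes.state_node
    condition: ${env.STATE_METRICS} == "true"   # STATE_METRICS is an assumed, user-defined variable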

Just want to make it clear that it is possible, but with the way the integration and the manifests are currently designed, it doesn't operate that way.

This is not a limitation of the Elastic Agent; it's just a limitation of how the manifests and integrations have been designed.

Thinking out loud: since an agent instance knows whether it is the leader or not, and when it won/lost the election, can't this be propagated to Kibana?!

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.

I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.
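
(Purely illustrative of the idea above: if providers published a status alongside components, the agent's status output could grow a section like the following. The shape and field names here are hypothetical, not an existing API:)

# hypothetical per-provider status block, mirroring how components report state
providers:
  - name: kubernetes_leaderelection
    status: HEALTHY
    message: "holding lease; acquired 2024-08-20T09:00:00Z"     # placeholder detail
  - name: kubernetes
    status: DEGRADED
    message: "unable to connect to the Kubernetes API server"   # e.g. running outside a cluster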

@pkoutsovasilis
Contributor

It is actually possible to have the Elastic Agent deployed as a Deployment with kube-state-metrics and enrolled into Fleet, if that Elastic Agent is enrolled into a custom policy that only enables state_* metrics. Another option would be to set an ENV variable on the container and then add a condition on the integration for that ENV, so that only the container with that ENV variable set would run the state_* metrics.

Yep, I have done such an enrollment, so it is possible; however, somebody can enable other metrics in the integration, which might result in undesired effects, and there is no way to limit that, at least as far as I can tell.

Just want to make it clear that it is possible, but the current way the integration and the manifests are designed it doesn't operate that way.

Yep, 100% agree. The reason we made that decision with the Helm chart (not disabling leader election for managed mode) wasn't a limitation of the Agent but rather how an integration, at least as of now, gets applied holistically.

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.

I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.

Yep, being able to configure providers in Fleet and expose them, like components, with a status does sound like a good addition to explore.

@jlind23
Contributor Author

jlind23 commented Aug 22, 2024

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.
I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.

I am leaning towards updating this issue to focus on each provider and making sure they return the right set of information.

@jlind23 jlind23 changed the title [Discuss] Ease debugging of Kubernetes Leader Election process [Discuss] Ease debugging of Elastic Agent providers Aug 22, 2024
@blakerouse
Contributor

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.
I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.

I am leaning towards updating this issue to focus on each provider and making sure they return the right set of information.

This would actually be more in line with OTel as well, as each extension can also report a status. This alignment will help the transition over time.
