
[Discuss] Ease debugging of Elastic Agent providers #5324

Open
jlind23 opened this issue Aug 20, 2024 · 12 comments
Labels
Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team


@jlind23
Contributor

jlind23 commented Aug 20, 2024

While working on some Kubernetes issues, we were stuck trying to figure out who the leaders were.
As of today, the only option is to run the following command:

kubectl get leases.coordination.k8s.io -n kube-system | grep elastic-agent

To ease debugging, it would be great to bubble this information up somewhere in the Kibana UI, so that we know:

  • Who the leader(s) is(are).
  • When the lease was acquired.
  • ..
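
Both pieces of information are already present on the Lease object itself. As a sketch, the lease returned by the command above should look roughly like this (the lease name and all values below are placeholders; the spec fields are the standard coordination.k8s.io/v1 Lease fields):

# kubectl get lease elastic-agent-cluster-leader -n kube-system -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: elastic-agent-cluster-leader   # placeholder; use the name returned by the grep above
  namespace: kube-system
spec:
  holderIdentity: elastic-agent-abcde            # which agent instance currently leads
  acquireTime: "2024-08-20T09:00:00.000000Z"     # when the current holder acquired the lease
  renewTime: "2024-08-20T09:05:00.000000Z"       # last renewal heartbeat from the holder
  leaseDurationSeconds: 15
  leaseTransitions: 3                            # how many times leadership has changed hands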

This sparked a broader discussion about what information each provider should return and make available:

  • Status: Enabled/Disabled
  • ...

@nimarezainia @strawgate happy to get your thoughts on this.

cc @ycombinator @blakerouse as you recently worked on similar cases.

@jlind23 jlind23 added the Team:Elastic-Agent-Control-Plane label Aug 20, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@jlind23 jlind23 changed the title [Discuss] Kubernetes Leader Election provider [Discuss] Ease debugging of Kubernetes Leader Election process Aug 20, 2024
@nimarezainia
Contributor

@jlind23 in your debugging, what was the workflow? As in, once you found which agent is the nominated leader, what were the next steps you were trying to perform? I'm trying to figure out where this information should reside. Perhaps it can just be part of the local metadata and not something that the UI needs to show.

@jlind23
Contributor Author

jlind23 commented Aug 21, 2024

We wanted to understand why some of the data that is collected exclusively by the leader wasn't being sent, hence why we were looking for the leader.

@jlind23
Contributor Author

jlind23 commented Aug 21, 2024

@blakerouse @ycombinator we would need to report it through the Elastic Agent metadata; would you happen to know how complex that would be? Once reported, we can do whatever we want with it in the UI.

@strawgate
Contributor

With the Helm Chart, do we actually use leader election?

@jlind23
Contributor Author

jlind23 commented Aug 21, 2024

According to @pkoutsovasilis' demo I don't think we do.

@blakerouse
Contributor

@jlind23 The leader election is its own provider and has no connection to Fleet or to updating the overall core state of the Elastic Agent. It will be difficult to connect the two, so it's not a quick change.

It should be possible to see that the state_* units are only running on the Elastic Agent that currently holds the leader election lease.

@pkoutsovasilis
Contributor

Hi 👋 So the Helm chart for the built-in Kubernetes integration in standalone mode disables leader election, because it deploys multiple agents: a DaemonSet for node-scope metrics and container logs, a Deployment for cluster-scope metrics, and a StatefulSet that runs a kube-state-metrics container alongside the agent to monitor kube-state-metrics. With that topology there is no need for leader election. On the contrary, the same topology isn't possible for agents managed through Fleet, since the config is then controlled by Fleet, so in that scenario the Helm chart doesn't disable it.

Thinking out loud: since an agent instance knows whether it is the leader or not, and when it won/lost the election, can't this be propagated to Kibana?!
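
(For reference, leadership is already exposed inside the agent policy: the kubernetes_leaderelection provider sets a boolean variable that the standalone Kubernetes manifests use to gate leader-only datasets. A minimal sketch, with placeholder input and dataset names:)

# sketch of a leader-gated stream in a standalone agent policy
inputs:
  - id: kubernetes-metrics-state        # placeholder id
    type: kubernetes/metrics
    streams:
      - data_stream:
          dataset: kubernetes.state_pod
        condition: ${kubernetes_leaderelection.leader} == true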

@blakerouse
Contributor

blakerouse commented Aug 21, 2024

Hi 👋 So the Helm chart for the built-in Kubernetes integration in standalone mode disables leader election, because it deploys multiple agents: a DaemonSet for node-scope metrics and container logs, a Deployment for cluster-scope metrics, and a StatefulSet that runs a kube-state-metrics container alongside the agent to monitor kube-state-metrics. With that topology there is no need for leader election. On the contrary, the same topology isn't possible for agents managed through Fleet, since the config is then controlled by Fleet, so in that scenario the Helm chart doesn't disable it.

It is actually possible to have the Elastic Agent deployed as a Deployment with kube-state-metrics and enrolled into Fleet, if that Elastic Agent is enrolled into a custom policy that only enables state_* metrics. Another option would be to set an ENV variable on the container and then add a condition on the integration for that ENV, so that only the container with that ENV variable set would run the state_* metrics.
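
(A minimal sketch of that second option, assuming a hypothetical STATE_METRICS variable surfaced through the agent's env provider; only the container started with STATE_METRICS=true would run these streams:)

# hypothetical: gate the state_* streams on an ENV variable via the env provider
streams:
  - data_stream:
      dataset: kubernetes.state_node
    condition: ${env.STATE_METRICS} == "true"   # STATE_METRICS is an assumed, user-defined variable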

Just want to make it clear that it is possible, but with the way the integration and the manifests are currently designed, it doesn't operate that way.

This is not a limitation of the Elastic Agent; it's just a limitation of how the manifests and integrations have been designed.

Thinking out loud: since an agent instance knows whether it is the leader or not, and when it won/lost the election, can't this be propagated to Kibana?!

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.

I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.
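
(Purely illustrative of the idea above: if providers published a status alongside components, the agent's status output could grow a section like the following. The shape and field names here are hypothetical, not an existing API:)

# hypothetical per-provider status block, mirroring how components report state
providers:
  - name: kubernetes_leaderelection
    status: HEALTHY
    message: "holding lease; acquired 2024-08-20T09:00:00Z"     # placeholder detail
  - name: kubernetes
    status: DEGRADED
    message: "unable to connect to the Kubernetes API server"   # e.g. running outside a cluster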

@pkoutsovasilis
Contributor

It is actually possible to have the Elastic Agent deployed as a Deployment with kube-state-metrics and enrolled into Fleet, if that Elastic Agent is enrolled into a custom policy that only enables state_* metrics. Another option would be to set an ENV variable on the container and then add a condition on the integration for that ENV, so that only the container with that ENV variable set would run the state_* metrics.

Yep, I have done such an enrollment, so it is possible; however, somebody can enable other metrics in the integration, which might result in undesired effects, and there is no way to limit that, at least as far as I can tell.

Just want to make it clear that it is possible, but the current way the integration and the manifests are designed it doesn't operate that way.

Yep, 100% agree. The reason we made that decision with the Helm chart (not disabling leader election for managed mode) wasn't a limitation of the Agent but rather how an integration, at least as of now, gets applied holistically.

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.

I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.

Yep, being able to configure providers in Fleet and expose them, like components, with a status does sound like a good addition to explore.

@jlind23
Contributor Author

jlind23 commented Aug 22, 2024

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.
I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.

I am leaning towards updating this issue to focus on each provider and making sure they return the right set of information.

@jlind23 jlind23 changed the title [Discuss] Ease debugging of Kubernetes Leader Election process [Discuss] Ease debugging of Elastic Agent providers Aug 22, 2024
@blakerouse
Contributor

It is absolutely possible, but not something that is directly wired into the Elastic Agent currently. If we wanted to add this information to Kibana, it might be better to add extra information from other providers as well. It's possible that each provider could publish a status (just like components). That would also allow, say, the kubernetes provider in a non-Kubernetes environment to report that it's not running because it's unable to connect.
I think that also brings up the ability to configure providers in Fleet. It's possible this just highlights that we should make providers a top-level concept in Fleet.

I am leaning towards updating this issue to focus on each provider and making sure they return the right set of information.

This would actually be more in line with OTel as well, as each extension can also report a status. This alignment will help the transition over time.
