[Feature]: Deep dive alerts #795

Open
2 tasks done
matthisholleville opened this issue Dec 19, 2023 · 1 comment · May be fixed by k8sgpt-ai/schemas#21
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@matthisholleville
Contributor

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've discussed this feature request in the K8sGPT Slack and got positive feedback

Is this feature request related to a problem?

No

Problem Description

When monitoring my Kubernetes clusters, I often receive alerts, for example because a pod is in a crash loop. Investigating these alerts involves repetitive tasks such as reading logs and describing pods. K8sGPT lets me anticipate some alerts by continuously analyzing my objects, but could we do even better?

Solution Description

Who hasn't dreamed of having a tool that can perform an initial investigation into one or more alerts during an on-call rotation? The solution I propose is to leverage OpenAI's assistant system (or a similar pattern) to move from a proactive mode to a reactive mode that is specific to an alert. The architecture would be as follows:

  1. Alertmanager sends one or more alerts to the K8sGPT operator over HTTP.
  2. K8sGPT uses the OpenAI assistant system to determine which functions (or executors) it needs to run to analyze the error.
  3. K8sGPT analyzes the results of those functions (or executors) and gives the user an initial investigation of the error.
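
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not K8sGPT code: the `alerts`, `status`, and `labels` fields follow the Alertmanager webhook payload, while the `namespace` and `pod` labels, the executor names, and the `planner` callable (a stand-in for the assistant deciding which functions to run) are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    name: str
    namespace: str
    pod: str


def parse_alerts(body: str) -> List[Alert]:
    """Step 1: extract firing alerts from an Alertmanager webhook JSON body."""
    payload = json.loads(body)
    alerts = []
    for item in payload.get("alerts", []):
        if item.get("status") != "firing":
            continue  # resolved alerts need no investigation
        labels = item.get("labels", {})
        alerts.append(Alert(
            name=labels.get("alertname", ""),
            # Assumption: the alerting rules attach these two labels.
            namespace=labels.get("namespace", ""),
            pod=labels.get("pod", ""),
        ))
    return alerts


# Hypothetical executors. Real ones would query the Kubernetes API;
# these only return a description of what they would collect.
def describe_pod(alert: Alert) -> str:
    return f"description of {alert.namespace}/{alert.pod}"


def fetch_logs(alert: Alert) -> str:
    return f"logs for {alert.namespace}/{alert.pod}"


EXECUTORS: Dict[str, Callable[[Alert], str]] = {
    "describe_pod": describe_pod,
    "fetch_logs": fetch_logs,
}


def investigate(alert: Alert, planner: Callable[[Alert], List[str]]) -> Dict[str, str]:
    """Steps 2 and 3: ask the planner which executors to run, then run them.

    `planner` stands in for the AI backend. Names it returns that are not
    registered in EXECUTORS are skipped rather than trusted.
    """
    results: Dict[str, str] = {}
    for name in planner(alert):
        executor = EXECUTORS.get(name)
        if executor is None:
            continue
        results[name] = executor(alert)
    return results
```

The collected results would then be passed back to the backend to produce the written investigation. Keeping the executors in a fixed registry means the model can only choose among known, read-only actions.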

Benefits

The benefits would be multiple:

  1. Improved on-call process: alerts received at night (when we are not at our best) are investigated automatically.
  2. Fewer repetitive tasks and less risk of error (such as accidentally running a command in the middle of the night).
  3. Automatic initiation and drafting of a post-mortem based on the analysis.

Potential Drawbacks

The main drawback could be compatibility with all the AI backends supported by K8sGPT. Since OpenAI's assistant concept has only just been introduced, it remains to be seen whether it will become a standard. If it does not, the system may need to be made more complex by adding a pattern router. Another option would be to offer this functionality exclusively to OpenAI users for the time being.
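
One way to picture the two options is shown below. This is a hedged Python sketch, not an existing K8sGPT interface: `Backend`, `STATIC_ROUTES`, `choose_executors`, and the `allow_fallback` flag are hypothetical names, and "pattern router" is interpreted here as a fixed mapping from alert name to executors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Backend:
    name: str
    # Set only when the backend can choose functions itself
    # (assistant-style). None means it has no such capability.
    plan: Optional[Callable[[str], List[str]]] = None


# Hypothetical pattern router: a fixed mapping from alert name to executors.
STATIC_ROUTES: Dict[str, List[str]] = {
    "KubePodCrashLooping": ["describe_pod", "fetch_logs"],
}
DEFAULT_ROUTE: List[str] = ["describe_pod"]


def choose_executors(backend: Backend, alert_name: str,
                     allow_fallback: bool = True) -> List[str]:
    """Pick executors for an alert, preferring the backend's own planning."""
    if backend.plan is not None:
        return backend.plan(alert_name)
    if not allow_fallback:
        # Option 2: restrict the feature to backends that can plan.
        raise NotImplementedError(
            f"backend {backend.name!r} does not support alert investigation")
    # Option 1: fall back to the static pattern router.
    return list(STATIC_ROUTES.get(alert_name, DEFAULT_ROUTE))
```

With `allow_fallback=False` the feature is effectively limited to backends that can plan on their own; with the default, every backend gets a coarser, rule-based investigation.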

Additional Information

I have already tested this solution, and it has provided significant value for simple alerts. If the idea seems promising, the next step would be to test it on more complex alerts.
[Image: demo-alert]

@github-project-automation github-project-automation bot moved this to Todo in Backlog Dec 19, 2023
@AlexsJones AlexsJones added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Dec 29, 2023
@AlexsJones
Member

This is a really cool idea. I suppose my worry is putting too much dependency on OpenAI.
Could we feature gate this and have a generic implementation for when other backends support it? Perhaps an optional set of backend commands?

Projects
Status: Proposed
2 participants