-
Notifications
You must be signed in to change notification settings - Fork 858
Description
Checklist
- I've searched for similar issues and couldn't find anything matching
- I've discussed this feature request in the K8sGPT Slack and got positive feedback
Is this feature request related to a problem?
No
Problem Description
When monitoring my Kubernetes clusters, I often receive alerts due to a pod being in a crash loop or other issues for example. Investigating these alerts requires repetitive tasks such as log analysis, pod description, etc. K8sgpt allows me to anticipate some alerts by continuously analyzing my objects, but could we do even better?
Solution Description
Who hasn't dreamed of having a tool that can perform an initial investigation into one or more alerts during oncall rotation ? The solution I propose is to leverage OpenAI's assistant system (or a similar pattern) to transition from a proactive mode to a reactive mode specific to an alert. The architecture would be as follows:
- AlertManager sends one or more alerts to the K8SGPT operator via http.
- K8SGPT uses the OpenAI assistant system to determine which functions (or executors) it needs to execute to doing error analysis.
- With the results of the functions (or executors), we conduct an analysis and provide the user with an initial investigation of the error.
Benefits
The benefits would be multiple:
- Improved on-call process: Alerts received at night (when we are not operational at 100%) are automatically investigated.
- Reduction of repetitive tasks and the possibility of errors (accidentally running a command in the middle of the night).
- Automatic initiation and drafting of a post-mortem based on the analysis.
Potential Drawbacks
The drawbacks could be the compatibility of the solution with all AI systems supported by K8SGPT. Given that the concept of OpenAI's assistant has just been introduced, it needs to be verified whether this concept will become a standard in the future. Otherwise, the system may need to be "complexified" to use a pattern router. Another solution could be to offer this functionality exclusively to OpenAI users for the time being.
Additional Information
I have already tested this solution, and it has provided significant value for simple alerts. If the idea seems promising, the next step would be to test it on more complex alerts.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status