COS / More alerts #517

gustavosr98 · 2024-11-27T17:45:23Z

Steps to reproduce

Follow tutorial for COS integration

Expected behavior

A few alerts I would like to see

Machine is down
Machine is up but service is down
Cluster is not writable
Cluster will not be writable if I lose one more node
Query latency goes over X ms
Number of connections is close to max connections limit

And any similar alerts related to degradation, to catch problems before the system stops working as expected

Actual behavior

No errors in the alerts, but only two alerts are defined

Versions

Juju 3.5.4

data-integrator                    active      1  data-integrator           latest/stable   41  no
grafana-agent                      active      3  grafana-agent             latest/edge    299  no       tracing: off
mongodb                            active      3  mongodb                   6/stable       199  no
self-signed-certificates           active      1  self-signed-certificates  latest/stable  155  no

syncronize-issues-to-jira · 2024-11-27T17:45:31Z

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6073.

This message was autogenerated

MiaAltieri · 2024-12-09T13:20:32Z

Hi @gustavosr98, I have just opened a PR (#521) for these desired alerts. Thanks for taking the time to outline your wishlist! <3

I've gone ahead and added most of them. The ones I did not add in that PR are:

Machine is down
Machine is up but service is down
and
Query latency goes over X ms

The first two I did not add since they are already there (screenshots 1+2). The third I do not believe is possible, since I don't think Grafana supports alerts based on user-provided input (i.e. X). If you have a specific latency in mind, let me know and I can implement that for you ASAP.

Please note that #1 will be further improved on the o11y end, since that team is currently working on it.

Addressing #517 by adding the following requested alerts:
- Cluster is not writable
- Cluster will not be writable if I lose one more node
- Number of connections is close to max connections limit

along with a few others from the Percona alert rules

## testing
- Cluster is not writable
<img width="1137" alt="Screenshot 2024-12-09 at 14 22 58" src="https://github.com/user-attachments/assets/9deb7250-7701-4a9f-bdc0-ee74b5069641">
- Cluster will not be writable if I lose one more node - note this is firing because it was deployed with a single replica, when the replica set is scaled up it goes back to green
<img width="1148" alt="Screenshot 2024-12-09 at 14 22 04" src="https://github.com/user-attachments/assets/50516710-97b5-4c08-a37d-37e43796bfb9">
- Number of connections is close to max connections limit (80%)
<img width="1117" alt="Screenshot 2024-12-09 at 14 23 32" src="https://github.com/user-attachments/assets/14da278e-e9e7-42b6-ba69-11927f6c9b0e">
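For anyone wondering why the "one more node" alert fires on a single replica and clears after scaling up, here is a minimal Python sketch of the reasoning. It is not the rule shipped in the PR (the actual alert expressions live there). It only assumes that a replica set stays writable while a strict majority of voting members is healthy, and that "close to max connections" means at or above 80% of the limit. All function and parameter names are hypothetical.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    return voting_members // 2 + 1


def is_writable(voting_members: int, healthy_members: int) -> bool:
    """Assumption: a primary exists only while a majority of voters is healthy."""
    return healthy_members >= majority(voting_members)


def one_failure_from_unwritable(voting_members: int, healthy_members: int) -> bool:
    """True when the set is writable now but would not be after losing one more member."""
    return is_writable(voting_members, healthy_members) and not is_writable(
        voting_members, healthy_members - 1
    )


def connections_near_limit(current: int, max_connections: int, ratio: float = 0.8) -> bool:
    """Assumption: 'close to the limit' means current >= ratio * max_connections."""
    return current >= ratio * max_connections


if __name__ == "__main__":
    # Single replica: writable, but losing the only member leaves no majority.
    print(one_failure_from_unwritable(1, 1))
    # Three healthy replicas: losing one still leaves two out of three.
    print(one_failure_from_unwritable(3, 3))
    # Hypothetical numbers: 800 of 1000 connections in use.
    print(connections_near_limit(800, 1000))
```

Under these assumptions a single-replica deployment is always one failure away from losing writes, which matches the firing alert in the second screenshot, while a healthy three-member set is not.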

gustavosr98 added the bug (Something isn't working) label Nov 27, 2024

MiaAltieri mentioned this issue Dec 9, 2024

[DPE-6073] add requested alerts #521

Merged
