feat: alert for MariaDB backup failures (rackerlabs#605)
* feat: alert for MariaDB backup failures

JIRA:OSPC-550

* doc: Add doc for MariaDB backup alert

JIRA:OSPC-550
awfabian-rs authored Dec 4, 2024
1 parent 926241b commit c32aa38
Showing 2 changed files with 29 additions and 0 deletions.
18 changes: 18 additions & 0 deletions base-helm-configs/prometheus/alerting_rules.yaml
@@ -123,3 +123,21 @@ additionalPrometheusRulesMap:
        annotations:
          summary: OVN backup volume >= 90% disk usage
          description: "OVN backup volume >= 90% disk usage"
    - name: MariaDB backup alerts
      rules:
      - alert: mariadbBackupWarning
        expr: time() - kube_cronjob_status_last_successful_time{cronjob="mariadb-backup"} > 21600
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: Last MariaDB backup not successful within 1 hour of scheduled run
          description: "Last MariaDB backup not successful within 1 hour of scheduled run"
      - alert: mariadbBackupCritical
        expr: time() - kube_cronjob_status_last_successful_time{cronjob="mariadb-backup"} > 43200
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: Second successive MariaDB backup not successful within 1 hour of scheduled run
          description: "Second successive MariaDB backup not successful within 1 hour of scheduled run"
11 changes: 11 additions & 0 deletions docs/alerting-info.md
@@ -80,3 +80,14 @@ The following list contains a few examples of these receivers as part of the [al
* [Microsoft Teams Receiver](alertmanager-msteams.md)

We can now take all this information and build out an alerting workflow that suits our needs!

## Genestack alerts

This section describes individual Genestack alerts.

### MariaDB backup alert

The MariaDB backup runs every 6 hours by default. The alert allows 1 hour
past each scheduled run for the backup to complete and upload, and fires when
MariaDB has not completed a successful backup within that window.

It fires at warning level after the first missed backup and at critical level after a second consecutive missed backup.
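
The timing can be worked through with a small sketch. The thresholds (21600 and 43200 seconds) and the `for: 1h` hold come from the rules added in this commit; the function names `fires_after` and `alert_level` are hypothetical, and the model ignores the Prometheus evaluation interval, so treat the results as approximate rather than as exact firing times.

```python
# Values taken from the alerting rules in this commit.
WARNING_THRESHOLD = 21600   # seconds; expr threshold of mariadbBackupWarning (6h)
CRITICAL_THRESHOLD = 43200  # seconds; expr threshold of mariadbBackupCritical (12h)
FOR_SECONDS = 3600          # `for: 1h` on both rules


def fires_after(threshold_seconds: int, for_seconds: int = FOR_SECONDS) -> int:
    """Approximate age of the last successful backup at which a rule fires.

    The expression becomes true once the age exceeds the threshold; the alert
    then stays pending until the expression has held for the `for` duration.
    """
    return threshold_seconds + for_seconds


def alert_level(age_seconds: int) -> str:
    """Highest severity expected for a given age of the last successful backup.

    Hypothetical helper: in practice the warning alert keeps firing alongside
    the critical one; only the highest severity is reported here.
    """
    if age_seconds > fires_after(CRITICAL_THRESHOLD):
        return "critical"
    if age_seconds > fires_after(WARNING_THRESHOLD):
        return "warning"
    return "ok"


if __name__ == "__main__":
    for hours in (6, 8, 14):
        print(f"{hours}h since last success: {alert_level(hours * 3600)}")
```

Under these assumptions the warning fires roughly 7 hours after the last successful backup and the critical alert roughly 13 hours after it. If the backup schedule is changed from the 6-hour default, the thresholds in the rules would presumably need to be adjusted to match.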
