Collaborator

@janboll janboll commented Apr 2, 2025

ARO-15912

What

Add a call to promtool.

Add support for service-owned alerts, where the alerts folder and files are stored in the service directory:

  • cluster-service/alerts
  • backend/alerts
  • ....

By putting alerts into service folders, we can easily grant code ownership of these files.

These folders will be referenced by a configuration file in observability/alerts:

rulesFolders:
 - cluster-service/alerts
 - backend/alerts
untestedRules:
additionalRules:
 - kubernetesControlPlane-prometheusRule.yaml
outputFile:
 - dev-infrastructure/obsevability/generated-prometheus-rules.bicep

All files in rulesFolders must have tests.
Files referenced in additionalRules/untestedRules are exemptions and don't require a test.
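
The coverage rule above could be enforced with a check along these lines. This is only a minimal sketch, not the code from this PR: the `_test.yaml` naming convention, the `missingTests` helper, and the `backend-prometheusRule.yaml` file name are hypothetical and used for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const (
	ruleSuffix = "-prometheusRule.yaml"      // suffix seen in the rule file names in this PR
	testSuffix = "-prometheusRule_test.yaml" // hypothetical naming convention for test files
)

// missingTests returns every rule file in files that has no sibling test
// file and is not listed in untested.
func missingTests(files, untested []string) []string {
	present := make(map[string]bool, len(files))
	for _, f := range files {
		present[f] = true
	}
	exempt := make(map[string]bool, len(untested))
	for _, f := range untested {
		exempt[f] = true
	}

	var missing []string
	for _, f := range files {
		// Skip test files, unrelated files, and explicitly exempted rules.
		if !strings.HasSuffix(f, ruleSuffix) || exempt[f] {
			continue
		}
		testFile := strings.TrimSuffix(f, ruleSuffix) + testSuffix
		if !present[testFile] {
			missing = append(missing, f)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical directory listing, flattened into one slice.
	files := []string{
		"cluster-service/alerts/testing-prometheusRule.yaml",
		"cluster-service/alerts/testing-prometheusRule_test.yaml",
		"backend/alerts/backend-prometheusRule.yaml",
		"kubernetesControlPlane-prometheusRule.yaml",
	}
	untested := []string{"kubernetesControlPlane-prometheusRule.yaml"}

	for _, f := range missingTests(files, untested) {
		fmt.Println("missing test for", f)
	}
}
```

Keeping the check as a pure function over file names makes it easy to unit test without touching the filesystem; a real tool would build `files` by walking each entry in rulesFolders.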

Why

We want strict testing for Prometheus alerts.
We need to support more than one alert file.

if err := yaml.Unmarshal(rawRules, &o.rules); err != nil {
	return fmt.Errorf("failed to parse input rules: %v", err)
}
o.output, err = os.Create(o.outputBicep)
Contributor

Why did this move out of complete? The point of this pattern is to have all necessary data read in and created here so testing past this step is easy.

Collaborator Author

Oh, my point was, why create the output if the test could fail?

Contributor

Fair, but losing out on having options as useful input to execution is a shame.

@janboll janboll marked this pull request as draft April 3, 2025 06:37
@janboll janboll changed the title from "Add promtool rule unit testing step" to "Promtool-rules change configuration and add rule unit testing" Apr 3, 2025
@janboll janboll force-pushed the add-promtool-testing branch 5 times, most recently from 2acc83d to cacd555 Compare April 10, 2025 10:43
@janboll janboll marked this pull request as ready for review April 10, 2025 10:43
@janboll janboll force-pushed the add-promtool-testing branch 7 times, most recently from 72911bb to a7ef1d7 Compare April 10, 2025 11:59
tony-schndr previously approved these changes Apr 10, 2025
Collaborator

@tony-schndr tony-schndr left a comment

Overall this is great! Some improvements/nits for a follow-up if you wish:

When testing, I don't see any output showing which tests were run. It would be nice to see output that looks something like:
🟢 cluster-service/alert/InstancesDownV1 passed
❌ cluster-service/alert/InstancesDownV1 failed

I saw output only when it failed.

I also tried to generate the alerts in bicep: I updated the name InstancesDownV1 to InstancesDownV2, but I didn't see the update in the generated bicep module. Not sure if I was doing something wrong.

Maybe you're already working on it, but could we also get some documentation on how this is used?

@github-actions

Please rebase pull request.

Collaborator Author

janboll commented Apr 11, 2025

Great idea with the output. Let me take another look at this.

I planned on documenting it once it's settled, but I can add some docs in this PR.

Collaborator Author

janboll commented Apr 11, 2025

@tony-schndr that is a bit complicated, since promtool does not provide this kind of output. I think it's not worth the effort, since we are simply calling promtool.

Given this test file, my plan would be:

rule_files:
- testing-prometheusRule.yaml
evaluation_interval: 1m
tests:
- interval: 1m
  input_series:
  - series: 'up{job="app", instance="app-1:2223"}'
    # 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    values: "0x14"
  - series: 'up{job="app", instance="app-2:2223"}'
    # 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
    values: "1x4 0x9 1x4"
  alert_rule_test:
  - eval_time: 4m
    alertname: InstancesDownV1
  - eval_time: 5m
    alertname: InstancesDownV1
    exp_alerts:
    - exp_labels:
        severity: critical
      exp_annotations:
        summary: "All instances of the App are down"
        description: "All instances of the App are down"
  - eval_time: 15m
    alertname: InstancesDownV1

For each alert_rule_test, create a test file and run promtool test on it.

Collaborator Author

janboll commented Apr 11, 2025

@tony-schndr regarding the missing update, did you run make alerts in the observability folder?

janboll added 5 commits April 11, 2025 13:31
Allows putting alerts into one folder.
Probably want to change this to support multiple folders.
Alternatively add a configuration struct to support multiple input
folder configuration.
Call make alerts and ensure no uncommitted files exist
@janboll janboll force-pushed the add-promtool-testing branch from f476d81 to 77511b9 Compare April 11, 2025 13:32
@janboll janboll requested a review from tony-schndr April 11, 2025 13:33
@janboll janboll merged commit 96dea03 into main Apr 14, 2025
28 checks passed
@janboll janboll deleted the add-promtool-testing branch April 14, 2025 14:55