fix(collector): actually enable the deployment of PrometheusRule when using PodMonitor #1416

Open · wants to merge 4 commits into base: main

Conversation

@crutonjohn commented Nov 16, 2024

While fixing this logic, regressions were introduced that would prevent the PrometheusRule from rendering in some fairly unusual monitoring scenarios.

I have checked this manually with as many permutations as I could.

If a user wants to render the PrometheusRule, they should be able to so long as they have enabled ServiceMonitor or PodMonitor, and in the case of the ServiceMonitor they have set Values.service.enabled == true.

closes #1379

Commits:

…PodMonitor
PodMonitor was not being deployed because this logic was not in place.
open-telemetry#1379

…loyed
this fixes a regression introduced by myself in my last commit
open-telemetry#1379

this fixes a regression i introduced related to rendering the PrometheusRule conditions. it should now render in a wide array of scenarios.
open-telemetry#1379
@crutonjohn (Author)

i didn't realize at first, but i think i introduced a regression initially in #1415. i think testing should be expanded to validate that the rendered output matches the input values for these specific scenarios. that's made somewhat harder by the helper here, which can obscure user input a little, but i understand why it exists.

@@ -1,4 +1,4 @@
-{{- if and .Values.prometheusRule.enabled .Values.serviceMonitor.enabled }}
+{{- if and .Values.prometheusRule.enabled (or (eq (include "opentelemetry-collector.serviceEnabled" .) "true") (or .Values.serviceMonitor.enabled .Values.podMonitor.enabled)) }}
Member

If a user wants to render the PrometheusRule, they should be able to so long as they have enabled ServiceMonitor or PodMonitor, and in the case of the ServiceMonitor they have set Values.service.enabled == true

@crutonjohn this sentence reads as if the condition should be

if (or .Values.podMonitor.enabled (and .Values.serviceMonitor.enabled (eq (include "opentelemetry-collector.serviceEnabled" .) "true")))

Author

in hindsight i probably shouldn't have tried to gotmpl that late in the day 😅

i think i mostly have it figured out:

Suggested change
-{{- if and .Values.prometheusRule.enabled (or (eq (include "opentelemetry-collector.serviceEnabled" .) "true") (or .Values.serviceMonitor.enabled .Values.podMonitor.enabled)) }}
+{{- if and .Values.prometheusRule.enabled (or .Values.podMonitor.enabled (and (eq (include "opentelemetry-collector.serviceEnabled" .) "true") .Values.serviceMonitor.enabled)) }}
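Purely as a readability sketch (not part of the suggested change), the proposed one-liner can be broken into intermediate variables; this assumes the chart's existing opentelemetry-collector.serviceEnabled helper and expresses the same boolean logic:

{{- /* sketch only: same condition as the one-line suggestion above */}}
{{- $serviceMonitorReady := and .Values.serviceMonitor.enabled (eq (include "opentelemetry-collector.serviceEnabled" .) "true") }}
{{- $anyMonitor := or .Values.podMonitor.enabled $serviceMonitorReady }}
{{- if and .Values.prometheusRule.enabled $anyMonitor }}
# ... PrometheusRule manifest ...
{{- end }}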

however, with this logic a user could use these values:

mode: "deployment"
image:
  repository: "otel/opentelemetry-collector-k8s"
prometheusRule:
  enabled: true
ports:
  metrics:
    enabled: true
podMonitor:
  enabled: false
serviceMonitor:
  enabled: false
service:
  enabled: false

and believe they are going to get a PrometheusRule, but instead get no PrometheusRule/ServiceMonitor/PodMonitor because neither service nor pod monitors are enabled.

would it be considered acceptable UX to prevent the user from getting a PrometheusRule CR when they explicitly say they want one? or should we remain somewhat opinionated in that you should only be deploying a PrometheusRule CR when you have a Pod/ServiceMonitor getting deployed in tandem with said PrometheusRule?

happy to accommodate either way, i just don't know what's best for the project.

Member

Hmm, that's a good point.

Is there any harm in creating a PrometheusRule even if the pod/service monitors don't exist? What will happen if something tries to use the PrometheusRule but the monitors don't exist?

Author

i'd have to do some verification, but iirc a PrometheusRule is just what it sounds like: it would effectively create the rule(s) contained within the file. there is some nuance with how prometheus operator is deployed that can affect how the PrometheusRule "lands" on the cluster... for example, if you are running multiple prometheus instances in the same cluster scraping different targets. the principle is still the same.

the rules in the collector chart are generic enough (read: have low granularity due to promql label selector omission) that there could be a scenario wherein a metric like otelcol_exporter_send_failed_log_records exists in the TSDB from a source that isn't our deployed collector (or the user has this chart deployed multiple times for different situations), in which case users could hypothetically be getting alerts for something unrelated to our collector deployment.

this would be kind of a "perfect storm" type scenario to be fair, but it's definitely possible. i've seen some pretty funky prometheus operator setups before. i think the crux of the issue is that there are tons of potential deployment scenarios, and we couldn't possibly account for every one of them with the logic within the chart. part of the responsibility is on the end user being familiar enough with the chart's behavior to know "hey, i don't need to enable the PrometheusRule if i'm not going to be using the Service/PodMonitor". we could add some docs somewhere, idk where the best place would be tho.
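To make the label-selector point concrete, here is a rough, hypothetical sketch (the name and the job selector are invented for illustration and are not what the chart renders) of a rule scoped to a single collector deployment; the chart's generic rules omit such a selector, so they would match this metric from any source in the TSDB:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-collector-rules   # hypothetical, release-scoped name
spec:
  groups:
    - name: my-collector.rules
      rules:
        - alert: CollectorSendFailedLogRecords
          # the job selector below is the part the chart's generic rules omit;
          # without it, this expression matches the metric from every collector in the TSDB
          expr: rate(otelcol_exporter_send_failed_log_records{job="my-collector"}[5m]) > 0
          for: 5m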

sorry, not trying to be a headache.

@@ -1,6 +1,6 @@
 apiVersion: v2
 name: opentelemetry-collector
-version: 0.109.0
+version: 0.109.1
Member

run make generate-examples CHARTS=opentelemetry-collector to generate the examples.

Successfully merging this pull request may close these issues.

[collector] PrometheusRule is not created when using PodMonitor