rfc: Opinionated OpenTelemetry Operator Sampling CR #3297

frzifus · 2024-09-20T13:27:59Z

Description:

Link to tracking Issue(s):

Belongs to: Tracking issues apis/v1alpha1 Sampling CR #3279

Testing:

Documentation:
https://github.com/frzifus/opentelemetry-operator/blob/rfc/sampling/docs/rfcs/opinionated_sampling_cr.md

Signed-off-by: Benedikt Bongartz <[email protected]>

iblancasa · 2024-09-23T10:27:59Z

docs/rfcs/opinionated_sampling_cr.md

+  components:
+    loadbalancer:
+      # Defines if the component is managed by the operator
+      managementState: managed


Why does this field need to be set at the component level rather than once for the whole object?

frzifus · 2024-09-23

Actually, I've no strong opinion on that.

jaronoff97

A few thoughts. Overall, I'm unclear on how the CR solves the goals / use cases. It feels like this is very cluster-admin focused but doesn't give users many options for customization. What if we used some annotations on the pods to determine something about the sampling configuration? Also, how would a user opt into using the created sampler architecture?

I think I would prefer a design that's a bit more composable for cluster admins / automatic for users. I.e.

Cluster admins:

- Determine which services should be instrumented via a label selector (maybe this is in its own CR).
- Define a SamplerPolicy CR.
  - I like the components section, but I want this CR to be more flexible in terms of architecture based on sampler type.
    - It would also be nice if users could bring their own collectors into the mix.
  - I also want support for sampling at the SDK level in here, as IMO that's where the biggest efficiency gains are.

Users:

- Optionally add an otel-instrumentation-specific label to get instrumentation if the cluster admin requires one.
- Add annotations to override the instrumentation policy within their pod template. The cluster admin should be able to define the limits for these (something like a valid range for sample % or max cardinality, etc.).
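
To make the "admin-defined limits" idea concrete, here is a minimal Go sketch of how a user's annotation override could be resolved against a range set by the cluster admin. Everything in it is an assumption for illustration: the annotation key `sampling.example.com/sample-percent`, the `SamplingLimits` type, and the choice to clamp out-of-range values rather than reject them are not part of the RFC.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// samplePercentAnnotation is a hypothetical annotation key, not one defined by the RFC.
const samplePercentAnnotation = "sampling.example.com/sample-percent"

// SamplingLimits is a hypothetical admin-defined range for the sampling percentage.
type SamplingLimits struct {
	MinPercent float64
	MaxPercent float64
}

// clamp restricts v to the admin-defined range.
func clamp(v float64, limits SamplingLimits) float64 {
	if v < limits.MinPercent {
		return limits.MinPercent
	}
	if v > limits.MaxPercent {
		return limits.MaxPercent
	}
	return v
}

// ResolveSamplePercent returns the effective sampling percentage for a pod.
// If the pod carries no override annotation, the admin default is used.
// Overrides outside the allowed range are clamped (an assumed policy;
// rejecting the pod would be an equally valid design).
func ResolveSamplePercent(annotations map[string]string, defaultPercent float64, limits SamplingLimits) (float64, error) {
	raw, ok := annotations[samplePercentAnnotation]
	if !ok {
		return clamp(defaultPercent, limits), nil
	}
	requested, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid %s value %q: %w", samplePercentAnnotation, raw, err)
	}
	if math.IsNaN(requested) {
		return 0, fmt.Errorf("invalid %s value %q: not a number", samplePercentAnnotation, raw)
	}
	return clamp(requested, limits), nil
}

func main() {
	limits := SamplingLimits{MinPercent: 1, MaxPercent: 25}
	podAnnotations := map[string]string{samplePercentAnnotation: "80"}

	effective, err := ResolveSamplePercent(podAnnotations, 10, limits)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("effective sample percent: %.1f\n", effective)
}
```

Clamping keeps a misconfigured pod running with the nearest allowed value; a validating webhook that rejects the pod would surface the mistake earlier, at the cost of blocking deployment.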

LMK if you want to talk through this more :) I like where this is going though.

jaronoff97 · 2024-10-08T14:00:33Z

docs/rfcs/opinionated_sampling_cr.md

+## Goals and non-goals
+
+**Goals**
+- Provide an opinionated CR to simply sampling configuration in distributed environments


Suggested change

- Provide an opinionated CR to simply sampling configuration in distributed environments

- Provide an opinionated CR to simplify sampling configuration in distributed environments

jaronoff97 · 2024-10-08T14:00:58Z

docs/rfcs/opinionated_sampling_cr.md

+
+### CASE 3
+
+As a user I want to be able to filter relevant data without much specific open telemetry knowledge. 


Suggested change

As a user I want to be able to filter relevant data without much specific open telemetry knowledge.

As a user I want to be able to filter relevant data without much specific OpenTelemetry knowledge.

jaronoff97 · 2024-10-08T14:02:06Z

docs/rfcs/opinionated_sampling_cr.md

+
+![sampling arch](./images/arch_sampling_crd.png)
+
+This custom resource creates an environment that allows us to apply e.g. tailbased sampling in a distributed environment. The operator takes care of creating an optional otel LB service and sampling instances similar to the figure shown above.


Is there a reason we prefer tail sampling in the collector versus a global sampling policy that we apply to instrumentations? I guess not everything will be otel instrumented...

jaronoff97 · 2024-10-08T14:04:19Z

docs/rfcs/opinionated_sampling_cr.md

+
+LB instances will be pre-configured to distribute traces based on a given routing key like the traceID to the sampler instances.
+
+A policy is used to define which telemetry data should be sampled. Available policies can be found in the [tailbased sampling description](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor).


I also want us to be able to support the probabilistic sampler (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/probabilisticsamplerprocessor), which requires no sticky load-balancing configuration. How would this design work with that?
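
The difference between the two approaches can be sketched in Go as follows. This is an illustration under simplifying assumptions, not how either collector component is implemented: `pickSampler` uses a plain hash-modulo where a real load balancer would more likely use consistent hashing, and `keepProbabilistic` is a stand-in for a decision derived from the trace ID alone. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashTraceID returns a 32-bit FNV-1a hash of the trace ID.
func hashTraceID(traceID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(traceID)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

// pickSampler maps a trace ID to one sampler instance so that every span of
// a trace reaches the same instance, which tail-based sampling needs in
// order to see the whole trace. It returns false if no samplers are given.
func pickSampler(traceID string, samplers []string) (string, bool) {
	if len(samplers) == 0 {
		return "", false
	}
	return samplers[hashTraceID(traceID)%uint32(len(samplers))], true
}

// keepProbabilistic decides from the trace ID alone whether to keep a trace.
// Any instance computes the same answer for the same trace ID, so no sticky
// routing is required. percent is expected to be in the range 0..100.
func keepProbabilistic(traceID string, percent float64) bool {
	bucket := hashTraceID(traceID) % 10000 // bucket in [0, 10000)
	return float64(bucket) < percent*100
}

func main() {
	samplers := []string{"sampler-0", "sampler-1", "sampler-2"}
	traceID := "4bf92f3577b34da6a3ce929d0e0e4736"

	if target, ok := pickSampler(traceID, samplers); ok {
		fmt.Println("tail sampling: route trace to", target)
	}
	fmt.Println("probabilistic: keep trace?", keepProbabilistic(traceID, 10))
}
```

Because the probabilistic decision is a pure function of the trace ID, the load-balancer tier becomes optional for that sampler type, which is why the CR's architecture would need to vary with the sampler chosen.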

jaronoff97 · 2024-10-08T14:04:59Z

docs/rfcs/opinionated_sampling_cr.md

+1. Introduction of the CRD in v1alpha1.
+2. First controller implementation.
+3. Implementation of e2e tests.
+4. CRD becomes part of the operator bundle.


Let's adjust this to include some more iteration and design changes prior to being in the main bundle.

jaronoff97 · 2024-10-08T14:05:41Z

docs/rfcs/opinionated_sampling_cr.md

+
+**Goals**
+- Provide an opinionated CR to simply sampling configuration in distributed environments
+- Allow managing access using RBAC to different parts of the collector configuration


It's not clear to me how the design solves this goal.

frzifus requested a review from a team as a code owner September 20, 2024 13:28

frzifus added the Skip Changelog label (PRs that do not require a CHANGELOG.md entry) Sep 20, 2024

frzifus force-pushed the rfc/sampling branch from edaf054 to 2e2acce Compare September 20, 2024 14:10

rfc: Opinionated OpenTelemetry Operator Sampling CR

abd2c76

Signed-off-by: Benedikt Bongartz <[email protected]>

frzifus force-pushed the rfc/sampling branch from 2e2acce to abd2c76 Compare September 20, 2024 14:13

iblancasa reviewed Sep 23, 2024

frzifus added and then removed the discuss-at-sig label (This issue or PR should be discussed at the next SIG meeting) Sep 26, 2024

jaronoff97 reviewed Oct 8, 2024
