Add Sampling SIG research notes #213
base: main
Conversation
In SIG meeting Peter noted that the resulting estimates would have such enormous error if ever a span _was_ sampled at a rate of 2^-63 that this "workaround" is merely a curiosity, not something that'd be practically useful.
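For intuition only (this sketch is not part of the notes or the SIG discussion): the adjusted count attributed to a single span sampled at 2^-63 is 2^63 ≈ 9.2×10^18, so any estimate that happens to include such a span is dominated by it.

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Sampling probability of 2^-63, as discussed above.
	p := math.Pow(2, -63)

	// The "adjusted count" (inverse probability) that a single sampled span
	// contributes to any count estimate.
	adjustedCount := 1 / p
	fmt.Printf("adjusted count per sampled span: %.3g\n", adjustedCount) // ~9.22e+18

	// If the true population were, say, one million spans, the estimate is
	// almost always 0 (nothing sampled), and on the rare occasion a span is
	// sampled it jumps to ~9.2 quintillion. Either way the error is enormous,
	// which is why this "workaround" is a curiosity rather than something
	// practically useful.
	truePopulation := 1e6
	fmt.Printf("overestimate factor if one such span is sampled: %.3gx\n",
		adjustedCount/truePopulation)
}
```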
1. "Statistics" can be anything from [RED metrics](https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/) by service, to data used to answer richer questions like "Which dimensions of trace data are correlated with higher error rate?". You want to ensure that all inferences made from the data you *do* collect are valid. | ||
2. Setting sampling error targets is akin to setting Service Level Objectives: just as one aspires to build *appropriately reliable* systems, so too one needs statistics which are *just accurate enough* to get valid insights from, but not so accurate that you excessively sacrifice goals #1 and #2. | ||
3. An example consequence of this goal being unmet: metrics derived from the trace data become spiky and unfit for purpose. | ||
4. Ensure traces are complete. |
I feel this is a bit too strong. While we want to see many complete traces, we must not require that all traces are complete. Consider infrequently used sub-services which might not get enough representation when all sampling decisions are made at the root. BTW, poor coverage for such services is a weak point across the whole surveyed landscape today.
- limiting: Support all of: spans per second, spans per month, and GB per month (approximation is ok)
- degree of limiting: Soft is ok
- horizontally scalable: Yes
- Prioritize tail sampling in Collector over head sampling in SDK
I don't think this fits well here. It is a technical design decision. While I agree with this sentence in practice, users may even have the opposite opinion, as tail-based sampling is generally more expensive than head-based sampling.
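As a rough illustration of what "soft" limiting can mean in practice (a hypothetical sketch, not drawn from any surveyed system): a token bucket enforces a spans-per-second target only approximately, so brief bursts may exceed the target while sustained throughput converges to it.

```go
package sampling

import (
	"sync"
	"time"
)

// softLimiter is a hypothetical token-bucket limiter illustrating "soft"
// spans-per-second limiting: short bursts may exceed the target rate, but
// sustained throughput converges to it.
type softLimiter struct {
	mu         sync.Mutex
	perSecond  float64 // target spans per second
	burst      float64 // extra headroom tolerated in a burst
	tokens     float64
	lastRefill time.Time
}

func newSoftLimiter(perSecond, burst float64) *softLimiter {
	return &softLimiter{perSecond: perSecond, burst: burst, tokens: burst, lastRefill: time.Now()}
}

// Allow reports whether one more span fits within the soft limit right now.
func (l *softLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	elapsed := now.Sub(l.lastRefill).Seconds()
	l.lastRefill = now

	// Refill tokens at the target rate, capped at the burst size.
	l.tokens += elapsed * l.perSecond
	if l.tokens > l.burst {
		l.tokens = l.burst
	}
	if l.tokens < 1 {
		return false // over the limit: drop (or defer) this span
	}
	l.tokens--
	return true
}
```

One way to approximate the "horizontally scalable" requirement with a limiter like this is to give each collector or SDK instance its own limiter configured with a share of the global budget, accepting some imprecision in exchange for avoiding coordination.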
1. Reduce or limit costs stemming from the construction and transmission of spans.
   2. Analytics queries are faster when searching less data.
2. Respect limits of downstream storage systems.
   1. Trace storage systems often have data ingest limits (e.g., GBs per second, spans per second, spans per calendar month). The costs of exceeding these limits can be either reduced reliability or increased hosting expenditures.
I think similarly "hard" limitations apply for the tracers, collector and the network. Collecting too much data up front can lead to excessive memory usage, CPU or network saturation and can cause not only performance issues, but application malfunction as well.
- Per-stratum limiting: Partition input traces into strata, and sample such that each stratum's throughput does not exceed a threshold.
- Global limiting: Sample such that total throughput doesn't exceed a threshold.

Note that in addition to limiting traces per unit time, there are also use cases to support limting spans per unit time, or bytes per unit time. In such cases the limiter implementation should take care not to impart bias by systematically preferring traces comprising fewer spans, or fewer bytes, over "larger" traces.
typo: "limting"
##### TraceIdRatioBased

`TraceIdRatioBased` may be used to consistently sample or drop a certain fixed percentage of spans. The decision is based on a random value, the trace ID, rather than any of the span metadata available to ShouldSample (span name, initial attributes, etc.) As a result,
Currently, since OpenTelemetry doesn't specify what hashing algorithm to use, my understanding is this statement is not fully accurate (since different language SDKs could have different approaches) - e.g. when it is used for sampling non-root spans. It may be good to clarify that current limitation here.
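For concreteness, the sketch below shows one common way such a decision can be made: comparing bits of the trace ID against a threshold derived from the ratio. As the comment above notes, the specification does not mandate a particular mapping, so this is illustrative only and different SDKs may disagree for the same trace ID.

```go
package sampling

import "encoding/binary"

// traceIDRatioDecision is an illustrative sketch only: it treats part of the
// trace ID as a pseudo-random value and keeps the span when that value falls
// below a threshold derived from the configured ratio. The OpenTelemetry
// specification does not mandate this exact mapping, so two SDKs configured
// with the same ratio may keep different sets of trace IDs.
func traceIDRatioDecision(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Interpret the low 8 bytes of the trace ID as an unsigned integer,
	// discard one bit, and compare against a threshold covering `ratio` of
	// the remaining 63-bit space. (Assumes these bytes are uniformly random,
	// which is recommended but not guaranteed for all trace ID generators.)
	v := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	threshold := uint64(ratio * float64(1<<63))
	return v < threshold
}
```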
1. "Statistics" can be anything from [RED metrics](https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/) by service, to data used to answer richer questions like "Which dimensions of trace data are correlated with higher error rate?". You want to ensure that all inferences made from the data you *do* collect are valid. | ||
2. Setting sampling error targets is akin to setting Service Level Objectives: just as one aspires to build *appropriately reliable* systems, so too one needs statistics which are *just accurate enough* to get valid insights from, but not so accurate that you excessively sacrifice goals #1 and #2. | ||
3. An example consequence of this goal being unmet: metrics derived from the trace data become spiky and unfit for purpose. | ||
4. Ensure traces are complete.
Would something like "Ensure traces are as complete and consistent as possible" better express the intent here?
4. Ensure traces are complete.
   1. "Complete" means that all of the spans belonging to the trace are collected. For more information, see ["Trace completeness"](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.12.0/specification/trace/tracestate-probability-sampling.md#trace-completeness) in the trace spec.
Have there been any discussions around how sampling impacts "linked traces" and whether the consistency/completeness goals should support linked traces as well? Since many async operations are modelled as linked traces, I am trying to understand if there's a way consistent sampling can be achieved across linked traces.
Notes:

- TODO(Spencer): Look at https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/telemetryquerylanguage/tql and give feedback. Split out from transformprocessor.
Suggested change:
- TODO(Spencer): Look at https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/telemetryquerylanguage/tql and give feedback. Split out from transformprocessor.
+ TODO(Spencer): Look at https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl and give feedback. Split out from transformprocessor.
FYI I would like to merge this for the record. Thank you @spencerwilson!!
Not sure if this is something this PR will address, but we were experimenting with a client/API that let folks set their sampling rate by span attributes. Using RQL would allow =, !=, regex, and like/not-like statements, giving more control over the data being traced. Yes, it does require that the information used to determine the sampling rate be available when a span is first created. Our use case was to let folks increase sampling rates for a given value (or set of values) without making any code changes; the "sampling rules" would be recompiled each time a change arrived from the API (i.e., when a user makes a change). Moving forward, RQL isn't required, but flexibility could be helpful IF folks want more than = / != for comparing span attributes to something they want to sample by.
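A minimal sketch of what such attribute-driven rules might look like (purely hypothetical; this is not the commenter's client/API nor anything proposed in this PR): each rule pairs a span-attribute predicate with a sampling rate, and the first matching rule wins.

```go
package sampling

import "regexp"

// rule is a hypothetical attribute-based sampling rule: if a span attribute
// matches, sample at the given rate. Operators beyond equality (regex, "not
// equals", like/not-like) are what the comment above suggests RQL-style
// flexibility would buy.
type rule struct {
	attributeKey string
	op           string // "eq", "neq", "regex"
	value        string
	sampleRate   float64 // probability in [0, 1]
}

// rateFor returns the sampling rate for a span's attributes, using the first
// matching rule; defaultRate applies when nothing matches. A real system
// would recompile/reload these rules whenever the control API reports a
// change, as described in the comment above.
func rateFor(attrs map[string]string, rules []rule, defaultRate float64) float64 {
	for _, r := range rules {
		got, ok := attrs[r.attributeKey]
		if !ok {
			continue
		}
		switch r.op {
		case "eq":
			if got == r.value {
				return r.sampleRate
			}
		case "neq":
			if got != r.value {
				return r.sampleRate
			}
		case "regex":
			if matched, err := regexp.MatchString(r.value, got); err == nil && matched {
				return r.sampleRate
			}
		}
	}
	return defaultRate
}
```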
@spencerwilson we are cleaning up stale OTEP PRs. If there is no further action at this time, we will close this PR in one week. Feel free to open it again when it is time to pick it back up. |
The OTEP that appears in this PR in partial form is a prerequisite to #191.
This PR needn't be merged. Its purpose is to provide a vehicle for review and iteration by the Sampling SIG. I previously distributed these documents as GitHub gists, but the SIG decided this would be superior to that. I expect that at some point this PR will have served its purpose and at that point may be closed.