[otel/kube-stack]: Add gateway collector #6444
This pull request does not have a backport label. Could you fix it @rogercoll? 🙏
Please link an issue if such exists.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
LGTM! Let's get it in and iterate from there.
@elastic/elastic-agent-control-plane PR back to ready for review
* feat: move telemetry aggregation and forwarding to gateway
* ci: use Elastic envs in gateway
* chore: add changelog entry
* fix: format values file
* feat: add apm loadbalancing
* chore: increase resource limits
* revert resource limits increase
* chore: remove config warnings
* docs: add Gateway collectors section
* revert: enable daemonset storagechecks
* rename metrics/otel pipeline and use signaltometrics
* unify k8s and host metrics pipelines
* use default traceID as loadbalancing routing_key
* chore: reuse k8s integration test helpers
* format values with Helm linter
* replace loadbalancing in favor of headless otlp
* Update testing/integration/otel_helm_test.go Co-authored-by: Panos Koutsovasilis <[email protected]>
* Update testing/integration/otel_helm_test.go Co-authored-by: Panos Koutsovasilis <[email protected]>
* rename k8s values options helper function
* move process attributes remove processor to gateway
* add batch processor for aggregation pipeline
* enable compression for cluster otlp connections
* chore: remove elastic endpoint references
* fix: do not generate service's signals for non apm data
* Revert "fix: do not generate service's signals for non apm data" This reverts commit ffa6620.
* fix: set agent.name as edot-collector
* fix: enable daemon hostNetwork
* set unknown as default signaltometrics agent.name resource attribute
* remove signaltometrics for metrics-only services
--------- Co-authored-by: Panos Koutsovasilis <[email protected]> (cherry picked from commit daed81e)
What does this PR do?
This PR adds a new K8s deployment of the EDOT collector named "gateway". The main purpose of this new deployment is to simplify the daemonset collector configuration and to unify the managed and self-managed scenarios. The gateway collector configuration contains all of Elastic's custom OTel components needed for signal transformations in self-managed scenarios, which are currently configured in the daemonset collector.
Elastic configured components in the "Gateway" collectors (previously in the daemonset):
Another important change is that the Gateway collectors now forward the OTel data to Elasticsearch. The daemonset and cluster collector configurations have been updated to export all collected data to the corresponding Gateway OTLP endpoint. The daemonset collectors are still configured to collect the auto-instrumentation OTLP data, but that data is load balanced (loadbalancing exporter) across the gateway collectors based on the service name.
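A rough sketch of what the daemonset side of this could look like, assuming the upstream `loadbalancing` exporter with a DNS resolver. The gateway hostname below is hypothetical, and the receiver and pipeline layout are illustrative rather than the chart's actual values. Note that the commit history on this PR indicates the loadbalancing exporter was later replaced by a plain OTLP exporter pointing at a headless service, so the merged configuration may differ:

```yaml
# Illustrative daemonset collector fragment (not the chart's actual values).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  loadbalancing:
    # Route all spans of a given service to the same gateway replica.
    routing_key: service
    protocol:
      otlp:
        tls:
          insecure: true  # assumption: in-cluster traffic without TLS
    resolver:
      dns:
        # Hypothetical headless Service name for the gateway collectors.
        hostname: gateway-collector-headless.observability.svc.cluster.local
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

Routing by service keeps every span of one service on a single gateway replica, which matters when the gateway computes per-service aggregations.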
Additional context: https://github.com/elastic/opentelemetry-dev/issues/587
Why is it important?
The key benefit of this architecture is that it decouples data collection from data transformations (e.g. APM enrichment and aggregations). For managed scenarios, users would only need to remove (or comment out) the "gateway" collector configuration. Moving the data processing from a k8s "daemonset" to a "deployment" also makes it easier to scale horizontally.
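As an illustration of the scaling point, a gateway entry in the Helm values might look roughly like the fragment below. This is a sketch under assumptions: the `collectors.gateway` key and the replica numbers are hypothetical, and the `autoscaler` fields mirror those of the OpenTelemetry Operator's `OpenTelemetryCollector` resource, which the chart is assumed to pass through:

```yaml
# Illustrative values fragment (key names and numbers are assumptions).
collectors:
  gateway:
    mode: deployment      # a Deployment can be scaled independently of node count
    replicas: 2
    autoscaler:
      minReplicas: 2
      maxReplicas: 5
      targetCPUUtilization: 70
  # For managed scenarios, the description above suggests removing or
  # commenting out the whole "gateway" entry.
```

A daemonset always runs one collector per node, so its processing capacity follows the node count; a deployment can instead follow the telemetry volume.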
Checklist
./changelog/fragments using the changelog tool
Disruptive User Impact
How to test this PR locally
Related issues
Questions to ask yourself