
[exporter/loadbalancing] Unable to retry against other downstream backends #42398

@nbata-paddle

Description


Component(s)

exporter/loadbalancing

What happened?

Description

When using the loadbalancing exporter, the downstream OTLP exporters can use retry/timeout/queue mechanisms to hold on to data before eventually dropping it. As per the docs, the loadbalancing exporter itself also supports these settings, but they are disabled by default.

When I enable them, I am unable to get the failing downstream backend's data retried against the other, working downstream backend.

Steps to Reproduce

Look at the config provided, and send traces to the collector.
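
For example, any OTLP/HTTP client pointed at the receiver's 0.0.0.0:4318 endpoint should do; telemetrygen is one option, and the exact flags below are illustrative rather than the only way:

# Install telemetrygen from opentelemetry-collector-contrib
go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen@latest

# Send 100 traces over OTLP/HTTP to the receiver configured below
telemetrygen traces --otlp-http --otlp-endpoint localhost:4318 --otlp-insecure --traces 100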

Expected Result

I'd eventually see all 100 of my traces in my local Jaeger backend.
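
As a rough check, assuming Jaeger's default query port 16686 and telemetrygen's default service name (both of which are assumptions rather than part of this report), the stored trace count can be read back through Jaeger's internal HTTP API:

# Count the traces Jaeger has stored for the test service
curl -s "http://localhost:16686/api/traces?service=telemetrygen&limit=200" | jq '.data | length'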

Actual Result

I see the logs from the collector shown in the Log output section below.

And this in Jaeger (I am sending 100 traces):

[Screenshot of the Jaeger UI]

Collector version

0.132.2

Environment information


OS: macOS Sequoia 15.6.1

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"

  # Prometheus receiver scraping the collector's own internal metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']

processors:
  # Batch processor to efficiently send spans
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048

exporters:
  # Load-balancing exporter fanning out traces to the downstream OTLP backends
  loadbalancing/otel_collector_sampling_local:
    timeout: 10s
    protocol:
      otlp:
        timeout: 3s
        retry_on_failure:
          max_elapsed_time: 10s
          max_interval: 5s
          initial_interval: 5s
        tls:
          insecure: true
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      # wait_for_result: false
      # block_on_overflow: true
      sizer: requests
      queue_size: 1000
    resolver:
      static:
        hostnames:
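          # localhost:1234 is the deliberately unreachable backend (nothing listening);
          # jaeger:4317 is the healthy backend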
          - localhost:1234
          - jaeger:4317

  # Logging exporter for debugging
  debug:

  # Prometheus metrics exporter
  prometheus:
    endpoint: "0.0.0.0:8889"

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [health_check, pprof, zpages]

  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [loadbalancing/otel_collector_sampling_local, debug]

Log output

2025-09-01T15:53:02.742Z	info	otlpreceiver@v0.132.0/otlp.go:175	Starting HTTP server	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "otelcol.component.id": "otlp", "otelcol.component.kind": "receiver", "endpoint": "0.0.0.0:4318"}

2025-09-01T15:53:02.742Z	info	healthcheck/handler.go:131	Health Check state change	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "otelcol.component.id": "health_check", "otelcol.component.kind": "extension", "status": "ready"}

2025-09-01T15:53:02.742Z	info	service@v0.132.0/service.go:272	Everything is ready. Begin running and processing data.	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}}

2025-09-01T15:53:02.742Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:53:03.744Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:53:05.284Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:53:07.408Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:53:10.763Z	info	Traces	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "resource spans": 3, "spans": 400}

2025-09-01T15:53:10.767Z	info	internal/retry_sender.go:133	Exporting failed. Will retry the request after interval.	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "otelcol.component.id": "loadbalancing/otel_collector_sampling_local", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "endpoint": "localhost:1234", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused\"", "interval": "6.074990172s"}

2025-09-01T15:53:12.323Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:53:16.845Z	error	internal/queue_sender.go:51	Exporting failed. Dropping data.	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "otelcol.component.id": "loadbalancing/otel_collector_sampling_local", "otelcol.component.kind": "exporter", "otelcol.signal": "traces", "endpoint": "localhost:1234", "error": "no more retries left: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused\"", "dropped_items": 228}

go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1

	go.opentelemetry.io/collector/exporter@v0.132.0/exporterhelper/internal/queue_sender.go:51

go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume

	go.opentelemetry.io/collector/exporter@v0.132.0/exporterhelper/internal/queuebatch/disabled_batcher.go:23

go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue.(*asyncQueue[...]).Start.func1

	go.opentelemetry.io/collector/exporter@v0.132.0/exporterhelper/internal/queue/async_queue.go:47

2025-09-01T15:53:19.487Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:53:28.864Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:53:46.197Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

2025-09-01T15:54:17.297Z	warn	grpc@…/clientconn.go:1414	[core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "localhost:1234", ServerName: "localhost:1234", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:1234: connect: connection refused"	{"resource": {"service.instance.id": "c5871bb9-a835-4d34-ad15-673dc7b1bfcc", "service.name": "otelcol-contrib", "service.version": "0.132.0"}, "grpc_log": true}

Additional context

No response
