
After upgrading the operator, the default auto-instrumentation image in Instrumentation is not upgraded #3468

Open
chinaran opened this issue Nov 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@chinaran
Contributor

Component(s)

auto-instrumentation

What happened?

Description

After upgrading the operator, the default auto-instrumentation image in the Instrumentation resource was not upgraded.

Steps to Reproduce

Operator v0.107.0 was originally deployed with the Helm chart and was to be migrated to an OLM deployment and upgraded to v0.113.0; the default-auto-instrumentation-java-image version should be upgraded along with it.
When v0.113.0 was deployed after removing v0.107.0, the mutating webhook call failed during the upgrade of the Instrumentation resources because the webhook service was not ready yet.
As a result, auto-instrumentation-java-image was not upgraded.
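
The Expected/Actual results below suggest the upgrader tracks the injected default via the default-auto-instrumentation-java-image annotation and only rewrites spec.java.image when it still matches a previous default. A minimal sketch of that idea; the types, field names, and constant are illustrative, not the operator's actual code:

```go
package upgrade

// Illustrative stand-in for the operator's Instrumentation API; only the
// fields relevant to the java default-image upgrade are shown.
type Instrumentation struct {
	Annotations map[string]string
	JavaImage   string
}

// Annotation seen in the issue; the constant name in the operator may differ.
const annotationDefaultJava = "instrumentation.opentelemetry.io/default-auto-instrumentation-java-image"

// upgradeJavaImage bumps spec.java.image to the new default, but only if the
// recorded annotation shows the image was set by a previous operator default
// (i.e. the user did not override it manually).
func upgradeJavaImage(inst *Instrumentation, newDefault string) bool {
	prevDefault, ok := inst.Annotations[annotationDefaultJava]
	if !ok || inst.JavaImage != prevDefault || prevDefault == newDefault {
		return false // user-managed image or already up to date
	}
	inst.JavaImage = newDefault
	inst.Annotations[annotationDefaultJava] = newDefault
	return true
}
```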

Expected Result

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  annotations:
    instrumentation.opentelemetry.io/default-auto-instrumentation-java-image: xxx-v2
spec:
  java:
    image: xxx-v2

Actual Result

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  annotations:
    instrumentation.opentelemetry.io/default-auto-instrumentation-java-image: xxx-v1
spec:
  java:
    image: xxx-v1

Kubernetes Version

v1.29.2

Operator version

v0.113.0

Collector version

v0.108.1

Environment information

No response

Log output

{"level":"ERROR","timestamp":"2024-11-17T03:30:27.899239101Z","logger":"instrumentation-upgrade","message":"failed to apply changes to instance","name":"xxx-java","namespace":"cpaas-system","error":"Internal error occurred: failed calling webhook \"minstrumentation.kb.io\": failed to call webhook: Post \"https://opentelemetry-operator-controller-manager-service.xxx-ns.svc:443/mutate-opentelemetry-io-v1alpha1-instrumentation?timeout=10s\": dial tcp xxx.xxx.xxx.xxx:443: connect: connection refused","stacktrace":
"github.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).ManagedInstances
/workspace/source/pkg/instrumentation/upgrade/upgrade.go:99
main.addDependencies.func2
/workspace/source/main.go:473
sigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start
/workspace/cache/gomod/sigs.k8s.io/[email protected]/pkg/manager/manager.go:307
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
/workspace/cache/gomod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:226"}

Additional context

Should I add retry logic to the Instrumentation upgrade?
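
One possible shape for such a retry, purely as a sketch: the backoff values and the webhook-availability check are assumptions for illustration, not anything the operator currently does.

```go
package upgrade

import (
	"strings"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/retry"
)

// isWebhookNotReady is a rough heuristic for the "failed calling webhook /
// connection refused" error in the log output above; a real implementation
// would likely match the error more precisely.
func isWebhookNotReady(err error) bool {
	return err != nil && strings.Contains(err.Error(), "failed calling webhook")
}

// applyWithRetry retries the Instrumentation update while the mutating
// webhook endpoint is still coming up, backing off between attempts.
func applyWithRetry(apply func() error) error {
	backoff := wait.Backoff{
		Duration: 2 * time.Second,
		Factor:   2.0,
		Steps:    5,
	}
	return retry.OnError(backoff, isWebhookNotReady, apply)
}
```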

@chinaran added the bug and needs triage labels on Nov 17, 2024
@swiatekm
Contributor

I don't think controller-runtime lets us specify dependencies between runnables, so that won't help us here. It does sound like we have a race condition at startup: the upgraders depend on the webhooks being up, but we don't guarantee that the webhooks are started first.

Maybe the solution is to just run the upgraders periodically, forever? Running them only at startup doesn't make much sense to me. @pavolloffay @jaronoff97 wdyt?
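
If the upgraders were made periodic, the wiring could look roughly like this; the interval, function names, and error handling are placeholders for illustration, not a proposed API:

```go
package periodicupgrade

import (
	"context"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// addPeriodicUpgrade registers a runnable that re-applies the Instrumentation
// upgrade on a fixed interval instead of only once at startup, so an upgrade
// that raced a not-yet-ready webhook is retried on a later tick.
func addPeriodicUpgrade(mgr manager.Manager, upgrade func(context.Context) error, interval time.Duration) error {
	return mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			// Errors are intentionally non-fatal: the next tick retries.
			_ = upgrade(ctx)
			select {
			case <-ctx.Done():
				return nil
			case <-ticker.C:
			}
		}
	}))
}
```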
