Expose metric for log export failure #6709 #6779

Open: wants to merge 2 commits into base: main

harshitrjpt

Ran this locally; the OTel Collector log shows the metric as follows:

ScopeMetrics #1
ScopeMetrics SchemaURL: 
InstrumentationScope io.opentelemetry.sdk.logs
...
Metric #2
Descriptor:
     -> Name: logsExportFailure
     -> Description: Logs export failure in BatchLogRecordProcessor.
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> processorType: Str(BatchLogRecordProcessor)
StartTimestamp: 2024-10-01 19:05:12.150731 +0000 UTC
Timestamp: 2024-10-01 19:12:12.161627 +0000 UTC
Value: 3

@harshitrjpt harshitrjpt requested a review from a team as a code owner October 10, 2024 14:55

linux-foundation-easycla bot commented Oct 10, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.


codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.11%. Comparing base (b927d9d) to head (9859eaf).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6779      +/-   ##
============================================
+ Coverage     90.10%   90.11%   +0.01%     
- Complexity     6541     6542       +1     
============================================
  Files           728      728              
  Lines         19695    19703       +8     
  Branches       1935     1935              
============================================
+ Hits          17746    17756      +10     
+ Misses         1349     1347       -2     
  Partials        600      600              


@@ -197,6 +199,12 @@ private Worker(
"The number of logs processed by the BatchLogRecordProcessor. "
+ "[dropped=true if they were dropped due to high throughput]")
.build();
logsExportFailureCounter =
meter
.counterBuilder("logsExportFailure")
Member

@jack-berg jack-berg Oct 11, 2024

While this seems like a small change, I'm reluctant to make it because there have been some attempts to standardize the SDKs' internal telemetry (e.g. OTEP#238).

The problem with continuing the pattern of these current metrics is that the structure doesn't conform to our semantic convention recommendations.

  • The unit is wrong - should probably be {export} instead of 1
  • The metric name doesn't include a namespace
  • The attributes don't have a namespace

Extending the instrumentation extends bad patterns. Fixing the bad patterns exposes our users to breaking changes, only to have more later if / when semantic conventions emerge. So we appear to be stuck. I'll bring it up at next week's Java SIG to see if we can reach any conclusion.
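
The three points in the list above can be written down as a small check. This is only a sketch in plain Java: `MetricShape`, `hasNamespace`, and `conventionIssues` are hypothetical names, not part of opentelemetry-java, and treating "has a namespace" as "contains a dot" is a simplifying assumption rather than the semantic conventions' actual rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of opentelemetry-java.
final class MetricShape {
  final String name;
  final String unit;
  final List<String> attributeKeys;

  MetricShape(String name, String unit, List<String> attributeKeys) {
    this.name = name;
    this.unit = unit;
    this.attributeKeys = attributeKeys;
  }

  // Assumption: a namespaced identifier has a dot with text on both sides.
  static boolean hasNamespace(String identifier) {
    int dot = identifier.indexOf('.');
    return dot > 0 && dot < identifier.length() - 1;
  }

  // Returns one message per review point that the metric does not satisfy.
  List<String> conventionIssues() {
    List<String> issues = new ArrayList<>();
    if ("1".equals(unit)) {
      issues.add("unit is 1; an annotation such as {export} is more descriptive");
    }
    if (!hasNamespace(name)) {
      issues.add("metric name has no namespace: " + name);
    }
    for (String key : attributeKeys) {
      if (!hasNamespace(key)) {
        issues.add("attribute has no namespace: " + key);
      }
    }
    return issues;
  }
}
```

For the metric in this PR, `new MetricShape("logsExportFailure", "1", List.of("processorType")).conventionIssues()` reports three issues, one per bullet.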

Author

@jack-berg I agree that the current pattern (for the existing metrics as well as any metric proposed in the future) doesn't conform to the semantic convention recommendations, such as namespaced metric names and well-defined units.
So we are stuck between extending or rectifying the existing instrumentation, with its bad semantics, and adopting the recommended conventions. Do let us know how the discussion goes, as this applies in general, not just here.

Member

The OTLP exporters already have dedicated metrics to track failures: https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/common/src/main/java/io/opentelemetry/exporter/internal/ExporterMetrics.java

Would these serve your needs?

Author

@harshitrjpt harshitrjpt Oct 14, 2024

@jack-berg Thanks. I think this does address the requirement. I had tried to find out whether something already existed for exporters in general, since this is a generic need for any kind of exporter, not just BatchLogExporter.
I enabled 'OTEL_EXPORTER_METRICS_ENABLED' and got the output below. Let me check with the original reporter of the issue.

ScopeMetrics #2
ScopeMetrics SchemaURL:
InstrumentationScope io.opentelemetry.exporters.otlp-grpc
Metric #0
Descriptor:
     -> Name: otlp.exporter.exported
     -> Description:
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> success: Bool(false)
     -> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
Metric #1
Descriptor:
     -> Name: otlp.exporter.seen
     -> Description:
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
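
A failure count for logs can be read from these two metrics by filtering `otlp.exporter.exported` on `success=false` and `type=log`. Below is a minimal sketch in plain Java, assuming the data points have already been parsed into a hypothetical `DataPoint` class (not an OpenTelemetry type); the metric and attribute names are taken from the output above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of one collector data point; not an OpenTelemetry type.
final class DataPoint {
  final String metricName;
  final Map<String, Object> attributes;
  final long value;

  DataPoint(String metricName, Map<String, Object> attributes, long value) {
    this.metricName = metricName;
    this.attributes = attributes;
    this.value = value;
  }
}

final class ExportFailures {
  // Sums otlp.exporter.exported points for one signal type where success=false.
  static long failedExports(List<DataPoint> points, String signalType) {
    long total = 0;
    for (DataPoint p : points) {
      if ("otlp.exporter.exported".equals(p.metricName)
          && Boolean.FALSE.equals(p.attributes.get("success"))
          && signalType.equals(p.attributes.get("type"))) {
        total += p.value;
      }
    }
    return total;
  }
}
```

With the two data points shown above, `failedExports(points, "log")` is 9, the same value as `otlp.exporter.seen`, so everything counted as seen in that window was also counted as a failed export.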

Member

These metrics should be enabled by default if you are using autoconfigure. Note that if you are not using autoconfigure, you need to order the initialization carefully so that the configured meter provider can be passed to the OTLP exporters for spans and logs; otherwise they cannot collect internal telemetry.

I'm not sure what OTEL_EXPORTER_METRICS_ENABLED refers to. It's not a property that's used in this repository.

Author

My bad, I didn't back up the entire collector logs and mistakenly assumed that these metrics needed to be enabled. They are present by default.
