-
Notifications
You must be signed in to change notification settings - Fork 114
[server] Otel integration in server for HeartBeatStat Metrics #2363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[server] Otel integration in server for HeartBeatStat Metrics #2363
Conversation
...nt/src/main/java/com/linkedin/davinci/stats/ingestion/heartbeat/HeartbeatVersionedStats.java
Outdated
Show resolved
Hide resolved
...-client/src/main/java/com/linkedin/davinci/stats/ingestion/heartbeat/HeartbeatOtelStats.java
Outdated
Show resolved
Hide resolved
sixpluszero
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good, thank you for putting up this PR. Left some comments for clarifications..
...nal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VersionType.java
Outdated
Show resolved
Hide resolved
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ServerMetricEntity.java
Show resolved
Hide resolved
...nt/src/main/java/com/linkedin/davinci/stats/ingestion/heartbeat/HeartbeatVersionedStats.java
Show resolved
Hide resolved
...venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/ReplicaTypeTest.java
Show resolved
Hide resolved
|
Thanks @m-nagarajan for addressing my comments and it LGTM! |
sixpluszero
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall lgtm! thank you for the change.
only one small nit about sharing the enum, I feel like if it is a common one, maybe it should not in the stat package.
Problem Statement
Otel integration in server for HeartBeatStat Metrics
Solution
Current tehuti integration: using heart beat metrics as an example. There are 2 layers
Layer 1: metrics in
HeartbeatStatclass of different types (Rate(),Min(),Max(), etc) are created used local metrics repository, thus can't be exported. These are created for each version availableLayer 2: metrics in
HeartbeatStatReporterclass, all of which areAsyncGauge()which reads from the metrics in layer 1 for the current/future versions. These gets exported.This brings in a complexity where 2 sets of metrics needs to be maintained and the 2nd layer should be always a Gauge leading to the metric name having _min/_min/_avg suffixes to note what its measuring, thus limiting what could be exported. Moving to OpenTelemetry, I want to remove this complexity and make it just a single layer and use a dimension to denote whether it belongs to current/future/backup versions and thus could export histograms directly. The new metrics are defined and directly recorded at
HeartbeatVersionedStatsclass where the previous layer 1 metrics are recorded.New Otel metric:
venice.server.ingestion.replication.heartbeat.delay(Histogram)New Dimensions:
venice.replica.type: leader/followervenice.replica.state: ready_to_serve/catching_upvenice.version.role: current/future/backupCode changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
Does this PR introduce any user-facing or breaking changes?