[server] Otel integration in server for HeartBeatStat Metrics #2363
Merged: m-nagarajan merged 5 commits into `linkedin:main` from `m-nagarajan:serverOtelIntegration1` on Jan 17, 2026.
Commits (all by m-nagarajan):

- `6cb7d25` integrate otel metrics in server HeartbeatStats
- `5ed8b4e` enabling Otel in VeniceServerWrapper
- `ea6748a` address comments and add tests
- `f4a2c06` Address review comments and add more tests
- `97cf8ec` Remove duplication of VeniceType and VeniceRole
`clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ServerMetricEntity.java` (new file: 53 additions, 0 deletions)
`@@ -0,0 +1,53 @@`

```java
package com.linkedin.davinci.stats;

import static com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions.VENICE_CLUSTER_NAME;
import static com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions.VENICE_REGION_NAME;
import static com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions.VENICE_REPLICA_STATE;
import static com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions.VENICE_REPLICA_TYPE;
import static com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions.VENICE_STORE_NAME;
import static com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions.VENICE_VERSION_ROLE;
import static com.linkedin.venice.utils.Utils.setOf;

import com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions;
import com.linkedin.venice.stats.metrics.MetricEntity;
import com.linkedin.venice.stats.metrics.MetricType;
import com.linkedin.venice.stats.metrics.MetricUnit;
import com.linkedin.venice.stats.metrics.ModuleMetricEntityInterface;
import java.util.Set;


/**
 * List all metric entities for Venice server (storage node).
 */
public enum ServerMetricEntity implements ModuleMetricEntityInterface {
  /**
   * Heartbeat replication delay: Tracks nearline replication lag in milliseconds.
   */
  INGESTION_HEARTBEAT_DELAY(
      "ingestion.replication.heartbeat.delay", MetricType.HISTOGRAM, MetricUnit.MILLISECOND,
      "Nearline ingestion replication lag",
      setOf(
          VENICE_STORE_NAME,
          VENICE_CLUSTER_NAME,
          VENICE_REGION_NAME,
          VENICE_VERSION_ROLE,
          VENICE_REPLICA_TYPE,
          VENICE_REPLICA_STATE)
  );

  private final MetricEntity metricEntity;

  ServerMetricEntity(
      String name,
      MetricType metricType,
      MetricUnit unit,
      String description,
      Set<VeniceMetricsDimensions> dimensionsList) {
    this.metricEntity = new MetricEntity(name, metricType, unit, description, dimensionsList);
  }

  @Override
  public MetricEntity getMetricEntity() {
    return metricEntity;
  }
}
```
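The `ServerMetricEntity` enum pairs each constant with one `MetricEntity`, and `HeartbeatOtelStats` later collects them through `getUniqueMetricEntities(ServerMetricEntity.class)`. That helper is not part of this diff, so the following is only a dependency-free sketch of the pattern under stated assumptions: `MetricEntity`, `ModuleMetricEntity`, `Dimension`, `SketchServerMetric` and `uniqueMetricEntities` are hypothetical stand-ins (not the Venice types), and "unique" is assumed to mean deduplicated by metric name.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the "enum of metric entities" pattern; not the Venice API. */
public class MetricEntitySketch {
  public enum Dimension { STORE_NAME, CLUSTER_NAME, REGION_NAME, VERSION_ROLE }

  // Stand-in for MetricEntity: an immutable description of one metric.
  public record MetricEntity(String name, String type, String unit, String description, Set<Dimension> dimensions) {}

  // Stand-in for ModuleMetricEntityInterface: each enum constant exposes one MetricEntity.
  public interface ModuleMetricEntity {
    MetricEntity getMetricEntity();
  }

  public enum SketchServerMetric implements ModuleMetricEntity {
    HEARTBEAT_DELAY(
        "ingestion.replication.heartbeat.delay", "HISTOGRAM", "MILLISECOND",
        "Nearline ingestion replication lag",
        EnumSet.of(Dimension.STORE_NAME, Dimension.REGION_NAME));

    private final MetricEntity metricEntity;

    SketchServerMetric(String name, String type, String unit, String description, Set<Dimension> dimensions) {
      this.metricEntity = new MetricEntity(name, type, unit, description, dimensions);
    }

    @Override
    public MetricEntity getMetricEntity() {
      return metricEntity;
    }
  }

  /** Assumed semantics: keep the first MetricEntity seen for each metric name, in declaration order. */
  public static <E extends Enum<E> & ModuleMetricEntity> Collection<MetricEntity> uniqueMetricEntities(
      Class<E> enumClass) {
    Map<String, MetricEntity> byName = new LinkedHashMap<>();
    for (E constant : enumClass.getEnumConstants()) {
      MetricEntity entity = constant.getMetricEntity();
      byName.putIfAbsent(entity.name(), entity);
    }
    return byName.values();
  }

  public static void main(String[] args) {
    for (MetricEntity e : uniqueMetricEntities(SketchServerMetric.class)) {
      System.out.println(e.name() + " " + e.type() + " " + e.dimensions());
    }
  }
}
```

Keeping the definition in an enum means the full set of server metrics can be enumerated by class, which is what lets a single static field such as `SERVER_METRIC_ENTITIES` register them all up front.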
`...ient/src/main/java/com/linkedin/davinci/stats/ingestion/heartbeat/HeartbeatOtelStats.java` (new file: 147 additions, 0 deletions)
`@@ -0,0 +1,147 @@`

```java
package com.linkedin.davinci.stats.ingestion.heartbeat;

import static com.linkedin.davinci.stats.ServerMetricEntity.INGESTION_HEARTBEAT_DELAY;
import static com.linkedin.venice.meta.Store.NON_EXISTING_VERSION;
import static com.linkedin.venice.stats.metrics.ModuleMetricEntityInterface.getUniqueMetricEntities;

import com.google.common.annotations.VisibleForTesting;
import com.linkedin.davinci.stats.ServerMetricEntity;
import com.linkedin.venice.stats.OpenTelemetryMetricsSetup;
import com.linkedin.venice.stats.VeniceOpenTelemetryMetricsRepository;
import com.linkedin.venice.stats.dimensions.ReplicaState;
import com.linkedin.venice.stats.dimensions.ReplicaType;
import com.linkedin.venice.stats.dimensions.VeniceMetricsDimensions;
import com.linkedin.venice.stats.dimensions.VersionRole;
import com.linkedin.venice.stats.metrics.MetricEntity;
import com.linkedin.venice.stats.metrics.MetricEntityStateThreeEnums;
import com.linkedin.venice.utils.concurrent.VeniceConcurrentHashMap;
import io.tehuti.metrics.MetricsRepository;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;


/**
 * OpenTelemetry metrics for heartbeat monitoring.
 * Note: Tehuti metrics are managed separately in {@link HeartbeatStatReporter}.
 */
public class HeartbeatOtelStats {
  public static final Collection<MetricEntity> SERVER_METRIC_ENTITIES =
      getUniqueMetricEntities(ServerMetricEntity.class);
  private final boolean emitOtelMetrics;
  private final VeniceOpenTelemetryMetricsRepository otelRepository;
  private final Map<VeniceMetricsDimensions, String> baseDimensionsMap;

  // Per-region metric entity states
  private final Map<String, MetricEntityStateThreeEnums<VersionRole, ReplicaType, ReplicaState>> metricsByRegion;

  private static class VersionInfo {
    private final int currentVersion;
    private final int futureVersion;

    VersionInfo(int currentVersion, int futureVersion) {
      this.currentVersion = currentVersion;
      this.futureVersion = futureVersion;
    }
  }

  private volatile VersionInfo versionInfo = new VersionInfo(NON_EXISTING_VERSION, NON_EXISTING_VERSION);

  public HeartbeatOtelStats(MetricsRepository metricsRepository, String storeName, String clusterName) {
    this.metricsByRegion = new VeniceConcurrentHashMap<>();

    OpenTelemetryMetricsSetup.OpenTelemetryMetricsSetupInfo otelSetup =
        OpenTelemetryMetricsSetup.builder(metricsRepository)
            .setStoreName(storeName)
            .setClusterName(clusterName)
            .build();

    this.emitOtelMetrics = otelSetup.emitOpenTelemetryMetrics();
    this.otelRepository = otelSetup.getOtelRepository();
    this.baseDimensionsMap = otelSetup.getBaseDimensionsMap();
  }

  /**
   * Returns true if OTel metrics are emitted.
   */
  public boolean emitOtelMetrics() {
    return emitOtelMetrics;
  }

  /**
   * Updates the current and future version for this store.
   *
   * @param currentVersion The current serving version
   * @param futureVersion The future/upcoming version
   */
  public void updateVersionInfo(int currentVersion, int futureVersion) {
    this.versionInfo = new VersionInfo(currentVersion, futureVersion);
  }

  /**
   * Records a heartbeat delay with all dimensional attributes to OTel metrics.
   * Returns early if OTel metrics are disabled. The version is not validated here:
   * any version that is neither the current nor the future version is recorded as BACKUP.
   *
   * @param version The version number
   * @param region The region name
   * @param replicaType The replica type {@link ReplicaType}
   * @param replicaState The replica state {@link ReplicaState}
   * @param delayMs The delay in milliseconds
   */
  public void recordHeartbeatDelayOtelMetrics(
      int version,
      String region,
      ReplicaType replicaType,
      ReplicaState replicaState,
      long delayMs) {
    if (!emitOtelMetrics()) {
      return;
    }
    VersionRole versionRole = classifyVersion(version, this.versionInfo);

    MetricEntityStateThreeEnums<VersionRole, ReplicaType, ReplicaState> metricState = getOrCreateMetricState(region);

    // Records to OTel metrics only
    metricState.record(delayMs, versionRole, replicaType, replicaState);
  }

  /**
   * Gets or creates a metric entity state for a specific region.
   */
  private MetricEntityStateThreeEnums<VersionRole, ReplicaType, ReplicaState> getOrCreateMetricState(String region) {
    return metricsByRegion.computeIfAbsent(region, r -> {
      // Add region to base dimensions
      Map<VeniceMetricsDimensions, String> regionBaseDimensions = new HashMap<>(baseDimensionsMap);
      regionBaseDimensions.put(VeniceMetricsDimensions.VENICE_REGION_NAME, r);

      return MetricEntityStateThreeEnums.create(
          INGESTION_HEARTBEAT_DELAY.getMetricEntity(),
          otelRepository,
          regionBaseDimensions,
          VersionRole.class,
          ReplicaType.class,
          ReplicaState.class);
    });
  }

  /**
   * Classifies a version as CURRENT, FUTURE, or BACKUP.
   *
   * @param version The version number to classify
   * @param versionInfo The current/future version (cached)
   * @return {@link VersionRole}
   */
  static VersionRole classifyVersion(int version, VersionInfo versionInfo) {
    if (version == versionInfo.currentVersion) {
      return VersionRole.CURRENT;
    } else if (version == versionInfo.futureVersion) {
      return VersionRole.FUTURE;
    }
    return VersionRole.BACKUP;
  }

  @VisibleForTesting
  public VersionInfo getVersionInfo() {
    return versionInfo;
  }
}
```
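Two techniques carry `HeartbeatOtelStats`: the current/future versions are swapped as one immutable `VersionInfo` object through a `volatile` field, so a reader never mixes fields from two updates, and per-region metric state is created lazily with `computeIfAbsent`. The following is a minimal, dependency-free sketch of just those two ideas, not the Venice implementation: `HeartbeatSketch`, `RegionRecorder` (a stand-in for `MetricEntityStateThreeEnums` that only counts samples, with no histogram), `NO_VERSION` (a hypothetical placeholder for `Store.NON_EXISTING_VERSION`) and the region names are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of version classification plus lazy per-region state; not Venice code. */
public class HeartbeatSketch {
  public enum VersionRole { CURRENT, FUTURE, BACKUP }

  /** Hypothetical "no version" placeholder standing in for Store.NON_EXISTING_VERSION. */
  public static final int NO_VERSION = 0;

  /** Immutable pair: a reader always sees current and future versions from the same update. */
  public record VersionInfo(int currentVersion, int futureVersion) {}

  /** Stand-in for per-region OTel state: counts samples per role only (a real one would use a histogram). */
  public static final class RegionRecorder {
    private final Map<VersionRole, LongAdder> counts = new ConcurrentHashMap<>();

    void observe(long delayMs, VersionRole role) {
      // delayMs is ignored here; this sketch only counts samples per role.
      counts.computeIfAbsent(role, r -> new LongAdder()).increment();
    }

    public long count(VersionRole role) {
      LongAdder adder = counts.get(role);
      return adder == null ? 0L : adder.sum();
    }
  }

  private volatile VersionInfo versionInfo = new VersionInfo(NO_VERSION, NO_VERSION);
  private final Map<String, RegionRecorder> recordersByRegion = new ConcurrentHashMap<>();

  public void updateVersionInfo(int currentVersion, int futureVersion) {
    // A single volatile write publishes both fields together.
    versionInfo = new VersionInfo(currentVersion, futureVersion);
  }

  /** Same precedence as classifyVersion in the diff: CURRENT, then FUTURE, else BACKUP. */
  public static VersionRole classifyVersion(int version, VersionInfo info) {
    if (version == info.currentVersion()) {
      return VersionRole.CURRENT;
    } else if (version == info.futureVersion()) {
      return VersionRole.FUTURE;
    }
    return VersionRole.BACKUP;
  }

  public void recordHeartbeatDelay(int version, String region, long delayMs) {
    VersionInfo snapshot = versionInfo; // one volatile read per sample
    // ConcurrentHashMap.computeIfAbsent runs the factory at most once per key, so each region gets one recorder.
    RegionRecorder recorder = recordersByRegion.computeIfAbsent(region, r -> new RegionRecorder());
    recorder.observe(delayMs, classifyVersion(version, snapshot));
  }

  public RegionRecorder recorderFor(String region) {
    return recordersByRegion.get(region);
  }

  public static void main(String[] args) {
    HeartbeatSketch stats = new HeartbeatSketch();
    stats.updateVersionInfo(5, 6);
    stats.recordHeartbeatDelay(5, "region-a", 120); // CURRENT
    stats.recordHeartbeatDelay(6, "region-a", 80); // FUTURE
    stats.recordHeartbeatDelay(4, "region-b", 3000); // BACKUP
    System.out.println(stats.recorderFor("region-a").count(VersionRole.CURRENT)); // prints 1
  }
}
```

Because classification falls through to BACKUP, any version other than the two tracked ones (including an unexpected or stale one) is attributed to BACKUP rather than dropped, which matches the behavior of `classifyVersion` in the diff.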