feat(metrics): add Git sync observability metrics to feature flag backend #4673

rohitnarayan · 2025-09-01T06:53:15Z

This change enhances Flipt's Git-based feature flag backend observability by adding detailed synchronization metrics. Currently, failures during Git sync are only logged without metric visibility, limiting proactive monitoring and alerting capabilities.

- Introduce new OpenTelemetry metrics for Git sync operations:
  - Last sync time as an observable gauge (timestamp).
  - Sync duration histogram.
  - Counters for number of flags fetched.
  - Success and failure counts with failure reason attributes.

- Instrument the `SnapshotStore.update` method, the core sync loop, to record these metrics accurately on every sync attempt, including partial failures and cleanups.

- Extend the `Snapshot` type with `TotalFlagsCount()` to count all flags across namespaces for metric reporting.

- Integrate metrics initialization in app startup ensuring consistent telemetry setup.

- Improve test coverage by suggesting strategies to verify metric emission and sync behavior.

These metric additions enable operators to monitor Git sync health, detect failures promptly, and troubleshoot issues efficiently, significantly improving runtime observability and system reliability.
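
As a rough illustration of the `TotalFlagsCount()` idea described above, here is a minimal sketch. The `Flag` and `Snapshot` types below are hypothetical stand-ins with a made-up `namespaces` field; they are not Flipt's real types, and the real method may be implemented differently.

```go
package main

import "fmt"

// Flag and Snapshot are hypothetical stand-ins, not Flipt's real types.
type Flag struct{ Key string }

type Snapshot struct {
	// namespaces maps a namespace key to the flags defined in it (assumed layout).
	namespaces map[string][]Flag
}

// TotalFlagsCount sums the number of flags across every namespace.
func (s *Snapshot) TotalFlagsCount() int {
	total := 0
	for _, flags := range s.namespaces {
		total += len(flags)
	}
	return total
}

func main() {
	snap := &Snapshot{namespaces: map[string][]Flag{
		"default": {{Key: "dark-mode"}, {Key: "beta-banner"}},
		"mobile":  {{Key: "new-onboarding"}},
	}}
	fmt.Println(snap.TotalFlagsCount())
}
```

A count computed once per sync like this can then be passed to whatever function records the "flags fetched" metric.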

Signed-off-by: Rohit Jaiswal <[email protected]>

markphelps · 2025-09-01T22:33:54Z

internal/metrics/metrics.go

 func init() {
 	if otel.GetMeterProvider() == nil {
 		otel.SetMeterProvider(metricnoop.NewMeterProvider())
 	}
 }

+func InitMetrics() {


I don't think these metrics should live here. We already have metrics for git in internal/storage/git/metrics.go. Could we not add the necessary metrics there instead? That way they don't need to be exported and we don't need the Init.

Oh sorry I didn't realize this was for v1 not v2 😬

@markphelps yeah, these changes are for v1.

@markphelps I notice that the basic workflows like unit tests, lint, etc require approvals from maintainer to run. Could we make them run automatically on each branch push? It would be helpful to speed up the development.

@rohitnarayan internal/cache/metrics.go is a good example in v1.

You could run linters and tests locally with mage. Please run mage -l to see all available tasks.

Signed-off-by: Rohit Jaiswal <[email protected]>

markphelps · 2025-09-02T14:16:32Z

cmd/flipt/main.go

@@ -105,6 +106,8 @@ func exec() error {
 					return err
 				}

+				metrics.InitMetrics()


I would prefer not to do this init here and just use the regular package-level init in the git metrics package.

markphelps · 2025-09-02T14:17:33Z

internal/metrics/metrics.go

 func init() {
 	if otel.GetMeterProvider() == nil {
 		otel.SetMeterProvider(metricnoop.NewMeterProvider())
 	}
 }

+func InitMetrics() {


I do still think these git specific metrics should be moved to the git package (https://github.com/flipt-io/flipt/tree/main/internal/storage/fs/git)

markphelps · 2025-09-02T14:19:35Z

internal/metrics/metrics.go

 func init() {
 	if otel.GetMeterProvider() == nil {
 		otel.SetMeterProvider(metricnoop.NewMeterProvider())
 	}
 }

+func InitMetrics() {


I notice that the basic workflows like unit tests, lint, etc require approvals from maintainer to run. Could we make them run automatically on each branch push? It would be helpful to speed up the development.

@rohitnarayan this is standard practice in most open source projects on GitHub for first-time contributors. After your first PR is merged, I don't think the workflows will require maintainer approval to run.

Signed-off-by: Rohit Jaiswal <[email protected]>

rohitnarayan · 2025-09-03T03:09:02Z

@markphelps @erka I've addressed comments from both of you. Please review when you get time. Thank you!

Copilot

Pull Request Overview

This PR adds comprehensive Git synchronization observability metrics to Flipt's feature flag backend, enabling better monitoring and alerting for Git sync operations. Previously, sync failures were only logged without metric visibility.

- Introduces OpenTelemetry metrics including sync duration histograms, flag count counters, success/failure rates, and last sync timestamp gauge
- Instruments the core SnapshotStore.update method to emit metrics on every sync attempt
- Adds TotalFlagsCount() method to the Snapshot type for accurate flag counting across namespaces

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
internal/storage/fs/git/metrics.go	New metrics definitions and observation functions for Git sync operations
internal/storage/fs/git/metrics_test.go	Test coverage for all metric observation functions
internal/storage/fs/git/store.go	Instrumentation of sync operations with timing, flag counting, and error tracking
internal/storage/fs/snapshot.go	Addition of TotalFlagsCount method for cross-namespace flag counting
internal/storage/fs/snapshot_test.go	Test cases for the new TotalFlagsCount functionality
internal/metrics/metrics.go	Export of Meter function for external metric creation
DEVELOPMENT.md	Minor formatting improvement to numbered list

Copilot · 2025-09-03T13:02:32Z

internal/storage/fs/git/metrics.go

+			lastSyncTimeMu.RLock()
+			value := lastSyncTimeValue
+			lastSyncTimeMu.RUnlock()
+			observer.ObserveInt64(LastTime, value/1e9)


Magic number 1e9 should be replaced with a named constant like nanosPerSecond = 1e9 to improve code readability and maintainability.

Copilot · 2025-09-03T13:02:33Z

internal/storage/fs/git/store.go

+	duration := time.Since(syncStart).Seconds()
+
+	if len(errs) > 0 {
+		ObserveSync(ctx, duration, flagsFetched, false, syncType, fmt.Sprintf("%v", errs))


Using fmt.Sprintf("%v", errs) creates unclear error messages. Consider using errors.Join(errs...).Error() or a more descriptive error formatting approach that provides better insight into the failure reasons.

Suggested change

ObserveSync(ctx, duration, flagsFetched, false, syncType, fmt.Sprintf("%v", errs))

ObserveSync(ctx, duration, flagsFetched, false, syncType, errors.Join(errs...).Error())

Those errors probably shouldn’t be included in metrics at all.

Removed them.

Copilot · 2025-09-03T13:02:33Z

internal/storage/fs/git/store.go

+	if len(errs) > 0 {
+		ObserveSync(ctx, duration, flagsFetched, false, syncType, fmt.Sprintf("%v", errs))
+		return true, errors.Join(errs...)
 	}
-	return true, errors.Join(errs...)
+
+	ObserveSync(ctx, duration, flagsFetched, true, syncType, "")


The success metrics call on line 414 is unreachable code because the function returns on line 411 when errors exist. This means successful syncs after error recovery won't be properly recorded. Move the success case before the error handling block.

Suggested change

-	if len(errs) > 0 {
-		ObserveSync(ctx, duration, flagsFetched, false, syncType, fmt.Sprintf("%v", errs))
-		return true, errors.Join(errs...)
-	}
-	return true, errors.Join(errs...)
-	ObserveSync(ctx, duration, flagsFetched, true, syncType, "")

+	ObserveSync(ctx, duration, flagsFetched, true, syncType, "")
+	if len(errs) > 0 {
+		ObserveSync(ctx, duration, flagsFetched, false, syncType, fmt.Sprintf("%v", errs))
+		return true, errors.Join(errs...)
+	}

erka · 2025-09-03T12:34:54Z

internal/storage/fs/git/store.go

@@ -294,6 +302,10 @@ func (s *SnapshotStore) View(ctx context.Context, storeRef storage.Reference, fn
 	return fn(snap)
 }

+func (s *SnapshotStore) Resolve(ref string) (plumbing.Hash, error) {


Please delete it as it isn't in use

erka · 2025-09-03T13:43:32Z

internal/storage/fs/git/store.go

+	duration := time.Since(syncStart).Seconds()
+
+	if len(errs) > 0 {
+		ObserveSync(ctx, duration, flagsFetched, false, syncType, fmt.Sprintf("%v", errs))


Those errors probably shouldn’t be included in metrics at all.

codecov · 2025-09-03T14:02:00Z

Codecov Report

❌ Patch coverage is 70.00000% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.83%. Comparing base (94735fc) to head (b8039c7).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
internal/storage/fs/git/store.go	57.14%	7 Missing and 2 partials ⚠️
internal/metrics/metrics.go	40.00%	4 Missing and 2 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4673   +/-   ##
=======================================
  Coverage   63.83%   63.83%           
=======================================
  Files         171      172    +1     
  Lines       17617    17659   +42     
=======================================
+ Hits        11245    11273   +28     
- Misses       5700     5709    +9     
- Partials      672      677    +5

Flag	Coverage Δ
unittests	`63.83% <70.00%> (+<0.01%)`	⬆️

Signed-off-by: Rohit Jaiswal <[email protected]>

Signed-off-by: Roman Dmytrenko <[email protected]>

rohitnarayan · 2025-09-04T13:41:41Z

@erka @markphelps Please can you approve the workflow. Thank you!

rohitnarayan · 2025-09-04T14:22:43Z

@markphelps @erka All checks have passed. Can we merge this?

markphelps · 2025-09-04T14:22:49Z

internal/storage/fs/git/store.go

 	if !updated && fetchErr == nil {
+		// No update and no error: record metrics for a successful no-change sync
+		duration := time.Since(syncStart).Seconds()


Should this be milliseconds? I'm not sure which is more common (sub-second syncing or it taking longer than a second).

@markphelps time.Since(syncStart).Seconds() already provides sub-second precision as a float64, capturing durations down to milliseconds and microseconds, which is sufficient for metrics without needing conversion.

markphelps · 2025-09-04T14:23:47Z

Hey @rohitnarayan !

Thank you again for adding Git sync observability! Instead of commenting on each line/block I just figured I'd give an overview of requested changes in this comment, as there are a few inconsistencies with our existing metrics patterns

Critical Issues

1. Inconsistent Metric Naming Conventions

internal/storage/fs/git/metrics.go Lines 40, 47: The metrics use inconsistent suffix patterns:

// Inconsistent - has redundant _count suffix
prometheus.BuildFQName(namespace, subsystem, "success_count")
prometheus.BuildFQName(namespace, subsystem, "failure_count") 

// Also inconsistent - no suffix  
prometheus.BuildFQName(namespace, subsystem, "flags_fetched")

Expected Pattern (based on existing metrics in /internal/server/metrics/metrics.go and /internal/cache/metrics.go):

// Should be consistent without redundant suffixes
prometheus.BuildFQName(namespace, subsystem, "success")
prometheus.BuildFQName(namespace, subsystem, "error")  // See next issue
prometheus.BuildFQName(namespace, subsystem, "flags_fetched")

2. Wrong Error Terminology

internal/storage/fs/git/metrics.go Line 47: Uses "failure_count" but our codebase consistently uses "error" for error metrics:

internal/cache/metrics.go:33: "error"
internal/server/metrics/metrics.go:21: "errors"
internal/server/metrics/metrics.go:38: "errors"

Should be:

prometheus.BuildFQName(namespace, subsystem, "error")
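
To show how the naming choices above play out in the final metric names, here is a stdlib-only sketch. `buildFQName` is a local stand-in that is assumed to behave like `prometheus.BuildFQName` (joining the non-empty parts with underscores); the `namespace` and `subsystem` values are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFQName is a local stand-in, assumed to mirror prometheus.BuildFQName:
// it joins the non-empty parts with "_" and returns "" when name is empty.
func buildFQName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// Hypothetical values, for illustration only.
	const namespace, subsystem = "flipt", "git_sync"
	for _, name := range []string{"success", "error", "flags_fetched", "duration_seconds"} {
		fmt.Println(buildFQName(namespace, subsystem, name))
	}
}
```

Because the prefix already carries the context, a short final segment such as `success` or `error` is enough, and a `_count` suffix adds nothing.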

Major Issues

3. Complex Observable Gauge Implementation

internal/storage/fs/git/metrics.go Lines 59-85: The init() function with manual gauge creation and global state management doesn't follow our existing patterns. All other metrics in the codebase use simple variable declarations with metrics.Must*() helpers.

Current approach:

func init() {
    m := metrics.Meter()
    // Complex manual setup with panic handling
}

Existing pattern (see other metrics files): Simple variable declarations using helpers.

4. Missing Unit Specification

internal/storage/fs/git/metrics.go Line 26: The duration metric should specify units:

Duration = metrics.MustFloat64().
    Histogram(
        prometheus.BuildFQName(namespace, subsystem, "duration_seconds"),
        metric.WithDescription("The duration of git sync operations in seconds"),
        metric.WithUnit("s"), // Add this
    )

Minor Issues

5. Missing Attribute Constants

internal/storage/fs/git/metrics.go Lines 89, 96, 103, 110: Should define attribute keys as constants following the pattern in /internal/server/metrics/metrics.go:58-64:

// Add to top of file
var (
    AttributeSyncType = attribute.Key("sync_type")
)

// Then use consistently
Success.Add(ctx, 1, metric.WithAttributeSet(
    attribute.NewSet(AttributeSyncType.String(typ)),
))

6. Inconsistent API Usage

internal/storage/fs/git/metrics.go Line 64: Uses direct m.Int64ObservableGauge() while other metrics use metrics.Must*() helpers. Should be consistent with existing patterns.

Summary of Required Changes

- internal/storage/fs/git/metrics.go Line 40: Change "success_count" → "success"
- internal/storage/fs/git/metrics.go Line 47: Change "failure_count" → "error"
- internal/storage/fs/git/metrics.go Line 26: Add metric.WithUnit("s") (or ms) if we decide milliseconds make more sense
- internal/storage/fs/git/metrics.go Lines 59-85: Simplify observable gauge to match existing patterns
- internal/storage/fs/git/metrics.go Lines 89, 96, 103, 110: Define and use attribute constants
- Variable name: Rename Failure → Error for consistency

Signed-off-by: Rohit Jaiswal <[email protected]>

rohitnarayan · 2025-09-04T15:25:47Z

@markphelps thanks for those comments. I've addressed them, please check. Thank you!

refactor: simplify git sync metrics

rohitnarayan · 2025-09-04T16:27:39Z

@erka @markphelps please can you re-run the approval workflows. Thank you!

Signed-off-by: Rohit Jaiswal <[email protected]>

markphelps

looks great to me! thank you @rohitnarayan for bearing with us!! and thank you for the contribution

Signed-off-by: Roman Dmytrenko <[email protected]>

erka · 2025-09-04T21:11:51Z

@all-contributors please add @rohitnarayan for code

allcontributors · 2025-09-04T21:12:01Z

@erka

I've put up a pull request to add @rohitnarayan! 🎉

rohitnarayan requested a review from a team as a code owner September 1, 2025 06:53

dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Sep 1, 2025

rohitnarayan changed the base branch from v2 to main September 1, 2025 06:53

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Sep 1, 2025

rohitnarayan mentioned this pull request Sep 1, 2025

Add metrics for Git-based backend sync operations #4590

Open

4 tasks

rohitnarayan force-pushed the flipt-4590_git_metrics branch from 583922f to 35edf0c Compare September 1, 2025 11:49

rohitnarayan and others added 2 commits September 1, 2025 17:22

Merge branch 'main' into flipt-4590_git_metrics

1116927

fix(tests): close channel only once

65fd758

Signed-off-by: Rohit Jaiswal <[email protected]>

markphelps reviewed Sep 1, 2025

fix(tests): add check for missing features repoURL

d399be0

Signed-off-by: Rohit Jaiswal <[email protected]>

rohitnarayan force-pushed the flipt-4590_git_metrics branch from 76f986f to d399be0 Compare September 2, 2025 01:41

Merge branch 'main' into flipt-4590_git_metrics

8d61382

markphelps reviewed Sep 2, 2025

feat(metrics): move git metrics in git package

d0c7faa

Signed-off-by: Rohit Jaiswal <[email protected]>

erka requested a review from Copilot September 3, 2025 12:59

Copilot AI reviewed Sep 3, 2025

erka reviewed Sep 3, 2025

Rohit Jaiswal added 2 commits September 3, 2025 22:16

feat(metrics): remove reason from sync metrics

f734f7f

Signed-off-by: Rohit Jaiswal <[email protected]>

feat(metrics): remove url check

0977600

Signed-off-by: Rohit Jaiswal <[email protected]>

rohitnarayan requested review from erka and markphelps September 3, 2025 17:23

Rohit Jaiswal and others added 2 commits September 4, 2025 07:06

fix(tests): metrics unit tests

5287fc7

Signed-off-by: Rohit Jaiswal <[email protected]>

refactor: simplify git sync metrics

e2a629c

Signed-off-by: Roman Dmytrenko <[email protected]>

markphelps reviewed Sep 4, 2025

Rohit Jaiswal added 2 commits September 4, 2025 20:34

feat(metrics): use consistent naming and code patterns

3aafa8d

Signed-off-by: Rohit Jaiswal <[email protected]>

feat(metrics): add unit tests for metrics

24d2b9b

Signed-off-by: Rohit Jaiswal <[email protected]>

rohitnarayan requested a review from markphelps September 4, 2025 15:24

erka mentioned this pull request Sep 4, 2025

refactor: simplify git sync metrics rohitnarayan/flipt#1

Merged

rohitnarayan added 2 commits September 4, 2025 21:55

Merge branch 'flipt-4590_git_metrics' into rd/v1/flipt-4590_git_metrics

691c094

refactor(metrics): simplify git sync metrics

e4826e5

refactor: simplify git sync metrics

feat(metrics): use correct meter func

bac0338

Signed-off-by: Rohit Jaiswal <[email protected]>

markphelps approved these changes Sep 4, 2025

remove unused code

b8039c7

Signed-off-by: Roman Dmytrenko <[email protected]>

erka approved these changes Sep 4, 2025

erka added the automerge Used by Kodiak bot to automerge PRs label Sep 4, 2025

kodiakhq bot merged commit 3fef0d1 into flipt-io:main Sep 4, 2025
36 checks passed

allcontributors bot mentioned this pull request Sep 4, 2025

docs: add rohitnarayan as a contributor for code #4684

Closed

erka mentioned this pull request Sep 7, 2025

docs: add rohitnarayan as a contributor for code #4708

Merged
