feature(metrics): add basic metrics func for relay #1600
base: alpha
Conversation
Walkthrough

Introduces Prometheus metrics support: adds a metrics package and dependencies, initializes metrics and a /metrics endpoint when enabled, and reports success/failure from the relay and consume-log paths. Adjusts the pprof server start condition to also cover metrics. No public API changes except the new metrics package functions.
Sequence Diagram(s)

sequenceDiagram
autonumber
participant Env as Env Vars
participant Main as main.go
participant Prom as Prometheus Handler
participant PProf as pprof Server
Env->>Main: ENABLE_METRICS / ENABLE_PPROF
alt ENABLE_METRICS
Main->>Main: metrics.InitMetrics()
Main->>Prom: http.Handle("/metrics", promhttp.Handler())
end
alt ENABLE_METRICS or ENABLE_PPROF
Main->>PProf: ListenAndServe :8005 (pprof/metrics mux)
note right of PProf: Serves /metrics if enabled
end
sequenceDiagram
autonumber
participant Client
participant Relay as controller/relay.Relay
participant Upstream as Upstream API
participant Log as model.RecordConsumeLog
participant Metrics as metrics pkg
Client->>Relay: Request
Relay->>Upstream: Forward
alt Upstream error
Upstream-->>Relay: Error (code)
Relay->>Metrics: ReportFailure(origin, upstream, group, channel_id, code)
Relay-->>Client: Error response
else Success
Upstream-->>Relay: Response
Relay->>Log: RecordConsumeLog(params)
Log->>Metrics: ReportSuccess(model, upstream, group, channel_id)
Relay-->>Client: Success response
end
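For orientation, here is a rough sketch of what the metrics package appears to provide, pieced together from the counter definitions and line references quoted in the review below. The exact layout, the InitMetrics/doInitMetrics split, and the string conversions for channel_id and code are assumptions, not the PR's verbatim code, and the Help strings use the wording suggested in the nitpick (the PR currently leaves them empty).

```go
package metrics

import (
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	enableMetrics bool
	relaySuccess  *prometheus.CounterVec
	relayFailure  *prometheus.CounterVec
)

// InitMetrics registers the two relay counters; until it runs, Report* calls are no-ops.
func InitMetrics() {
	enableMetrics = true
	relaySuccess = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: "newapi", Subsystem: "relay", Name: "success",
		Help: "Total number of successful relay operations",
	}, []string{"origin_model", "upstream_model", "group", "channel_id"})
	relayFailure = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: "newapi", Subsystem: "relay", Name: "failure",
		Help: "Total number of failed relay operations",
	}, []string{"origin_model", "upstream_model", "group", "channel_id", "code"})
}

// ReportSuccess increments the success counter; an empty upstream model falls back to UNKNOWN.
func ReportSuccess(originModel, upstreamModel, group string, channelId int) {
	if !enableMetrics {
		return
	}
	if upstreamModel == "" {
		upstreamModel = "UNKNOWN"
	}
	relaySuccess.WithLabelValues(originModel, upstreamModel, group, strconv.Itoa(channelId)).Inc()
}

// ReportFailure does the same for failures, adding the upstream status code as a label.
func ReportFailure(originModel, upstreamModel, group string, channelId, code int) {
	if !enableMetrics {
		return
	}
	if upstreamModel == "" {
		upstreamModel = "UNKNOWN"
	}
	relayFailure.WithLabelValues(originModel, upstreamModel, group, strconv.Itoa(channelId), strconv.Itoa(code)).Inc()
}
```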
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 3
🧹 Nitpick comments (1)
metrics/metrics.go (1)
26-40: Add meaningful Help text and re-evaluate label cardinality
- Help strings are empty. Prometheus linters expect a descriptive help; it makes dashboards and alerts clearer.
- Labels origin_model, upstream_model, group, and channel_id can create high series cardinality. Consider trimming labels (e.g., drop channel_id) or aggregating values upstream. Alternatively, collapse success/failure into a single counter with an outcome label.
Apply this diff to add Help text now:
 func doInitMetrics() {
 	enableMetrics = true
 	relaySuccess = promauto.NewCounterVec(prometheus.CounterOpts{
 		Namespace: "newapi",
 		Subsystem: "relay",
 		Name:      "success",
-		Help:      "",
+		Help:      "Total number of successful relay operations",
 	}, []string{"origin_model", "upstream_model", "group", "channel_id"})
 	relayFailure = promauto.NewCounterVec(prometheus.CounterOpts{
 		Namespace: "newapi",
 		Subsystem: "relay",
 		Name:      "failure",
-		Help:      "",
+		Help:      "Total number of failed relay operations",
 	}, []string{"origin_model", "upstream_model", "group", "channel_id", "code"})
 }

If helpful, I can propose a follow-up to consolidate into newapi_relay_requests_total with an outcome label and fewer high-cardinality labels.
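If that consolidation were pursued, one possible shape is sketched here. The metric name newapi_relay_requests_total and the outcome label come from the suggestion above; dropping channel_id and reusing the code label for both outcomes are illustrative trade-offs, not changes this PR makes. It assumes the same package and imports as the earlier sketch.

```go
// Single counter for both outcomes; "outcome" is "success" or "failure",
// and "code" carries the upstream status code on failures (empty on success).
var relayRequests = promauto.NewCounterVec(prometheus.CounterOpts{
	Namespace: "newapi",
	Subsystem: "relay",
	Name:      "requests_total", // exposed as newapi_relay_requests_total
	Help:      "Total relay requests, partitioned by outcome",
}, []string{"origin_model", "upstream_model", "group", "outcome", "code"})

func reportOutcome(originModel, upstreamModel, group, outcome, code string) {
	if !enableMetrics {
		return
	}
	relayRequests.WithLabelValues(originModel, upstreamModel, group, outcome, code).Inc()
}
```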
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
go.sum — excluded by !**/*.sum
📒 Files selected for processing (5)
- controller/relay.go (2 hunks)
- go.mod (4 hunks)
- main.go (3 hunks)
- metrics/metrics.go (1 hunks)
- model/log.go (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
model/log.go (1)
- metrics/metrics.go (1): ReportSuccess (42-50)

main.go (3)
- metrics/metrics.go (1): InitMetrics (22-24)
- controller/uptime_kuma.go (1): Monitor (25-30)
- common/sys_log.go (1): SysLog (10-13)

controller/relay.go (3)
- metrics/metrics.go (1): ReportFailure (52-60)
- constant/env.go (1): ErrorLogEnabled (15-15)
- types/error.go (1): IsRecordErrorLog (302-311)
🔇 Additional comments (3)
go.mod (2)
29-29: Prometheus client_golang dependency addition looks good

Directly adding github.com/prometheus/client_golang v1.23.0 aligns with the new metrics usage (promhttp in main.go and promauto in metrics). No concerns from my side on this pin.
50-50: Confirm indirect Prometheus deps are consistent — go mod tidy clean

Ran go mod tidy and git diff --exit-code go.mod go.sum; there were no changes to go.mod/go.sum. The indirect additions (beorn7/perks, client_model, common, procfs, goautoneg) match the expected transitive deps for prometheus/client_golang.
- Locations to note:
  - go.mod — line ~50 (e.g. github.com/beorn7/perks v1.0.1 // indirect)
  - Also check lines ~86–89 as previously mentioned.

Note: the sandbox showed a non-fatal "failed to start telemetry sidecar" warning while running go mod tidy; this did not change go.mod/go.sum.

metrics/metrics.go (1)
42-50: No-op gating and UNKNOWN fallback are solid

Gracefully no-ops when metrics aren't initialized, and defaulting an empty upstreamModel to UNKNOWN avoids nil derefs and sparse-label pitfalls.
Also applies to: 52-60
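As a quick illustration of that behavior (using the sketch above; the model, group, and channel values here are made up):

```go
// Before InitMetrics: enableMetrics is false, so this call is a silent no-op.
metrics.ReportSuccess("gpt-4o", "", "default", 42)

// After InitMetrics: the empty upstream model is recorded under UNKNOWN, e.g.
// newapi_relay_success{origin_model="gpt-4o",upstream_model="UNKNOWN",group="default",channel_id="42"} 1
metrics.InitMetrics()
metrics.ReportSuccess("gpt-4o", "", "default", 42)
```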
modelName := c.GetString("original_model")
channelId := c.GetInt("channel_id")
userGroup := c.GetString("group")
metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
if constant.ErrorLogEnabled && types.IsRecordErrorLog(newAPIError) {
🛠️ Refactor suggestion
Avoid overcounting failures: Report failure once per request, not per retry attempt
ReportFailure is inside the retry loop and will increment on every attempt, inflating failure counts. Prefer emitting a single failure metric after the loop when we know the request truly failed.
Also minor: reuse originalModel already captured earlier rather than re-fetching from context.
Apply this diff to address both:
- modelName := c.GetString("original_model")
+ modelName := originalModel
channelId := c.GetInt("channel_id")
userGroup := c.GetString("group")
- metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
Then, emit the metric once after the retry loop (see snippet below).
Add this outside the selected range (after the retry loop), right before building the retry log, so it's counted once per failed request:
```go
// After the for-loop and before logging retries:
if newAPIError != nil {
	modelName := originalModel
	channelId := c.GetInt("channel_id")
	userGroup := c.GetString("group")
	metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
}
```
🤖 Prompt for AI Agents
In controller/relay.go around lines 177-181, remove the metrics.ReportFailure
call from inside the retry loop and instead emit it once after the loop when the
request truly failed: delete the in-loop call that re-reads original_model from
context, and add a single post-loop block that checks newAPIError != nil (and,
if desired, preserve the existing guards like constant.ErrorLogEnabled and
types.IsRecordErrorLog), then call metrics.ReportFailure using the
already-captured originalModel, c.GetInt("channel_id"), c.GetString("group"),
and newAPIError.StatusCode so failure is counted only once per request.
if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" { | ||
gopool.Go(func() { | ||
if os.Getenv("ENABLE_METRICS") == "true" { | ||
metrics.InitMetrics() | ||
http.Handle("/metrics", promhttp.Handler()) | ||
} | ||
log.Println(http.ListenAndServe("0.0.0.0:8005", nil)) | ||
}) | ||
go common.Monitor() | ||
common.SysLog("pprof enabled") | ||
if os.Getenv("ENABLE_PPROF") == "true" { | ||
go common.Monitor() | ||
} | ||
common.SysLog("pprof or metrics enabled") | ||
} |
Metrics-only path unintentionally exposes pprof endpoints; use a dedicated mux when pprof is disabled
Blank-importing net/http/pprof registers handlers on DefaultServeMux. In the current code, when ENABLE_METRICS=true and ENABLE_PPROF=false, you still start an HTTP server with the default mux (nil), thereby exposing pprof unintentionally. This is a security/observability footgun.
Use a dedicated ServeMux for metrics-only so pprof stays hidden unless explicitly enabled.
Apply this diff:
- if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" {
- gopool.Go(func() {
- if os.Getenv("ENABLE_METRICS") == "true" {
- metrics.InitMetrics()
- http.Handle("/metrics", promhttp.Handler())
- }
- log.Println(http.ListenAndServe("0.0.0.0:8005", nil))
- })
- if os.Getenv("ENABLE_PPROF") == "true" {
- go common.Monitor()
- }
- common.SysLog("pprof or metrics enabled")
- }
+ if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" {
+ gopool.Go(func() {
+ addr := "0.0.0.0:8005"
+ if os.Getenv("ENABLE_PPROF") == "true" {
+ // pprof enabled: use DefaultServeMux so pprof endpoints are available
+ if os.Getenv("ENABLE_METRICS") == "true" {
+ metrics.InitMetrics()
+ http.Handle("/metrics", promhttp.Handler())
+ }
+ log.Println(http.ListenAndServe(addr, nil))
+ return
+ }
+ // metrics only: do NOT use DefaultServeMux to avoid exposing pprof endpoints
+ metrics.InitMetrics()
+ mux := http.NewServeMux()
+ mux.Handle("/metrics", promhttp.Handler())
+ log.Println(http.ListenAndServe(addr, mux))
+ })
+ if os.Getenv("ENABLE_PPROF") == "true" {
+ go common.Monitor()
+ }
+ common.SysLog("pprof or metrics enabled")
+ }
Optional hardening:
- Bind metrics to 127.0.0.1 by default (override via METRICS_LISTEN_ADDR) to avoid exposing labels containing group/channel_id externally; a sketch follows below.
- Or add reverse-proxy auth in front of /metrics.
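A minimal sketch of that first hardening option for the metrics-only branch, assuming the METRICS_LISTEN_ADDR variable proposed above (the name and the loopback default are suggestions, not part of this PR):

```go
// Metrics-only branch: bind to loopback unless the operator opts in to a wider address.
addr := os.Getenv("METRICS_LISTEN_ADDR")
if addr == "" {
	addr = "127.0.0.1:8005"
}
metrics.InitMetrics()
mux := http.NewServeMux() // dedicated mux, so pprof handlers stay hidden
mux.Handle("/metrics", promhttp.Handler())
log.Println(http.ListenAndServe(addr, mux))
```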
🤖 Prompt for AI Agents
In main.go around lines 120-132, the code starts an HTTP server using the
default ServeMux (nil) which exposes net/http/pprof handlers even when
ENABLE_PPROF=false; create and use a dedicated http.NewServeMux() when serving
metrics-only so pprof remains hidden. Specifically, build a mux :=
http.NewServeMux(); if ENABLE_METRICS == "true" register mux.Handle("/metrics",
promhttp.Handler()); when ENABLE_PPROF == "true" either register pprof handlers
into that mux or switch to the default mux only then; pass the chosen mux to
http.ListenAndServe instead of nil. Also consider using an env
METRICS_LISTEN_ADDR (default 127.0.0.1:8005) to bind metrics locally unless
explicitly overridden.
var upstreamModel string
if params.Other != nil && params.Other["upstream_model_name"] != nil {
	upstreamModel = params.Other["upstream_model_name"].(string)
}
metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
Fix potential panic: unsafe type assertion on upstream_model_name
params.Other["upstream_model_name"] is asserted to string without an ok check. If it’s present but not a string, this will panic. Safely assert and fall back to fmt.Sprint for non-strings.
Apply this diff:
- var upstreamModel string
- if params.Other != nil && params.Other["upstream_model_name"] != nil {
- upstreamModel = params.Other["upstream_model_name"].(string)
- }
+ var upstreamModel string
+ if params.Other != nil {
+ if v, ok := params.Other["upstream_model_name"]; ok && v != nil {
+ if s, ok := v.(string); ok {
+ upstreamModel = s
+ } else {
+ upstreamModel = fmt.Sprint(v)
+ }
+ }
+ }
metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

var upstreamModel string
if params.Other != nil {
	if v, ok := params.Other["upstream_model_name"]; ok && v != nil {
		if s, ok := v.(string); ok {
			upstreamModel = s
		} else {
			upstreamModel = fmt.Sprint(v)
		}
	}
}
metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
🤖 Prompt for AI Agents
In model/log.go around lines 155-159, the code unsafely asserts
params.Other["upstream_model_name"].(string) which can panic if the value exists
but is not a string; change it to safely handle the lookup and types: ensure
params.Other != nil, fetch val := params.Other["upstream_model_name"], then if
s, ok := val.(string) { upstreamModel = s } else if val != nil { upstreamModel =
fmt.Sprint(val) } to gracefully convert non-string values; add an import for fmt
if not already present.
Summary by CodeRabbit

New Features
- Optional Prometheus metrics: when ENABLE_METRICS is set, the app initializes relay success/failure counters and serves a /metrics endpoint.

Chores
- Added github.com/prometheus/client_golang v1.23.0 and its indirect dependencies to go.mod/go.sum.