Conversation

wzxjohn
Contributor

@wzxjohn wzxjohn commented Aug 15, 2025

Summary by CodeRabbit

  • New Features

    • Optional Prometheus metrics for relay activity. Enable via ENABLE_METRICS to expose a /metrics endpoint on the existing diagnostics port.
    • Metrics include labeled success and failure counters; no impact when disabled. Compatible with existing profiling behavior.
  • Chores

    • Added Prometheus client libraries.
    • Updated dependencies (golang.org/x/*, protobuf).
    • Removed an unused indirect dependency.

Contributor

coderabbitai bot commented Aug 15, 2025

Walkthrough

Introduces Prometheus metrics support: adds a metrics package and its dependencies, initializes metrics and a /metrics endpoint when enabled, and reports success/failure from the relay and consume-log paths. Adjusts the pprof server start condition to also cover metrics. No public API changes apart from the new metrics package functions.

Changes

  • Metrics Infrastructure — metrics/metrics.go, go.mod, main.go: Adds Prometheus dependencies and a new metrics package with InitMetrics/ReportSuccess/ReportFailure. Initializes metrics when ENABLE_METRICS is true, exposes /metrics via promhttp, and starts the diagnostics server when pprof or metrics are enabled.
  • Relay Error Handling Metrics — controller/relay.go: Imports metrics; on downstream errors reports failure with labels (origin model, upstream model, group, channel_id, code). Refactors variable scope in error logging to avoid shadowing.
  • Consume Log Success Metrics — model/log.go: Imports metrics; extracts the upstream model from params.Other and reports success with labels before the existing log guards.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Env as Env Vars
    participant Main as main.go
    participant Prom as Prometheus Handler
    participant PProf as pprof Server

    Env->>Main: ENABLE_METRICS / ENABLE_PPROF
    alt ENABLE_METRICS
        Main->>Main: metrics.InitMetrics()
        Main->>Prom: http.Handle("/metrics", promhttp.Handler())
    end
    alt ENABLE_METRICS or ENABLE_PPROF
        Main->>PProf: ListenAndServe :8005 (pprof/metrics mux)
        note right of PProf: Serves /metrics if enabled
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant Relay as controller/relay.Relay
    participant Upstream as Upstream API
    participant Log as model.RecordConsumeLog
    participant Metrics as metrics pkg

    Client->>Relay: Request
    Relay->>Upstream: Forward
    alt Upstream error
        Upstream-->>Relay: Error (code)
        Relay->>Metrics: ReportFailure(origin, upstream, group, channel_id, code)
        Relay-->>Client: Error response
    else Success
        Upstream-->>Relay: Response
        Relay->>Log: RecordConsumeLog(params)
        Log->>Metrics: ReportSuccess(model, upstream, group, channel_id)
        Relay-->>Client: Success response
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my ears at counters bright,
Success and failure hop in sight.
A nibble of metrics, a sip of logs,
Prometheus fields where data jogs.
On port eight-zero-zero-five I peek—
“All clear,” thumps the rabbit’s cheek. 🐇📈


@wzxjohn wzxjohn changed the base branch from main to alpha August 15, 2025 15:39
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
metrics/metrics.go (1)

26-40: Add meaningful Help text and re-evaluate label cardinality

  • Help strings are empty. Prometheus linters expect a descriptive help; it makes dashboards and alerts clearer.
  • Labels origin_model, upstream_model, group, and channel_id can create high series cardinality. Consider trimming labels (e.g., drop channel_id) or aggregating values upstream. Alternatively, collapse success/failure into a single counter with an outcome label.

Apply this diff to add Help text now:

```diff
 func doInitMetrics() {
   enableMetrics = true
   relaySuccess = promauto.NewCounterVec(prometheus.CounterOpts{
     Namespace: "newapi",
     Subsystem: "relay",
     Name:      "success",
-    Help:      "",
+    Help:      "Total number of successful relay operations",
   }, []string{"origin_model", "upstream_model", "group", "channel_id"})
   relayFailure = promauto.NewCounterVec(prometheus.CounterOpts{
     Namespace: "newapi",
     Subsystem: "relay",
     Name:      "failure",
-    Help:      "",
+    Help:      "Total number of failed relay operations",
   }, []string{"origin_model", "upstream_model", "group", "channel_id", "code"})
 }
```

If helpful, I can propose a follow-up to consolidate into newapi_relay_requests_total with an outcome label and fewer high-cardinality labels.
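To make the cardinality concern concrete: the worst-case series count for a counter is the product of the number of distinct values per label. A quick back-of-the-envelope calculation (the per-label counts below are hypothetical, not measured from this deployment):

```go
package main

import "fmt"

func main() {
	// Hypothetical distinct-value counts for each label on the failure counter.
	counts := []int{
		50,  // origin_model
		50,  // upstream_model
		20,  // group
		100, // channel_id
		10,  // code
	}
	series := 1
	for _, n := range counts {
		series *= n
	}
	// Worst-case number of time series for this single metric.
	fmt.Println(series) // 50000000
}
```

Even if only a fraction of combinations ever occur, dropping one high-cardinality label (e.g. channel_id) divides the ceiling by that label's value count.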

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f6a4843 and e7d5435.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • controller/relay.go (2 hunks)
  • go.mod (4 hunks)
  • main.go (3 hunks)
  • metrics/metrics.go (1 hunks)
  • model/log.go (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
model/log.go (1)
metrics/metrics.go (1)
  • ReportSuccess (42-50)
main.go (3)
metrics/metrics.go (1)
  • InitMetrics (22-24)
controller/uptime_kuma.go (1)
  • Monitor (25-30)
common/sys_log.go (1)
  • SysLog (10-13)
controller/relay.go (3)
metrics/metrics.go (1)
  • ReportFailure (52-60)
constant/env.go (1)
  • ErrorLogEnabled (15-15)
types/error.go (1)
  • IsRecordErrorLog (302-311)
🔇 Additional comments (3)
go.mod (2)

29-29: Prometheus client_golang dependency addition looks good

Directly adding github.com/prometheus/client_golang v1.23.0 aligns with the new metrics usage (promhttp in main.go and promauto in metrics). No concerns from my side on this pin.


50-50: Confirm indirect Prometheus deps are consistent — go mod tidy clean

Ran go mod tidy and git diff --exit-code go.mod go.sum; there were no changes to go.mod/go.sum. The indirect additions (beorn7/perks, client_model, common, procfs, goautoneg) match expected transitive deps for prometheus/client_golang.

  • Locations to note:
    • go.mod — line ~50 (e.g. github.com/beorn7/perks v1.0.1 // indirect)
    • Also check lines ~86–89 as previously mentioned.

Note: the sandbox showed a non-fatal "failed to start telemetry sidecar" warning while running go mod tidy; this did not change go.mod/go.sum.

metrics/metrics.go (1)

42-50: No-op gating and UNKNOWN fallback are solid

Gracefully no-op when metrics aren’t initialized and defaulting empty upstreamModel to UNKNOWN avoids nil derefs and sparse-label pitfalls.

Also applies to: 52-60

Comment on lines +177 to 181:

```go
modelName := c.GetString("original_model")
channelId := c.GetInt("channel_id")
userGroup := c.GetString("group")
metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
if constant.ErrorLogEnabled && types.IsRecordErrorLog(newAPIError) {
```
if constant.ErrorLogEnabled && types.IsRecordErrorLog(newAPIError) {
Contributor


🛠️ Refactor suggestion

Avoid overcounting failures: Report failure once per request, not per retry attempt

ReportFailure is inside the retry loop and will increment on every attempt, inflating failure counts. Prefer emitting a single failure metric after the loop when we know the request truly failed.

Also minor: reuse originalModel already captured earlier rather than re-fetching from context.

Apply this diff to address both:

```diff
-			modelName := c.GetString("original_model")
+			modelName := originalModel
 			channelId := c.GetInt("channel_id")
 			userGroup := c.GetString("group")
-			metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
```

Then, emit the metric once after the retry loop (see snippet below).


Add this outside the selected range (after the retry loop), right before building the retry log, so it's counted once per failed request:

```go
// After the for-loop and before logging retries:
if newAPIError != nil {
	modelName := originalModel
	channelId := c.GetInt("channel_id")
	userGroup := c.GetString("group")
	metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
}
```
🤖 Prompt for AI Agents
In controller/relay.go around lines 177-181, remove the metrics.ReportFailure
call from inside the retry loop and instead emit it once after the loop when the
request truly failed: delete the in-loop call that re-reads original_model from
context, and add a single post-loop block that checks newAPIError != nil (and,
if desired, preserve the existing guards like constant.ErrorLogEnabled and
types.IsRecordErrorLog), then call metrics.ReportFailure using the
already-captured originalModel, c.GetInt("channel_id"), c.GetString("group"),
and newAPIError.StatusCode so failure is counted only once per request.

Comment on lines +120 to 132:

```go
if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" {
	gopool.Go(func() {
		if os.Getenv("ENABLE_METRICS") == "true" {
			metrics.InitMetrics()
			http.Handle("/metrics", promhttp.Handler())
		}
		log.Println(http.ListenAndServe("0.0.0.0:8005", nil))
	})
	if os.Getenv("ENABLE_PPROF") == "true" {
		go common.Monitor()
	}
	common.SysLog("pprof or metrics enabled")
}
```
Contributor


⚠️ Potential issue

Metrics-only path unintentionally exposes pprof endpoints; use a dedicated mux when pprof is disabled

Blank-importing net/http/pprof registers handlers on DefaultServeMux. In the current code, when ENABLE_METRICS=true and ENABLE_PPROF=false, you still start an HTTP server with the default mux (nil), thereby exposing pprof unintentionally. This is a security/observability footgun.

Use a dedicated ServeMux for metrics-only so pprof stays hidden unless explicitly enabled.

Apply this diff:

```diff
-	if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" {
-		gopool.Go(func() {
-			if os.Getenv("ENABLE_METRICS") == "true" {
-				metrics.InitMetrics()
-				http.Handle("/metrics", promhttp.Handler())
-			}
-			log.Println(http.ListenAndServe("0.0.0.0:8005", nil))
-		})
-		if os.Getenv("ENABLE_PPROF") == "true" {
-			go common.Monitor()
-		}
-		common.SysLog("pprof or metrics enabled")
-	}
+	if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" {
+		gopool.Go(func() {
+			addr := "0.0.0.0:8005"
+			if os.Getenv("ENABLE_PPROF") == "true" {
+				// pprof enabled: use DefaultServeMux so pprof endpoints are available
+				if os.Getenv("ENABLE_METRICS") == "true" {
+					metrics.InitMetrics()
+					http.Handle("/metrics", promhttp.Handler())
+				}
+				log.Println(http.ListenAndServe(addr, nil))
+				return
+			}
+			// metrics only: do NOT use DefaultServeMux to avoid exposing pprof endpoints
+			metrics.InitMetrics()
+			mux := http.NewServeMux()
+			mux.Handle("/metrics", promhttp.Handler())
+			log.Println(http.ListenAndServe(addr, mux))
+		})
+		if os.Getenv("ENABLE_PPROF") == "true" {
+			go common.Monitor()
+		}
+		common.SysLog("pprof or metrics enabled")
+	}
```

Optional hardening:

  • Bind metrics to 127.0.0.1 by default (override via METRICS_LISTEN_ADDR) to avoid exposing labels containing group/channel_id externally.
  • Or add reverse-proxy auth in front of /metrics.
🤖 Prompt for AI Agents
In main.go around lines 120-132, the code starts an HTTP server using the
default ServeMux (nil) which exposes net/http/pprof handlers even when
ENABLE_PPROF=false; create and use a dedicated http.NewServeMux() when serving
metrics-only so pprof remains hidden. Specifically, build a mux :=
http.NewServeMux(); if ENABLE_METRICS == "true" register mux.Handle("/metrics",
promhttp.Handler()); when ENABLE_PPROF == "true" either register pprof handlers
into that mux or switch to the default mux only then; pass the chosen mux to
http.ListenAndServe instead of nil. Also consider using an env
METRICS_LISTEN_ADDR (default 127.0.0.1:8005) to bind metrics locally unless
explicitly overridden.

Comment on lines +155 to +159:

```go
var upstreamModel string
if params.Other != nil && params.Other["upstream_model_name"] != nil {
	upstreamModel = params.Other["upstream_model_name"].(string)
}
metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
```
Contributor


⚠️ Potential issue

Fix potential panic: unsafe type assertion on upstream_model_name

params.Other["upstream_model_name"] is asserted to string without an ok check. If it’s present but not a string, this will panic. Safely assert and fall back to fmt.Sprint for non-strings.

Apply this diff:

```diff
-	var upstreamModel string
-	if params.Other != nil && params.Other["upstream_model_name"] != nil {
-		upstreamModel = params.Other["upstream_model_name"].(string)
-	}
+	var upstreamModel string
+	if params.Other != nil {
+		if v, ok := params.Other["upstream_model_name"]; ok && v != nil {
+			if s, ok := v.(string); ok {
+				upstreamModel = s
+			} else {
+				upstreamModel = fmt.Sprint(v)
+			}
+		}
+	}
 	metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
```
🤖 Prompt for AI Agents
In model/log.go around lines 155-159, the code unsafely asserts
params.Other["upstream_model_name"].(string) which can panic if the value exists
but is not a string; change it to safely handle the lookup and types: ensure
params.Other != nil, fetch val := params.Other["upstream_model_name"], then if
s, ok := val.(string) { upstreamModel = s } else if val != nil { upstreamModel =
fmt.Sprint(val) } to gracefully convert non-string values; add an import for fmt
if not already present.
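The safe-assertion pattern can be factored into a small helper. This is a stand-alone sketch — the function name is illustrative, not part of the codebase:

```go
package main

import "fmt"

// stringFromOther extracts a string value from a map[string]interface{},
// falling back to fmt.Sprint for non-string values instead of panicking
// on a failed type assertion.
func stringFromOther(other map[string]interface{}, key string) string {
	if other == nil {
		return ""
	}
	v, ok := other[key]
	if !ok || v == nil {
		return ""
	}
	if s, ok := v.(string); ok {
		return s
	}
	return fmt.Sprint(v)
}

func main() {
	fmt.Println(stringFromOther(map[string]interface{}{"upstream_model_name": "gpt-4o"}, "upstream_model_name")) // gpt-4o
	fmt.Println(stringFromOther(map[string]interface{}{"upstream_model_name": 42}, "upstream_model_name"))       // 42
	fmt.Println(stringFromOther(nil, "upstream_model_name"))                                                     // empty string
}
```

The call site in model/log.go would then reduce to one line: `upstreamModel := stringFromOther(params.Other, "upstream_model_name")`.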
