feature(metrics): add basic metrics func for relay #1600
base: alpha
Conversation
Walkthrough

Introduces Prometheus metrics support: adds a metrics package and dependencies, initializes metrics and a /metrics endpoint when enabled, and reports success/failure from the relay and consume-log paths. Adjusts the pprof server start condition to also cover metrics. No public API changes except the new metrics package functions.
Sequence Diagram(s)

sequenceDiagram
autonumber
participant Env as Env Vars
participant Main as main.go
participant Prom as Prometheus Handler
participant PProf as pprof Server
Env->>Main: ENABLE_METRICS / ENABLE_PPROF
alt ENABLE_METRICS
Main->>Main: metrics.InitMetrics()
Main->>Prom: http.Handle("/metrics", promhttp.Handler())
end
alt ENABLE_METRICS or ENABLE_PPROF
Main->>PProf: ListenAndServe :8005 (pprof/metrics mux)
note right of PProf: Serves /metrics if enabled
end
sequenceDiagram
autonumber
participant Client
participant Relay as controller/relay.Relay
participant Upstream as Upstream API
participant Log as model.RecordConsumeLog
participant Metrics as metrics pkg
Client->>Relay: Request
Relay->>Upstream: Forward
alt Upstream error
Upstream-->>Relay: Error (code)
Relay->>Metrics: ReportFailure(origin, upstream, group, channel_id, code)
Relay-->>Client: Error response
else Success
Upstream-->>Relay: Response
Relay->>Log: RecordConsumeLog(params)
Log->>Metrics: ReportSuccess(model, upstream, group, channel_id)
Relay-->>Client: Success response
end
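For orientation, here is a rough sketch of what the metrics package appears to provide, pieced together from the counter definitions and line references quoted in the review below. The exact layout, the InitMetrics/doInitMetrics split, and the string conversions for channel_id and code are assumptions, not the PR's verbatim code, and the Help strings use the wording suggested in the nitpick (the PR currently leaves them empty).

```go
package metrics

import (
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	enableMetrics bool
	relaySuccess  *prometheus.CounterVec
	relayFailure  *prometheus.CounterVec
)

// InitMetrics registers the two relay counters; until it runs, Report* calls are no-ops.
func InitMetrics() {
	enableMetrics = true
	relaySuccess = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: "newapi", Subsystem: "relay", Name: "success",
		Help: "Total number of successful relay operations",
	}, []string{"origin_model", "upstream_model", "group", "channel_id"})
	relayFailure = promauto.NewCounterVec(prometheus.CounterOpts{
		Namespace: "newapi", Subsystem: "relay", Name: "failure",
		Help: "Total number of failed relay operations",
	}, []string{"origin_model", "upstream_model", "group", "channel_id", "code"})
}

// ReportSuccess increments the success counter; an empty upstream model falls back to UNKNOWN.
func ReportSuccess(originModel, upstreamModel, group string, channelId int) {
	if !enableMetrics {
		return
	}
	if upstreamModel == "" {
		upstreamModel = "UNKNOWN"
	}
	relaySuccess.WithLabelValues(originModel, upstreamModel, group, strconv.Itoa(channelId)).Inc()
}

// ReportFailure does the same for failures, adding the upstream status code as a label.
func ReportFailure(originModel, upstreamModel, group string, channelId, code int) {
	if !enableMetrics {
		return
	}
	if upstreamModel == "" {
		upstreamModel = "UNKNOWN"
	}
	relayFailure.WithLabelValues(originModel, upstreamModel, group, strconv.Itoa(channelId), strconv.Itoa(code)).Inc()
}
```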
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 3
🧹 Nitpick comments (1)
metrics/metrics.go (1)
26-40: Add meaningful Help text and re-evaluate label cardinality
- Help strings are empty. Prometheus linters expect a descriptive help; it makes dashboards and alerts clearer.
- Labels origin_model, upstream_model, group, and channel_id can create high series cardinality. Consider trimming labels (e.g., drop channel_id) or aggregating values upstream. Alternatively, collapse success/failure into a single counter with an outcome label.
Apply this diff to add Help text now:
 func doInitMetrics() {
 	enableMetrics = true
 	relaySuccess = promauto.NewCounterVec(prometheus.CounterOpts{
 		Namespace: "newapi",
 		Subsystem: "relay",
 		Name:      "success",
-		Help:      "",
+		Help:      "Total number of successful relay operations",
 	}, []string{"origin_model", "upstream_model", "group", "channel_id"})
 	relayFailure = promauto.NewCounterVec(prometheus.CounterOpts{
 		Namespace: "newapi",
 		Subsystem: "relay",
 		Name:      "failure",
-		Help:      "",
+		Help:      "Total number of failed relay operations",
 	}, []string{"origin_model", "upstream_model", "group", "channel_id", "code"})
 }

If helpful, I can propose a follow-up to consolidate into newapi_relay_requests_total with an outcome label and fewer high-cardinality labels.
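If that consolidation were pursued, one possible shape is sketched here. The metric name newapi_relay_requests_total and the outcome label come from the suggestion above; dropping channel_id and reusing the code label for both outcomes are illustrative trade-offs, not changes this PR makes. It assumes the same package and imports as the earlier sketch.

```go
// Single counter for both outcomes; "outcome" is "success" or "failure",
// and "code" carries the upstream status code on failures (empty on success).
var relayRequests = promauto.NewCounterVec(prometheus.CounterOpts{
	Namespace: "newapi",
	Subsystem: "relay",
	Name:      "requests_total", // exposed as newapi_relay_requests_total
	Help:      "Total relay requests, partitioned by outcome",
}, []string{"origin_model", "upstream_model", "group", "outcome", "code"})

func reportOutcome(originModel, upstreamModel, group, outcome, code string) {
	if !enableMetrics {
		return
	}
	relayRequests.WithLabelValues(originModel, upstreamModel, group, outcome, code).Inc()
}
```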
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
go.sum — excluded by !**/*.sum
📒 Files selected for processing (5)
- controller/relay.go (2 hunks)
- go.mod (4 hunks)
- main.go (3 hunks)
- metrics/metrics.go (1 hunks)
- model/log.go (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
model/log.go (1)
- metrics/metrics.go (1): ReportSuccess (42-50)

main.go (3)
- metrics/metrics.go (1): InitMetrics (22-24)
- controller/uptime_kuma.go (1): Monitor (25-30)
- common/sys_log.go (1): SysLog (10-13)

controller/relay.go (3)
- metrics/metrics.go (1): ReportFailure (52-60)
- constant/env.go (1): ErrorLogEnabled (15-15)
- types/error.go (1): IsRecordErrorLog (302-311)
🔇 Additional comments (3)
go.mod (2)
29-29: Prometheus client_golang dependency addition looks good

Directly adding github.com/prometheus/client_golang v1.23.0 aligns with the new metrics usage (promhttp in main.go and promauto in metrics). No concerns from my side on this pin.
50-50: Confirm indirect Prometheus deps are consistent — go mod tidy clean

Ran go mod tidy and git diff --exit-code go.mod go.sum; there were no changes to go.mod/go.sum. The indirect additions (beorn7/perks, client_model, common, procfs, goautoneg) match the expected transitive deps for prometheus/client_golang.
- Locations to note:
  - go.mod — line ~50 (e.g. github.com/beorn7/perks v1.0.1 // indirect)
  - Also check lines ~86–89 as previously mentioned.

Note: the sandbox showed a non-fatal "failed to start telemetry sidecar" warning while running go mod tidy; this did not change go.mod/go.sum.

metrics/metrics.go (1)
42-50: No-op gating and UNKNOWN fallback are solid

Gracefully no-ops when metrics aren't initialized, and defaulting an empty upstreamModel to UNKNOWN avoids nil derefs and sparse-label pitfalls.
Also applies to: 52-60
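As a quick illustration of that behavior (using the sketch above; the model, group, and channel values here are made up):

```go
// Before InitMetrics: enableMetrics is false, so this call is a silent no-op.
metrics.ReportSuccess("gpt-4o", "", "default", 42)

// After InitMetrics: the empty upstream model is recorded under UNKNOWN, e.g.
// newapi_relay_success{origin_model="gpt-4o",upstream_model="UNKNOWN",group="default",channel_id="42"} 1
metrics.InitMetrics()
metrics.ReportSuccess("gpt-4o", "", "default", 42)
```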
modelName := c.GetString("original_model")
channelId := c.GetInt("channel_id")
userGroup := c.GetString("group")
metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
if constant.ErrorLogEnabled && types.IsRecordErrorLog(newAPIError) {
🛠️ Refactor suggestion
Avoid overcounting failures: Report failure once per request, not per retry attempt
ReportFailure is inside the retry loop and will increment on every attempt, inflating failure counts. Prefer emitting a single failure metric after the loop when we know the request truly failed.
Also minor: reuse originalModel already captured earlier rather than re-fetching from context.
Apply this diff to address both:
- modelName := c.GetString("original_model")
+ modelName := originalModel
channelId := c.GetInt("channel_id")
userGroup := c.GetString("group")
- metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
Then, emit the metric once after the retry loop (see snippet below).
Add this outside the selected range (after the retry loop), right before building the retry log, so it's counted once per failed request:
```go
// After the for-loop and before logging retries:
if newAPIError != nil {
	modelName := originalModel
	channelId := c.GetInt("channel_id")
	userGroup := c.GetString("group")
	metrics.ReportFailure(modelName, relayInfo.UpstreamModelName, userGroup, channelId, newAPIError.StatusCode)
}
```
🤖 Prompt for AI Agents
In controller/relay.go around lines 177-181, remove the metrics.ReportFailure
call from inside the retry loop and instead emit it once after the loop when the
request truly failed: delete the in-loop call that re-reads original_model from
context, and add a single post-loop block that checks newAPIError != nil (and,
if desired, preserve the existing guards like constant.ErrorLogEnabled and
types.IsRecordErrorLog), then call metrics.ReportFailure using the
already-captured originalModel, c.GetInt("channel_id"), c.GetString("group"),
and newAPIError.StatusCode so failure is counted only once per request.
if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" { | ||
gopool.Go(func() { | ||
if os.Getenv("ENABLE_METRICS") == "true" { | ||
metrics.InitMetrics() | ||
http.Handle("/metrics", promhttp.Handler()) | ||
} | ||
log.Println(http.ListenAndServe("0.0.0.0:8005", nil)) | ||
}) | ||
go common.Monitor() | ||
common.SysLog("pprof enabled") | ||
if os.Getenv("ENABLE_PPROF") == "true" { | ||
go common.Monitor() | ||
} | ||
common.SysLog("pprof or metrics enabled") | ||
} |
Metrics-only path unintentionally exposes pprof endpoints; use a dedicated mux when pprof is disabled
Blank-importing net/http/pprof registers handlers on DefaultServeMux. In the current code, when ENABLE_METRICS=true and ENABLE_PPROF=false, you still start an HTTP server with the default mux (nil), thereby exposing pprof unintentionally. This is a security/observability footgun.
Use a dedicated ServeMux for metrics-only so pprof stays hidden unless explicitly enabled.
Apply this diff:
- if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" {
- gopool.Go(func() {
- if os.Getenv("ENABLE_METRICS") == "true" {
- metrics.InitMetrics()
- http.Handle("/metrics", promhttp.Handler())
- }
- log.Println(http.ListenAndServe("0.0.0.0:8005", nil))
- })
- if os.Getenv("ENABLE_PPROF") == "true" {
- go common.Monitor()
- }
- common.SysLog("pprof or metrics enabled")
- }
+ if os.Getenv("ENABLE_PPROF") == "true" || os.Getenv("ENABLE_METRICS") == "true" {
+ gopool.Go(func() {
+ addr := "0.0.0.0:8005"
+ if os.Getenv("ENABLE_PPROF") == "true" {
+ // pprof enabled: use DefaultServeMux so pprof endpoints are available
+ if os.Getenv("ENABLE_METRICS") == "true" {
+ metrics.InitMetrics()
+ http.Handle("/metrics", promhttp.Handler())
+ }
+ log.Println(http.ListenAndServe(addr, nil))
+ return
+ }
+ // metrics only: do NOT use DefaultServeMux to avoid exposing pprof endpoints
+ metrics.InitMetrics()
+ mux := http.NewServeMux()
+ mux.Handle("/metrics", promhttp.Handler())
+ log.Println(http.ListenAndServe(addr, mux))
+ })
+ if os.Getenv("ENABLE_PPROF") == "true" {
+ go common.Monitor()
+ }
+ common.SysLog("pprof or metrics enabled")
+ }
Optional hardening:
- Bind metrics to 127.0.0.1 by default (override via METRICS_LISTEN_ADDR) to avoid exposing labels containing group/channel_id externally; a sketch follows below.
- Or add reverse-proxy auth in front of /metrics.
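A minimal sketch of that first hardening option for the metrics-only branch, assuming the METRICS_LISTEN_ADDR variable proposed above (the name and the loopback default are suggestions, not part of this PR):

```go
// Metrics-only branch: bind to loopback unless the operator opts in to a wider address.
addr := os.Getenv("METRICS_LISTEN_ADDR")
if addr == "" {
	addr = "127.0.0.1:8005"
}
metrics.InitMetrics()
mux := http.NewServeMux() // dedicated mux, so pprof handlers stay hidden
mux.Handle("/metrics", promhttp.Handler())
log.Println(http.ListenAndServe(addr, mux))
```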
🤖 Prompt for AI Agents
In main.go around lines 120-132, the code starts an HTTP server using the
default ServeMux (nil) which exposes net/http/pprof handlers even when
ENABLE_PPROF=false; create and use a dedicated http.NewServeMux() when serving
metrics-only so pprof remains hidden. Specifically, build a mux :=
http.NewServeMux(); if ENABLE_METRICS == "true" register mux.Handle("/metrics",
promhttp.Handler()); when ENABLE_PPROF == "true" either register pprof handlers
into that mux or switch to the default mux only then; pass the chosen mux to
http.ListenAndServe instead of nil. Also consider using an env
METRICS_LISTEN_ADDR (default 127.0.0.1:8005) to bind metrics locally unless
explicitly overridden.
var upstreamModel string
if params.Other != nil && params.Other["upstream_model_name"] != nil {
	upstreamModel = params.Other["upstream_model_name"].(string)
}
metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
Fix potential panic: unsafe type assertion on upstream_model_name
params.Other["upstream_model_name"] is asserted to string without an ok check. If it’s present but not a string, this will panic. Safely assert and fall back to fmt.Sprint for non-strings.
Apply this diff:
- var upstreamModel string
- if params.Other != nil && params.Other["upstream_model_name"] != nil {
- upstreamModel = params.Other["upstream_model_name"].(string)
- }
+ var upstreamModel string
+ if params.Other != nil {
+ if v, ok := params.Other["upstream_model_name"]; ok && v != nil {
+ if s, ok := v.(string); ok {
+ upstreamModel = s
+ } else {
+ upstreamModel = fmt.Sprint(v)
+ }
+ }
+ }
metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

var upstreamModel string
if params.Other != nil {
	if v, ok := params.Other["upstream_model_name"]; ok && v != nil {
		if s, ok := v.(string); ok {
			upstreamModel = s
		} else {
			upstreamModel = fmt.Sprint(v)
		}
	}
}
metrics.ReportSuccess(params.ModelName, upstreamModel, params.Group, params.ChannelId)
🤖 Prompt for AI Agents
In model/log.go around lines 155-159, the code unsafely asserts
params.Other["upstream_model_name"].(string) which can panic if the value exists
but is not a string; change it to safely handle the lookup and types: ensure
params.Other != nil, fetch val := params.Other["upstream_model_name"], then if
s, ok := val.(string) { upstreamModel = s } else if val != nil { upstreamModel =
fmt.Sprint(val) } to gracefully convert non-string values; add an import for fmt
if not already present.
Summary by CodeRabbit

New Features
- Optional Prometheus metrics: when ENABLE_METRICS is set, the app initializes relay success/failure counters and serves a /metrics endpoint.

Chores
- Added github.com/prometheus/client_golang v1.23.0 and its indirect dependencies to go.mod/go.sum.