-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(telemetry)_: replace telemetry with prometheus metrics #6256
base: develop
Are you sure you want to change the base?
Conversation
Jenkins BuildsClick to see older builds (88)
|
7c4ff40
to
3763072
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #6256 +/- ##
===========================================
- Coverage 61.92% 61.71% -0.22%
===========================================
Files 843 844 +1
Lines 111286 111103 -183
===========================================
- Hits 68918 68567 -351
- Misses 34388 34562 +174
+ Partials 7980 7974 -6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
metrics/wakumetrics/metrics.go
Outdated
) | ||
|
||
var ( | ||
MessagesSentTotal = prometheus.NewHistogramVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is SomethingSomethingTotal a histogram? Shouldn't it be just a normal counter? How do you plan to use it as histogram?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated all metrics to use gauge/counter as appropriate
metrics/wakumetrics/metrics.go
Outdated
}, | ||
) | ||
|
||
PeersByOrigin = prometheus.NewGaugeVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these currently connected peers or overall number? Cause I'd expect PeersByOrigin to be only increasing as new are discovered (i.e. Counter)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on currently connected peers obtained by periodically looking at Waku's peer store
d31d899
to
5340c57
Compare
@@ -23,6 +23,9 @@ type InitializeApplication struct { | |||
LogEnabled bool `json:"logEnabled"` | |||
LogLevel string `json:"logLevel"` | |||
APILoggingEnabled bool `json:"apiLoggingEnabled"` | |||
|
|||
WakuMetricsEnabled bool `json:"wakuMetricsEnabled"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not only about waku metrics, right? This already includes libp2p metrics, and we could add more status-go stuff there later.
Maybe make it just metrics
and metrics-port
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies to desktop CLI options
@@ -23,6 +23,9 @@ type InitializeApplication struct { | |||
LogEnabled bool `json:"logEnabled"` | |||
LogLevel string `json:"logLevel"` | |||
APILoggingEnabled bool `json:"apiLoggingEnabled"` | |||
|
|||
WakuMetricsEnabled bool `json:"wakuMetricsEnabled"` | |||
WakuMetricsPort int `json:"wakuMetricsServerPort"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a full address
(<host>:<port>
) option would be better.
In theory you can run status-backend
in cloud (e.g. for community control node) and connect to metrics remotely, then we'd need to listen to 0.0.0.0
. Same needed when running in docker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to include a README with some basic usage instructions, referencing https://github.com/waku-org/status-metrics.
Also, not sure why this repo not public?
Also, wouldn't it be better to store that docker compose directly in status-go
? Why store it as a separate repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repo can be made public
I think it makes more sense to keep it separate so if anyone wants to run metrics against a node they can just clone those files and not the entire status-go
repository. The other reason is to make it easy for people to push changes to the dashboards or make their own forks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, makes sense, let's keep it separate then 👍
But for sure we need to make it public then.
And, perhaps, move it to status-im
org? as it's for status-go
🤔 (not a big deal though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might make sense to have a fork in status-im in case waku/status team members use different dashboards
metrics/wakumetrics/metrics.go
Outdated
MessagesSentTotal = prometheus.NewCounterVec( | ||
prometheus.CounterOpts{ | ||
Name: "statusgo_waku_messages_sent_total", | ||
Help: "Frequency of Waku messages sent by this node", | ||
}, | ||
[]string{"publish_method"}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to display this in Grafana not as constantly growing value, but as derivative? Like the amount of messages sent in given time period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes grafana provides all sorts of aggregate and derivative functions to apply to the metric and visualize it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here, is it actually wakumetrics
? Not just any metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metrics in this file are specifically related to waku
wakuv2/waku.go
Outdated
@@ -1176,6 +1182,7 @@ func (w *Waku) Start() error { | |||
if err != nil { | |||
w.logger.Error("OnNewEnvelopes error", zap.Error(err)) | |||
} | |||
w.logger.Info("Got a missing message!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is Debug
or Warning
level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not mean to add this, probably a mistake when rebasing. Removed.
wakuv2/waku.go
Outdated
@@ -1176,6 +1182,7 @@ func (w *Waku) Start() error { | |||
if err != nil { | |||
w.logger.Error("OnNewEnvelopes error", zap.Error(err)) | |||
} | |||
w.logger.Info("Got a missing message!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a convention in Go to begin log messages with lower case letter.
w.logger.Info("Got a missing message!") | |
w.logger.Info("got a missing message!") |
v2protocol "github.com/waku-org/go-waku/waku/v2/protocol" | ||
) | ||
|
||
type ReceivedMessages struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose prometheus marshals these types to json?
Then we should explicitly define the json tags for each of these structs fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are just used for the function arguments in PushReceivedMetrics
, the data pushed to prometheus is a numerical value and string labels. Each metric defines its own keys for the labels.
metrics/wakumetrics/metrics.go
Outdated
) | ||
|
||
var ( | ||
MessagesSentTotal = prometheus.NewCounterVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look good to have so many global (and public!) variables.
They don't seem to be used outside this package, so must at least not be exported.
In fact, they reason for them to be global is because they're used in a free function:
status-go/metrics/wakumetrics/metrics.go
Lines 125 to 127 in 5340c57
func RegisterMetrics() error { | |
collectors := []prometheus.Collector{ | |
MessagesSentTotal, |
But this one is only called from the Client
class.
So we could move all these global variables to the Client
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced with non-global variables
mobile/status.go
Outdated
@@ -55,6 +56,10 @@ import ( | |||
"github.com/status-im/status-go/signal" | |||
) | |||
|
|||
var ( | |||
metricsServer *metrics.Server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if it's possible to encapsulate this variable into GethStatusBackend
to avoid adding another global variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved metrics.Server
to GethStatusBackend
Replace telemetry with local metrics using prometheus client. Add parameters to InitializeApplication for enabling waku metrics over prometheus and specifying which port to use.
5340c57
to
2032df2
Compare
Replace telemetry with local metrics using prometheus. Add parameters to InitializeApplication for enabling waku metrics over prometheus and specifying which port to use.
This commit removes the telemetry functionality with a Prometheus client. Most of the metrics that were collected by telemetry now have their corresponding Prometheus gauges, counters, and histograms.
They still require a telemetry url to be set in order to be enabled. Additionally, the parameter
WakuMetricsEnabled
needs to be set as true in the request forInitializeApplication
in order to start Prometheus at port 9305 (can be changed usingWakuMetricsPort
).status-desktop
Dogfooding PR: status-im/status-desktop#17020