-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add prometheus format status output #430
Conversation
Thanks for doing this. Prometheus style metrics is far more useful than improvements on analyze like #348. Can you make this an HTTP endpoint, so prometheus can scrape the metrics? We would be able to make dashboards easily from many plotting machines connected over shared network space then. |
@drewwells I could, but I'd like to hear from @ericaltendorf that he wants that in his plotman :). Having an HTTP Server running is something that you must actively want to have in your program, in my opinion. And like I wrote, you'll probably want to measure other stuff on your plotting machine (CPU/RAM/Temps/Disk), so you'll probably have the node_exporter running on there anyways. And node_exporter can easily just pick up those metrics as well. |
I really like the idea of having prometheus metrics. However, it would be nice if there were a set of aggregate statistics, ie. avg duration per phase, total started/completed, etc. I suppose that could be calculated via promQL, I just wouldn't want to keep high retention on metrics w/ plot IDs as labels due to cardinality concerns. |
I'm on board with the idea and expect this to get merged. I also apologize for the delay here. I do appreciate the perspective that an HTTP server maybe doesn't belong here. I suspect it doesn't at this point, unless there is some particular need for it to be integrated. That said, I have started some TUI rework that will be using anyio (on asyncio, at least for now) so... I haven't run prometheus and while I am curious, I don't expect to get to it now. Could people that have used this code and verified it works please provide an approving review? I believe anyone can do this from the 'Files changed' tab of this PR. I will still do my own review reading the code. The code seems pretty straightforward and non-intrusive. I do wonder if it should be grouped with status though? I believe it is roughly outputting status but in a different format? (yeah, maybe somewhat different data as well but, mostly thinking about the concept here) We could do something like shifting the present |
Hey, thanks for getting around to this. I realize this projects gets a lot of attention as many people are using it, and you are doing this in your spare time :) Do I have to do something about that failing check? I am a bit unexperienced in open-source work, sorry. |
@dylanzr I agree and would also love aggregated metrics, however as I understand the code, plotman is stateless and always just a snapshot of the current reality. So once a plot passes from phase 1 to phase 2, plotman doesn't know how long it took for phase 1. You could do this by parsing the logfiles, but I think that should be another PR. Something like |
Just ignore the failing coverage check for now. It is not required yet because I haven't gotten around to achieving sufficient test coverage to mandate it of others. Maybe 'soon'... Sorry for the confusion. All of the progress information that plotman gets is based on parsing the log files. The If anyone could opine on how this PR and #549 relate, that would be great. |
#549 can be used to achieve the same thing as this PR by writing a bash script that parses the JSON. I think have the status output as JSON is still a very useful thing to have, as you can parse the output easier than But they should remain seperate PRs in my opinion. |
We wouldn't need host level metrics, there's many solutions already https://github.com/prometheus/node_exporter/.
Correct, you publish raw data to prometheus and create aggregate view from it.
I would personally close #549. We will get a lot more value from prometheus style metrics than custom ones that can not be consumed by standard metric collectors. HTTP is the preferred way to collect prometheus metrics. If we do not add HTTP support in this PR, it really should be done in a separate one. Without HTTP, we will require a separate exporter process just to take the cli output and send it to prometheus server. As an example, geth supports a metrics HTTP endpoint https://geth.ethereum.org/docs/interface/metrics |
As for HTTP support, I'm thinking similar to |
@drewwells you misunderstood me. The node_exporter will almost certainly already be running on any plotting machine being monitored by prometheus. |
So we just need a job to update the text file. Is this going to run as a daemon to do that or it needs an external job runner? |
An external job runner of your choice :) bash while true, crontab, systemd... there are many to choose from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this PR using my local prometheus + node_exporter setup and can confirm that it's working as desired 👍🏽
Alrighty. The PRs stand as separate things. HTTP server shouldn't be needed, and wouldn't be added 'soon' anyways. Maybe one grows later, but that would be later after #299 at least and would need significant justification I think. It does strike me as odd that every last monitored activity on a server would have to provide its own HTTP server.. But hey, I haven't gotten into this area yet, so what do I know about normal. So, could you go ahead and do an catch up here from |
I took the liberty of doing the catch up merge and adding type hints. I hope you don't mind. |
Thanks for the work here. Perhaps at some point I'll find time to setup prometheus for myself, but in the mean time it appears from all the attention here that you've already got a pack of people using this. |
Thank you for the additions and the merge :) |
@florianweyandt1337 Hi, could you share the grafana template? |
@tedzhang2891 Unfortunately no, sorry. I moved to madmax and the dashboard didn't really make sense any more. So I threw it out. |
I just today started fiddling with this so it's really rough, but hey. Also, it uses an extra field of major + minor/10. but anyways. |
I monitor my plotting machines with prometheus/grafana. This will add the subcommand
plotman prometheus
which will output the current status of all plots in a prometheus readable format with the numeric metrics als values and the alpha-numeric information as labels. It looks something like this:
This information can then be exported, either by a webserver or by a node-exporter running on the machine (thats how I have set it up) like this: