Infrastructure monitoring #87

Open
rodecker opened this issue Jun 8, 2020 · 6 comments

Comments

@rodecker
Member

rodecker commented Jun 8, 2020

We need some kind of monitoring system that sends email when Ring infrastructure servers or services are down. Monitoring of hosts and services should be configured automatically when they are added to Ansible.
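
To make the "configured automatically from Ansible" part concrete, here is a minimal sketch. It assumes a Prometheus-style scraper using file-based service discovery, which is only one of the options that could fit this issue. The function name `inventory_to_file_sd`, the `ansible_group` label, and the example hostnames are hypothetical; port 9100 is the node_exporter default. The input mirrors the JSON shape printed by `ansible-inventory --list`.

```python
import json


def inventory_to_file_sd(inventory, port=9100):
    """Convert `ansible-inventory --list` style JSON into file_sd target groups.

    Each Ansible group that directly lists hosts becomes one target group,
    labelled with the (hypothetical) label `ansible_group`.
    """
    groups = []
    for group, body in sorted(inventory.items()):
        # "_meta" only carries hostvars; skip it and anything malformed.
        if group == "_meta" or not isinstance(body, dict):
            continue
        hosts = sorted(body.get("hosts", []))
        if not hosts:
            # Groups such as "all" may only contain children, no direct hosts.
            continue
        groups.append({
            "targets": [f"{host}:{port}" for host in hosts],
            "labels": {"ansible_group": group},
        })
    return groups


if __name__ == "__main__":
    # Hypothetical inventory; real input would come from `ansible-inventory --list`.
    example = {
        "_meta": {"hostvars": {}},
        "all": {"children": ["infra", "ungrouped"]},
        "infra": {"hosts": ["web01.example.org", "db01.example.org"]},
    }
    print(json.dumps(inventory_to_file_sd(example), indent=2))
```

Writing the result to a file that the scraper watches would mean a host starts being monitored as soon as it lands in the inventory, with no separate monitoring config to maintain.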

@rodecker
Member Author

rodecker commented Jun 8, 2020

Icinga, another Nagios fork, or something else entirely?

@leoluk
Contributor

leoluk commented Jun 8, 2020

Prometheus with Alertmanager :)

@isodude
Contributor

isodude commented Jun 13, 2021

Telegraf + VictoriaMetrics was really nice to set up.
Either send Influx-format data to VictoriaMetrics, or let VictoriaMetrics scrape Prometheus-format metrics from Telegraf.

I also added MTR support to my Telegraf fork, which makes it easy to get nice stats in Grafana showing how hops evolve over time. This could be especially useful for the Ring.

Let me know if it's of interest.

@leoluk
Contributor

leoluk commented Jun 13, 2021

For monitoring (vs. telemetry), Prometheus, node_exporter and Alertmanager are hard to beat.

@isodude
Contributor

isodude commented Jun 14, 2021

I tried node_exporter first, but the 'everything shall be run on a different port' theme did not sit well with me.

The way it works: Telegraf has excellent out-of-the-box support for most things and can also execute custom binaries that emit various formats (influx, json, simple, etc.). It exposes the collected data through an output plugin in Prometheus format, and VictoriaMetrics pulls the data from there. You can still run Alertmanager as you normally would, or use VictoriaMetrics' own vmalert: https://docs.victoriametrics.com/vmalert.html.
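
To illustrate what the pull side sees, here is a small sketch that parses Prometheus text exposition output of the kind a Telegraf Prometheus output plugin would serve. The metric names and the host `ring01` are made up for the example, and the parser is deliberately simplified: it ignores timestamps and does not handle escaped quotes inside label values. It is not how VictoriaMetrics itself parses scrapes.

```python
import re

# metric_name{optional labels} value [optional timestamp]
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$'
)
# key="value" pairs inside the braces (no escaped quotes supported here)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    # Hypothetical scrape body; metric names are illustrative only.
    body = """\
# HELP cpu_usage_idle Telegraf collected metric
# TYPE cpu_usage_idle gauge
cpu_usage_idle{cpu="cpu-total",host="ring01"} 97.5
mem_used_percent{host="ring01"} 41.25
"""
    for sample in parse_exposition(body):
        print(sample)
```

Because this is the same text format node_exporter serves, anything that can scrape one can scrape the other, which is why Alertmanager or vmalert can sit on top of either setup.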

At the same time, you get Thanos-like features such as long-term storage.

I had a look, and a fairly recent VictoriaMetrics is available straight from the repo. If we want MTR support, however, I would need to build Telegraf from my own fork. My fork also has somewhat better TLS client certificate support, which means you could use client certificates between all nodes for transporting data.

So in short, node_exporter + Alertmanager is technically equivalent to Telegraf + VictoriaMetrics.

@isodude
Contributor

isodude commented Sep 27, 2021

If people like running Prometheus, maybe this is interesting? https://opensourcelibs.com/lib/network_exporter

3 participants