Prometheus Alertmanager and Grafana (especially Grafana!) seem a bit too involved for monitoring my homelab. (Prometheus itself is fine: it collects a lot of statistics I don’t care about, but it doesn’t require configuration, so it doesn’t bother me.)
Do you know of simpler alternatives?
My goals are relatively simple:
- get a notification when any systemd service fails
- get a notification if there is not much space left on a disk
- get a notification if one of the above can’t be determined (e.g. server down, config error, …)
Seeing graphs with basic system metrics (e.g. CPU/RAM usage) would be nice, but it’s not super-important.
I am a dev, so writing a script that checks for whatever I need is way simpler than learning/writing/testing YAML configuration (in fact, I was about to write a script to send heartbeats to something like Uptime Kuma or Tianji before I thought of asking you for a nicer solution).
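For reference, here is a rough sketch of the kind of heartbeat script meant here. It is only an illustration under a few assumptions: the push URL, the mount list and the 10% threshold are made-up placeholders, and the `status`/`msg` query parameters follow the push-monitor URL that Uptime Kuma shows when you create a "Push" monitor (check the URL your own instance gives you).

```python
import shutil
import subprocess
import urllib.parse
import urllib.request

# Hypothetical values: replace with the push URL from your own monitor.
PUSH_URL = "http://uptime-kuma.lan:3001/api/push/XXXXXXXX"
MOUNTS = ["/"]      # mount points to check (assumption)
MIN_FREE = 0.10     # complain below 10% free space (assumption)


def failed_units(systemctl_output: str) -> list[str]:
    """Extract unit names from `systemctl list-units --state=failed` output."""
    return [line.split()[0] for line in systemctl_output.splitlines() if line.strip()]


def low_disks(free_fraction: dict[str, float], min_free: float) -> list[str]:
    """Return the mount points whose free fraction is below the threshold."""
    return [mount for mount, frac in free_fraction.items() if frac < min_free]


def heartbeat_url(base: str, problems: list[str]) -> str:
    """Build the push URL: status=up when there are no problems, else down."""
    status = "down" if problems else "up"
    msg = "; ".join(problems) or "OK"
    return base + "?" + urllib.parse.urlencode({"status": status, "msg": msg})


def main() -> None:
    # --plain and --no-legend give one unit per line without decoration.
    out = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--no-legend", "--plain"],
        capture_output=True, text=True, check=True,
    ).stdout
    problems = [f"failed: {unit}" for unit in failed_units(out)]

    free = {}
    for mount in MOUNTS:
        usage = shutil.disk_usage(mount)
        free[mount] = usage.free / usage.total
    problems += [f"low disk: {mount}" for mount in low_disks(free, MIN_FREE)]

    # If anything above raises, no heartbeat is sent at all.
    urllib.request.urlopen(heartbeat_url(PUSH_URL, problems), timeout=10)
```

Calling `main()` from a cron job or systemd timer would cover all three goals: a failed unit or full disk sends a "down" heartbeat, and if the server is off or the script itself crashes, the heartbeat simply never arrives and the push monitor should flag that on its own.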
The easiest solution I found and use is Beszel.
https://github.com/henrygd/beszel
Just a hub with the most important stats and some simple agents on the servers.
Icinga/Nagios? You can even feed data already collected by Prometheus into it if you want.
Have you played around with Grafana? It really is quite simple if you already have Prometheus working.
For a home lab environment you don’t even need to use Alertmanager. Grafana can handle alerts as well.
Grafana also has hundreds of pre-made dashboards you can import. Node monitoring is quite straightforward.
Assuming you have Prometheus good to go, all you need to do is go to the data sources page in Grafana, create a new data source, and point it at your Prometheus instance.
Then you can import the dashboards you want.
Now you can set up your alerts: you can use SMTP, Telegram, or Slack, among others, for your notifications.
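If you would rather script the data source step than click through the UI, something like the following should work against Grafana's HTTP API (`POST /api/datasources`). Treat it as a sketch: the host names, the data source name and the token are placeholders, and it assumes you have created a service account token with permission to manage data sources.

```python
import json
import urllib.request


def datasource_request(grafana_url: str, token: str, prometheus_url: str) -> urllib.request.Request:
    """Build (but do not send) the request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",      # display name, arbitrary
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",         # Grafana's backend makes the queries
        "isDefault": True,
    }
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical usage; the URLs and token are placeholders:
# req = datasource_request("http://grafana.lan:3000", "glsa_...", "http://prometheus.lan:9090")
# urllib.request.urlopen(req, timeout=10)
```

Dashboards and alert rules can still be imported and edited in the UI afterwards; this only replaces the "create a new data source" step.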
SNMP does what you want. You just need a good monitoring solution that’s not as involved as Prometheus + Grafana (I feel you, I’ve been there).
I really enjoy PRTG, but it’s way too expensive for a home lab, still throwing it out there if you feel like you have money to burn.
I hear good things about LibreNMS; it’s next on my list when my PRTG licence runs out.
Be warned that monitoring is ultimately a fickle thing: whatever you don’t write in YAML config for Grafana, you get to dig out of obscure SNMP libs for other tools (though I find that easier, YMMV).
I recommend against: Nagios (I like it, but if you hate Prometheus it’s definitely not for you), Checkmk (throw Checkmk into the sun please, it just fucking sucks), Cacti (NO!), SolarWinds (why?).
If you feel like you want to become a datacenter admin: Zabbix scales very, very well, both in performance and in ease of admin across hundreds of servers, but it’s overkill for a home lab, and you can get lost in its configs for hours.
Edit: found better resources
https://linuxhandbook.com/syslog-guide/
https://github.com/linuxserver/docker-syslog-ng
That should be a good place to start. Syslog will do what you want.
Syslog is considerable overkill for home lab monitoring.
Is there a self-hosted OpenTelemetry consumer?
I’m currently using an InfluxDB + Telegraf + Grafana combination to monitor Linux systems and k3s pods. It’s basically the same as Prometheus, but InfluxDB uses a push model, which makes it easier to develop tools for collecting custom time series data.
For alerts and dashboards, I think Grafana is the simplest and most hassle-free solution available at the moment.
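To give an idea of what "push" means in practice, here is a small sketch that formats a data point in InfluxDB line protocol and posts it to the InfluxDB 2.x write endpoint (`/api/v2/write`). The measurement, tag and field names, as well as the URL, org, bucket and token, are invented for the example, and it writes straight to InfluxDB rather than going through Telegraf.

```python
import time
import urllib.parse
import urllib.request


def _escape(text: str, chars: str) -> str:
    """Backslash-escape the characters that are special in line protocol."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement: str, tags: dict, fields: dict, ts=None) -> str:
    """Render one point as `measurement,tag=v field=v timestamp` (seconds)."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):        # check bool first: bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"         # integers carry an 'i' suffix
        else:
            rendered = repr(float(value))
        parts.append(_escape(key, ",= ") + "=" + rendered)
    ts = int(time.time()) if ts is None else ts
    return f"{head} {','.join(parts)} {ts}"


def push(base_url: str, org: str, bucket: str, token: str, line: str) -> None:
    """POST one line to InfluxDB 2.x; all arguments are placeholders here."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    req = urllib.request.Request(
        f"{base_url}/api/v2/write?{query}",
        data=line.encode(),
        headers={"Authorization": f"Token {token}"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10)
```

Any script that can build a string and make an HTTP request can report a custom metric this way, with no exporter or scrape config needed on the collecting side.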