Deploying Prometheus for monitoring and stats collection

Prometheus is a monitoring, alerting and statistics collection tool [1]. It provides a multi-dimensional data model (time series identified by a metric name and key/value pairs) and query capabilities similar to Graphite's. Collection happens via a pull model over HTTP, which makes it a good fit for microservice environments. As long as a service exposes its metrics over HTTP, Prometheus can scrape them, store them, query them and alert on them. For graphing and visualisation Prometheus integrates with Grafana, which can be used to create dashboards.

In this post I'll deploy the Prometheus server, its alerting component Alertmanager, the Node exporter, which exposes various low-level server stats, Grafana as a front end and exim4 for sending email alerts.

Since Prometheus is written in Go, let's install the build dependencies, build the server binary and create a Docker container to run the service in.

In order to monitor the general health of a node (CPU, memory, uptime, etc.) Prometheus needs to contact an HTTP endpoint that serves the information for that node. One way to provide it is the Node exporter [2], a small HTTP service that returns various server statistics.
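
To make the Node exporter's output more concrete, here is a minimal Python sketch that parses a few lines of the Prometheus text exposition format. The sample payload and the helper name parse_metrics are made up for illustration, and the parser deliberately ignores timestamps, escaping and other corners of the real format.

```python
import re

# Matches simple sample lines such as:
#   node_load1 0.25
#   node_cpu{cpu="cpu0",mode="idle"} 1234.5
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _LINE.match(line)
        if match is None:
            continue  # not a simple sample line; ignored in this sketch
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hand-written sample, not real Node exporter output.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_cpu{cpu="cpu0",mode="idle"} 1234.5
"""

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Each sample is a metric name, an optional set of labels and a numeric value, which is the same multi-dimensional model Prometheus stores internally.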
Now that we have a target to connect to, let's write a simple config that makes Prometheus scrape the Node exporter and collect the data.
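
The original config listing is not reproduced here, so the following Python sketch renders a minimal prometheus.yml with a single static scrape job. The job name node, the target localhost:9100 (the Node exporter's default port) and the 15s interval are assumptions for illustration, not necessarily the values used in this post.

```python
def render_scrape_config(job_name, targets, metrics_path="/metrics", interval="15s"):
    """Render a minimal Prometheus config with a single static scrape job."""
    target_list = ", ".join("'{}'".format(t) for t in targets)
    return (
        "global:\n"
        "  scrape_interval: {interval}\n"
        "scrape_configs:\n"
        "  - job_name: '{job}'\n"
        "    metrics_path: '{path}'\n"
        "    static_configs:\n"
        "      - targets: [{targets}]\n"
    ).format(interval=interval, job=job_name, path=metrics_path, targets=target_list)


# Assumed values: a Node exporter listening on its default port, 9100.
print(render_scrape_config("node", ["localhost:9100"]))
```

The metrics_path argument defaults to /metrics, which is also the Prometheus default, and can be overridden for services that serve their metrics elsewhere.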
We can now browse to port 9090 to use the UI.

Monitoring a custom service is pretty easy. I'll use netcat to simulate an HTTP endpoint listening on port 9999 that returns "test_metric 1" at the / URL, and reconfigure Prometheus to scrape it.

By default Prometheus pulls from the /metrics path, but in this case the service returns its stats on the root URL, so the scrape job has to be pointed there. The netcat output shows Prometheus issuing a GET for / at regular intervals.
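
As an alternative stand-in for the netcat endpoint, a few lines of Python standard library can serve the same response. This is a sketch rather than the setup used in this post: the names MetricsHandler and make_server are illustrative, while port 9999, the / path and the test_metric 1 body come from the text.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS_BODY = b"test_metric 1\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve a single hard-coded metric on the root URL."""

    def do_GET(self):
        if self.path != "/":
            self.send_error(404)
            return
        self.send_response(200)
        # Content type of the Prometheus text exposition format.
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(METRICS_BODY)))
        self.end_headers()
        self.wfile.write(METRICS_BODY)


def make_server(port=9999, host="0.0.0.0"):
    """Create (but do not start) the HTTP server.

    Binding to all interfaces is an assumption made so that a Prometheus
    instance running in a container can reach the endpoint.
    """
    return HTTPServer((host, port), MetricsHandler)
```

Calling make_server().serve_forever() starts the endpoint. The handler logs every request to stderr, so each scrape shows up as a GET for /, much like in the netcat output.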

The newly exposed metric can now be queried by its name, test_metric.

All this data is collected and stored as time series, much like in Graphite. We can use Grafana to create graphs and dashboards from it.

Browse to port 3000, click on the Grafana logo, click "Data Sources" in the sidebar, then "Add New", and select "Prometheus" as the type. Set the URL to the appropriate Prometheus endpoint, e.g. http://localhost:9090/


Now that we have a server and a custom service that we monitor and collect data for, let's configure alerting based on that data. I'll use the Alertmanager service [3] to send an email if test_metric returns anything but 1.

The Alertmanager UI is available on port 9093.


For Prometheus to send alerts to Alertmanager we need to create an alert rule, add it to the Prometheus config file, and then restart Prometheus with the Alertmanager endpoint specified so that the two are integrated.
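
The rule file itself is not shown here, so the following Python sketch renders one in the YAML rule-file format used by current Prometheus releases (older releases used a different rule syntax). The alert name TestMetricNotOne, the group name, the 1m hold time and the severity label are placeholders; only the test_metric != 1 condition comes from the text.

```python
def render_alert_rule(alert, expr, hold="1m", severity="page", group="example"):
    """Render a one-rule Prometheus alerting rule file as YAML text."""
    return (
        "groups:\n"
        "  - name: {group}\n"
        "    rules:\n"
        "      - alert: {alert}\n"
        "        expr: {expr}\n"
        "        for: {hold}\n"
        "        labels:\n"
        "          severity: {severity}\n"
        "        annotations:\n"
        "          summary: '{alert} is firing'\n"
    ).format(group=group, alert=alert, expr=expr, hold=hold, severity=severity)


# Placeholder alert name; the expression mirrors the condition described above.
print(render_alert_rule("TestMetricNotOne", "test_metric != 1"))
```

A file like this would be listed under rule_files in the Prometheus config so that the server loads and evaluates it.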
With this, the Prometheus server is integrated with Alertmanager and ready to send alerts. To trigger an alert, kill the netcat session to simulate a failure, or change the returned value to something other than 1.



Aside from the Prometheus UI, we can query for metrics directly using the HTTP API.
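
As a sketch of such an API call, the Python below builds an instant-query URL for the /api/v1/query endpoint and extracts the values from a response body. The base URL, the helper names and the sample response are illustrative; the sample only mimics the documented response shape and is not output captured from this setup.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, expr):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return "{}/api/v1/query?{}".format(base_url.rstrip("/"), urlencode({"query": expr}))


def extract_values(response_text):
    """Return a list of (labels, value) pairs from an instant-query vector response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query failed: {}".format(payload.get("error")))
    results = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the value arrives as a string
        results.append((series["metric"], float(value)))
    return results


# Hand-written sample in the shape of an instant-query response.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "test_metric", "job": "test"},
             "value": [1470000000.0, "1"]},
        ],
    },
})

print(instant_query_url("http://localhost:9090/", "test_metric"))
print(extract_values(SAMPLE_RESPONSE))
```

Against a live server, the URL could be fetched with urllib.request.urlopen and the decoded body passed to extract_values.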
To enable sending emails from Alertmanager I installed a Docker container with exim4 in it.

If the service we want to monitor does not provide a metrics API, we can probe it over HTTP, HTTPS, TCP and ICMP with the Blackbox exporter [4].
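
To illustrate the idea of black-box probing (this is not how the Blackbox exporter is implemented), the Python sketch below attempts a TCP connection and reports the outcome as a 0/1 sample. The function names are hypothetical; probe_success follows the metric name the Blackbox exporter uses for the result of a probe.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe_metric(host, port):
    """Format the probe outcome as a single exposition-format sample."""
    return "probe_success {}".format(1 if tcp_probe(host, port) else 0)
```

For example, probe_metric("localhost", 9999) yields probe_success 1 while something is listening on that port and probe_success 0 otherwise, which Prometheus could then alert on like any other metric.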

Resources:

[1]. https://prometheus.io/docs/introduction/overview/
[2]. https://github.com/prometheus/node_exporter
[3]. https://github.com/prometheus/alertmanager
[4]. https://github.com/prometheus/blackbox_exporter