Linux Administration: Deploying Prometheus for monitoring and stats collection

Prometheus is a monitoring, alerting and statistics collection tool [1]. It provides a multi-dimensional data model (time series identified by a metric name and key/value pairs) and query capabilities similar to Graphite. Collection happens via a pull model over HTTP, which makes it a good fit for microservice environments. As long as a service exposes its metrics over an HTTP endpoint, Prometheus can scrape them, store them, query them and alert on them. For graphing and visualisation Prometheus integrates with Grafana, which can be used to create dashboards.
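To make the data model a bit more concrete, here is a minimal sketch of how a series can be identified by its metric name plus its label pairs. The TinyStore class and series_key helper are hypothetical names made up for this illustration; this is not how Prometheus stores data internally.

```python
from collections import defaultdict


def series_key(name, labels):
    """Build a hashable identity from a metric name and its label pairs."""
    return (name, tuple(sorted(labels.items())))


class TinyStore:
    """Toy in-memory store: one list of (timestamp, value) samples per series."""

    def __init__(self):
        self.samples = defaultdict(list)

    def append(self, name, labels, timestamp, value):
        # Two samples belong to the same series only if name AND labels match.
        self.samples[series_key(name, labels)].append((timestamp, value))

    def select(self, name, **matchers):
        """Return every series with this name whose labels include all matchers."""
        found = {}
        for (series_name, label_pairs), points in self.samples.items():
            labels = dict(label_pairs)
            if series_name == name and all(
                labels.get(key) == value for key, value in matchers.items()
            ):
                found[(series_name, label_pairs)] = points
        return found
```

Appending samples for device="dm-0" and device="dm-1" under the same metric name yields two separate series, which is what lets a query slice by any label.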

In this post I'll deploy the Prometheus server, the alerting module called Alertmanager, the Node exporter module which exports various low-level server stats, Grafana as a front-end and exim4 for sending email alerts.

Since Prometheus is a Go binary, let's install the dependencies, build the server binary and build a Docker image to run the service from:

    [prometheus-server]$ add-apt-repository ppa:ubuntu-lxc/lxd-stable
    [prometheus-server]$ apt-get update && apt-get install golang
    [prometheus-server]$ export GOPATH=/usr/lib/go; export GOROOT=/usr/lib/go/
    [prometheus-server]$ mkdir -p $GOPATH/src/github.com/prometheus
    [prometheus-server]$ cd $GOPATH/src/github.com/prometheus
    [prometheus-server]$ git clone https://github.com/prometheus/prometheus.git
    [prometheus-server]$ cd prometheus
    [prometheus-server]$ make build
    [prometheus-server]$ make docker
    [prometheus-server]$ docker images | grep prometheus
    prometheus    master    8f24da86430e    1 minute ago    43.21 MB

In order to monitor the general health of a node (CPU, memory, uptime, etc.) Prometheus needs to contact an HTTP endpoint to collect the information for that node. One way to do this is by using the Node exporter [2] - a simple HTTP service that returns various server statistics.

    [prometheus-server]$ cd $GOPATH/src/github.com/prometheus
    [prometheus-server]$ git clone https://github.com/prometheus/node_exporter.git
    [prometheus-server]$ cd node_exporter
    [prometheus-server]$ make build
    [prometheus-server]$ make docker
    [prometheus-server]$ docker images | grep node-exporter
    node-exporter    master    40a33f49d66f    1 minute ago    17 MB

    [prometheus-server]$ docker run -d -p 9100:9100 --net="host" node-exporter:master
    [prometheus-server]$ docker ps | grep node-exporter
    8629a88bba36    node-exporter:master    "/bin/node_exporter"    1 minutes ago    Up 1 minutes    desperate_mcclintock

    [prometheus-server]$ ss -o state listening '( sport = :9100 )'
    Netid  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    tcp    0       128     :::9100             :::*

    [prometheus-server]$ curl localhost:9100/metrics
    ...
    # HELP node_disk_io_time_ms Milliseconds spent doing I/Os.
    # TYPE node_disk_io_time_ms counter
    node_disk_io_time_ms{device="dm-0"} 31188
    node_disk_io_time_ms{device="dm-1"} 964
    node_disk_io_time_ms{device="dm-10"} 788
    ...
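The /metrics output above is plain text, so it is easy to inspect by hand. The following is a rough sketch of a parser for such lines; parse_metrics is a hypothetical helper, and it assumes label values contain no escaped quotes, so it is not a full implementation of the exposition format.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# individual label="value" pairs (assumes no escaped quotes in values)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Feeding it the curl output shown above would give one tuple per device, with the device name in the labels dictionary.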

Now that we have a target to connect to, let's write a simple config that scrapes both the Node exporter and Prometheus itself:

    [prometheus-server]$ mkdir -p /etc/prometheus/prometheus-data
    [prometheus-server]$ vim prometheus.yml
    global:
      scrape_interval: 30s
      evaluation_interval: 5s

    scrape_configs:
      - job_name: prometheus
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9090']
      - job_name: node
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9100']

    [prometheus-server]$ docker run -d -p 9090:9090 \
        -v /etc/prometheus/prometheus-data/prometheus.yml:/etc/prometheus/prometheus.yml \
        prometheus:master -config.file=/etc/prometheus/prometheus.yml
    [prometheus-server]$ docker ps | grep prometheus
    83a773b218db    prometheus:master    "/bin/prometheus -con"    1 minutes ago    Up 1 minutes    0.0.0.0:9090->9090/tcp    jovial_goldstine

We can now browse to port 9090 to use the UI.

Monitoring a custom service is pretty easy. I'll use netcat to simulate an HTTP endpoint listening on port 9999 at the / URL, returning "test_metric 1", and reconfigure Prometheus for it:

    [prometheus-server]$ cat prometheus.yml
    global:
      scrape_interval: 30s
      evaluation_interval: 5s

    scrape_configs:
      - job_name: prometheus
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9090']
      - job_name: node
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9100']
      - job_name: test
        metrics_path: /
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9999']
            labels:
              service_name: test_service

    [prometheus-server]$ docker kill 83a773b218db
    [prometheus-server]$ docker run -d -p 9090:9090 \
        -v /etc/prometheus/prometheus-data/prometheus.yml:/etc/prometheus/prometheus.yml \
        prometheus:master -config.file=/etc/prometheus/prometheus.yml
    [prometheus-server]$ echo "test_metric 1" | nc -k -l 9999
    GET / HTTP/1.1
    User-Agent: curl/7.35.0
    Host: localhost:9999
    Accept: */*
    ...
    GET / HTTP/1.1
    User-Agent: curl/7.35.0
    Host: localhost:9999
    Accept: */*
    ^C

By default Prometheus pulls from the /metrics URL, but in this case I configured the job to scrape the root URL instead (metrics_path: /).
The netcat output above shows the GET requests for / arriving at regular intervals.
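Netcat is fine for a quick test, but it does not send a proper HTTP response. As an alternative, here is a small sketch of the same test endpoint using only the Python standard library. MetricHandler and start_metrics_server are hypothetical names, and the handler answers every path with the same body, which is an assumption made for brevity.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricHandler(BaseHTTPRequestHandler):
    """Answer every GET with a single sample in the text exposition format."""

    def do_GET(self):
        body = b"test_metric 1\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Keep the console quiet; remove this override to log each scrape.
        pass


def start_metrics_server(port=9999):
    """Start the endpoint in a background thread and return the server object."""
    server = HTTPServer(("", port), MetricHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling start_metrics_server() from an interactive session keeps the endpoint up until server.shutdown() is called. In a standalone script the background thread ends with the main program, so running server.serve_forever() in the foreground would be the simpler choice there.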

To query for the newly exposed metric, run the expression test_metric in the Prometheus UI.

All this data is being collected and stored in a similar way to Graphite. We can use Grafana to create graphs, dashboards etc.:

    [prometheus-server]$ curl -L -O https://grafanarel.s3.amazonaws.com/builds/grafana-2.5.0.linux-x64.tar.gz
    [prometheus-server]$ tar zxfv grafana-2.5.0.linux-x64.tar.gz && cd grafana-2.5.0
    [prometheus-server]$ nohup ./bin/grafana-server web &

Browse to port 3000, click on the Grafana logo, then click on "Data Sources" in the sidebar, "Add New", select "Prometheus" as the type. Set the appropriate Prometheus endpoint e.g. http://localhost:9090/

Now that we have a server and a custom service that we monitor and collect data for, let's configure alerting based on that data. I'll use the Alertmanager service [3] to send an email when the test service stops responding to scrapes, that is, when the built-in up metric for the test job is anything but 1:

    [prometheus-server]$ cd $GOPATH/src/github.com/prometheus
    [prometheus-server]$ git clone https://github.com/prometheus/alertmanager.git
    [prometheus-server]$ cd alertmanager
    [prometheus-server]$ make build
    [prometheus-server]$ make docker
    [prometheus-server]$ docker images | grep alertmanager
    alertmanager    master    b05b2acf17eb    1 minute ago    16.84 MB

    [prometheus-server]$ cat /etc/prometheus/prometheus-data/alertmanager.yml
    global:
      smtp_smarthost: 'localhost:4444'
      smtp_from: 'alertmanager@api-01.us-east-1.example.com'

    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 1m
      repeat_interval: 5m
      receiver: team-mg-email
      routes:
        - match_re:
            service_name: ^(test_service)$
          receiver: team-mg-email

    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'cluster', 'service']

    receivers:
      - name: 'team-mg-email'
        email_configs:
          - to: 'yourname@mailgun.com'
            require_tls: false

    [prometheus-server]$ docker run -d -p 9093:9093 \
        -v /etc/prometheus/prometheus-data/alertmanager.yml:/etc/prometheus/alertmanager.yml \
        alertmanager:master -config.file=/etc/prometheus/alertmanager.yml
    [prometheus-server]$ docker ps | grep alertmanager
    c8e317f31c32    alertmanager:master    "/bin/alertmanager -c"    1 minute ago    Up 1 minute    0.0.0.0:9093->9093/tcp    sharp_thompson
    [prometheus-server]$ ss -o state listening '( sport = :9093 )'
    Netid  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    tcp    0       128     :::9093             :::*
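The route section of alertmanager.yml is a tree: an alert starts at the top-level route and is handed to the first child route whose matchers agree with its labels. Below is a simplified sketch of that lookup, assuming regular expressions must match the whole label value. The matches and pick_receiver functions are hypothetical, and features such as grouping, inhibition and timing are left out entirely.

```python
import re

# The routing tree from alertmanager.yml above, reduced to what the sketch needs.
ROUTE = {
    "receiver": "team-mg-email",
    "routes": [
        {
            "match_re": {"service_name": "^(test_service)$"},
            "receiver": "team-mg-email",
        },
    ],
}


def matches(route, labels):
    """True if every matcher on this route agrees with the alert's labels."""
    for name, pattern in route.get("match_re", {}).items():
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    for name, value in route.get("match", {}).items():
        if labels.get(name, "") != value:
            return False
    return True


def pick_receiver(route, labels, inherited=None):
    """Walk the tree depth-first; fall back to the closest parent receiver."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        if matches(child, labels):
            return pick_receiver(child, labels, receiver)
    return receiver
```

In the config above both the default route and the child route point to team-mg-email, so every alert ends up with the same receiver; the child route only starts to matter once a second receiver is introduced.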

Browse to port 9093 to reach the Alertmanager UI.

In order for Prometheus to send alerts to Alertmanager we need to create an alert rule, add it to the Prometheus config file, and then restart Prometheus with the Alertmanager endpoint specified. The config should look like the following:

    [prometheus-server]$ cat /etc/prometheus/prometheus-data/test_service.rules
    ALERT TestServiceDown
      IF up{job="test"} != 1
      FOR 5s
      LABELS { service_name = "test_service" }
      ANNOTATIONS {
        summary = "Instance {{ $labels.instance }} down",
        description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 seconds.",
      }

    [prometheus-server]$ cat /etc/prometheus/prometheus-data/prometheus.yml
    global:
      scrape_interval: 30s
      evaluation_interval: 5s

    rule_files:
      - "/etc/prometheus/test_service.rules"

    scrape_configs:
      - job_name: prometheus
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9090']
      - job_name: node
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9100']
      - job_name: test
        metrics_path: /
        target_groups:
          - targets: ['api-01.us-east-1.example.com:9999']
            labels:
              service_name: test_service

    [prometheus-server]$ docker kill 83a773b218db
    [prometheus-server]$ docker run -d -p 9090:9090 \
        -v /etc/prometheus/prometheus-data/prometheus.yml:/etc/prometheus/prometheus.yml \
        -v /etc/prometheus/prometheus-data/alertmanager.conf:/etc/prometheus/alertmanager.conf \
        -v /etc/prometheus/prometheus-data/test_service.rules:/etc/prometheus/test_service.rules \
        prometheus:master -config.file=/etc/prometheus/prometheus.yml \
        -alertmanager.url=http://localhost:9093

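With this, the Prometheus server is integrated with Alertmanager and ready to send alerts. To trigger an alert, kill the netcat session to simulate a failure. Note that the rule checks up{job="test"}, which only reflects whether the scrape succeeded; to alert when the returned value differs from 1, the rule expression would need to test the metric itself, for example test_metric != 1.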
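The FOR clause in the rule above means the condition has to hold for a while before the alert is sent. Here is a simplified model of that behaviour as I understand it: the alert is pending while the condition has been true for less than the hold time, and firing afterwards. The alert_states function is a hypothetical illustration, not Prometheus code.

```python
def alert_states(evaluations, hold_seconds):
    """Map rule evaluations to alert states.

    evaluations: list of (timestamp_seconds, condition_is_true), in time order.
    Returns one state per evaluation: 'inactive', 'pending' or 'firing'.
    """
    states = []
    active_since = None  # time of the first evaluation in the current true streak
    for timestamp, condition_is_true in evaluations:
        if not condition_is_true:
            # Any false evaluation resets the streak.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= hold_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states
```

With evaluation_interval: 5s and FOR 5s, this model gives "pending" on the first failed evaluation and "firing" on the next one. Since up is only refreshed on each scrape (every 30s in this config), the failure itself may take up to one scrape interval to become visible.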

Aside from the Prometheus UI we can query for metrics directly using the API:

    [prometheus-server]$ curl -s 'http://localhost:9090/api/v1/query?query=test_metric' | python -mjson.tool
    {
        "data": {
            "result": [
                {
                    "metric": {
                        "__name__": "test_metric",
                        "instance": "api-01.us-east-1.example.com:9999",
                        "job": "test",
                        "service_name": "test_service"
                    },
                    "value": [
                        1465933958.146,
                        "1"
                    ]
                }
            ],
            "resultType": "vector"
        },
        "status": "success"
    }
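The same query can be scripted. The sketch below builds the /api/v1/query URL used in the curl call above and pulls the numeric values out of a response shaped like the JSON shown. The function names are hypothetical, and keying the result by the instance label is just a choice made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, expression):
    """Return the instant-query URL for a Prometheus expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expression})


def extract_values(response):
    """Turn an instant-query response into {instance: value}."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % (response,))
    values = {}
    for item in response["data"]["result"]:
        _timestamp, raw_value = item["value"]
        # Sample values arrive as strings, e.g. "1".
        values[item["metric"].get("instance", "")] = float(raw_value)
    return values


def query(base_url, expression):
    """Run the query against a live server (needs a reachable Prometheus)."""
    with urlopen(build_query_url(base_url, expression)) as resp:
        return extract_values(json.load(resp))
```

For example, query("http://localhost:9090", "test_metric") would return a dictionary with one entry per scraped instance, assuming the server from this post is running.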

To enable sending emails from Alertmanager I installed a Docker container with exim4 in it:

    [prometheus-server]$ docker run -d -p 4444:25 -v /tmp/exim:/var/spool/exim4 \
        -e PRIMARY_HOST=us-east-1.example.com -e ALLOWED_HOSTS="10.1.0.0/16" \
        elsdoerfer/exim-sender
    [prometheus-server]$ docker ps | grep exim
    f98a1e7dc9cf    elsdoerfer/exim-sender    "/exim"    1 minutes ago    Up 1 minute    0.0.0.0:4444->25/tcp    elated_franklin
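Before relying on Alertmanager to deliver mail, it can help to push a test message through the relay by hand. This is a minimal sketch using Python's smtplib, assuming the exim container is reachable on localhost:4444 as in the command above. The function names and the subject text are made up for the example, and whether the relay accepts the message depends on its ALLOWED_HOSTS setting.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Compose a small plain-text test email."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Test message via exim relay"
    message.set_content("If you can read this, the relay accepted the message.")
    return message


def send_test_message(message, host="localhost", port=4444):
    """Hand the message to the SMTP relay (assumed to listen on host:port)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(message)
```

A call such as send_test_message(build_test_message("alertmanager@api-01.us-east-1.example.com", "yourname@mailgun.com")) uses the same addresses as the Alertmanager config.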

If the service we want to monitor does not provide an API we can use probing over HTTP, HTTPS, TCP and ICMP with the Blackbox exporter [4].
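To illustrate what probing means here, the sketch below performs a single TCP connect and reports whether it succeeded and how long it took. It is only meant to show the idea; tcp_probe is a hypothetical helper and not how the Blackbox exporter is implemented.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return (success, seconds) for one TCP connection attempt."""
    started = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            success = True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        success = False
    return success, time.monotonic() - started
```

A result such as (True, 0.002) could then be exposed as two metrics, one for success and one for duration, which is roughly the kind of data a probe-based exporter reports.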

Resources:

[1]. https://prometheus.io/docs/introduction/overview/
[2]. https://github.com/prometheus/node_exporter
[3]. https://github.com/prometheus/alertmanager
[4]. https://github.com/prometheus/blackbox_exporter
