Introduction to Prometheus monitoring

Prometheus is a monitoring system originally
developed at SoundCloud. Before that, I had been playing around with
Riemann.io, inspired by the book «The Art of Monitoring».
I had set up Riemann to run alongside our apps, but in the end I had to kill it
because it consumed too much memory. Furthermore, the Docker image was huge, as is
common with Java apps. So I looked around for a lighter alternative and found
Prometheus. The main difference, in my opinion, is that Riemann uses a
push approach: all apps push data to Riemann, which in turn
analyzes the inflow of data, checks for inconsistencies, and sends out
notifications if something is wrong (e.g., no data coming from certain
sources, error messages received, etc.). Prometheus, on the other
hand, uses a pull approach: Prometheus itself checks each service with
a «health» check every X seconds.
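In practice, «pulling» means that Prometheus periodically issues an HTTP GET
against a plain-text metrics endpoint (conventionally /metrics) exposed by the
service or an exporter. A response looks roughly like this (the metric name is
illustrative):

    # HELP http_requests_total The total number of HTTP requests served.
    # TYPE http_requests_total counter
    http_requests_total{method="get",code="200"} 1027
    http_requests_total{method="get",code="500"} 3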

Overview

[Diagram: overview of Prometheus, the alertmanager, and the exporters]

Prometheus application: Prometheus is the main application. Its purpose is to query all services, check their
health status (or other metrics), apply the alert rules, and notify the alertmanager if someone
should be notified. Furthermore, you can query its database and create graphs.

Alertmanager: The alertmanager sends out the actual notifications. It also has a mechanism to suppress notifications
for a time period if they have the same source, so you don’t get spammed with the same alert.
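This grouping and repetition behaviour is configurable per route. A sketch of the relevant options (the values here are illustrative, not recommendations):

    route:
      receiver: team
      group_by: ['alertname', 'instance'] # alerts sharing these labels are batched into one notification
      group_wait: 30s      # how long to wait before sending the first notification for a new group
      group_interval: 5m   # how long to wait before notifying about new alerts added to an existing group
      repeat_interval: 4h  # re-send a still-firing alert at most every 4 hours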

Exporters: The exporters are interfaces to the programs to be monitored. There are, for instance,
exporters for MySQL, Consul, and various programming languages, as well as the blackbox
exporter. The blackbox exporter probes services via HTTP or TCP and offers an
endpoint which is then periodically scraped by Prometheus. This is the exporter we will
use in this demo.

So the basic workflow is the following:
Prometheus periodically scrapes the services to be observed. This can be an application
offering a health endpoint directly, or an exporter. The results of these scrapes are run through the
alert rules. If one of the rules fires, a notification is sent to the alertmanager. The
alertmanager then decides, again according to rules, who to contact and how (e.g., email or chat message).

Monitor a web application

OK, so far the theory. Now we build a small example of how you would monitor a web application. We
assume it doesn’t offer a health endpoint, so we simply query it by its web address.

Let’s start with the blackbox exporter, which creates a health endpoint for the web application.
In its simplest form you can use the examples provided, but most likely you want some
modifications. So we create a new `blackbox.yml` for an HTTPS connection to example.com:

    modules:
      https_our_webservice:
        prober: http
        timeout: 5s
        http:
          method: GET
          valid_status_codes: [200]  # Defaults to 2xx
          headers:
            Host: example.com
            Accept-Language: en-US
          no_follow_redirects: false
          fail_if_ssl: false
          fail_if_not_ssl: true
          fail_if_not_matches_regexp:
          - "Example.com"

We run our Docker container with a volume sharing our `blackbox.yml` file. Since Prometheus will
later address this container by the name prometheus-blackbox, we first create a user-defined
Docker network that all three containers will share:

    docker network create monitoring
    docker run -d --name prometheus-blackbox \
           --network monitoring \
           --read-only \
           -p 9115:9115 \
           -v $(pwd)/blackbox_exporter/blackbox.yml:/etc/blackbox_exporter/config.yml \
           prom/blackbox-exporter

Our blackbox exporter is now running on port 9115; you can check it by opening http://localhost:9115 in a browser.
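You can also trigger a probe by hand to verify the module works. Assuming example.com is reachable, the following should return a list of probe_* metrics ending in probe_success 1:

    curl "http://localhost:9115/probe?module=https_our_webservice&target=example.com"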

Next we configure the Prometheus instance to periodically query our blackbox exporter and
apply rules to the results. To do this, we use the following `prometheus.yml`:

    global:
      scrape_interval:     15s # By default, scrape targets every 15 seconds.
      evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
    rule_files:
      - "alert.rules"
    scrape_configs:
      - job_name: "web-service"
        scrape_interval: "15s"
        scheme: "http"
        metrics_path: "/probe"
        params:
          module: ["https_our_webservice"]
          target: ["example.com"]
        static_configs:
          - targets: ['prometheus-blackbox:9115']
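The rule_files entry points at an alert.rules file we still have to write. Prometheus 1.x uses its own rule syntax (2.x later moved rules to YAML); here is a minimal sketch that fires when the probe has been failing for more than a minute (the alert name and labels are made up for this demo):

    ALERT WebServiceDown
      IF probe_success{job="web-service"} == 0
      FOR 1m
      LABELS { severity = "critical" }
      ANNOTATIONS {
        summary = "example.com is unreachable",
        description = "The blackbox probe for example.com has been failing for more than a minute.",
      }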

Start the prometheus container:

    docker volume create --name prometheus-data
    docker run -d --name prometheus \
           --network monitoring \
           -v prometheus-data:/prometheus \
           -p 9090:9090 \
           -v $(pwd)/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
           -v $(pwd)/prometheus/alert.rules:/etc/prometheus/alert.rules \
           prom/prometheus:v1.0.1 \
           -config.file=/etc/prometheus/prometheus.yml \
           -alertmanager.url=http://prometheus-alert:9093 \
           -storage.local.memory-chunks=100000

If you now go to http://localhost:9090, you should see the blackbox exporter under
Status -> Targets with the job name «web-service» and the state «UP».
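You can also inspect the probe results in the Graph tab. For example, the following expression is 1 while the probe succeeds and drops to 0 when it fails:

    probe_success{job="web-service"}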

Finally, we also need the alertmanager, so that we actually receive a notification if our
web service is down. For that we need to define how we want to be notified; here is an example `alertmanager.yml`:

    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'smtp.whateveryoursmtpserveris.com:587'
      smtp_from: 'youremail@yourhost.com'
      smtp_auth_username: 'youremail@yourhost.com'
      smtp_auth_password: 'password'

    route:
      # A default receiver
      receiver: team

    receivers:
      - name: 'team'
        email_configs:
          - to: 'error@migadu.com'

Also start the alertmanager:

    docker volume create --name prometheus-alert-data
    docker run -d --name prometheus-alert \
           --network monitoring \
           -p 9093:9093 \
           -v prometheus-alert-data:/alertmanager \
           -v $(pwd)/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
           prom/alertmanager \
           -config.file=/etc/alertmanager/alertmanager.yml
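To test the notification path end to end without taking example.com down, you can inject a synthetic alert directly into the alertmanager’s v1 API (the alert name here is made up):

    curl -XPOST http://localhost:9093/api/v1/alerts \
         -d '[{"labels": {"alertname": "TestAlert", "severity": "critical"}}]'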

If example.com goes down, you will be notified. If you have any comments or suggestions,
please write me an email.