Linux Administration: How to Install and use Graphite

For a quick and easy Graphite installation on Debian read my other post - Metrics visualisation and collection with Graphite, Grafana and python.

What is Graphite?

Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects the numeric time-series data you are interested in graphing and sends it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.

Who should use Graphite?

Graphite is actually a bit of a niche application. Specifically, it is designed to handle numeric time-series data. For example, Graphite would be good at graphing stock prices because they are numbers that change over time. However, Graphite is a complex system, and if you only have a few hundred distinct things you want to graph (stock prices in the S&P 500) then Graphite is probably overkill. But if you need to graph a lot of different things (like dozens of performance metrics from thousands of servers) and you don't necessarily know the names of those things in advance (who wants to maintain such a huge configuration?) then Graphite is for you.

How scalable is Graphite?

From a CPU perspective, Graphite scales horizontally on both the frontend and the backend, meaning you can simply add more machines to the mix to get more throughput. It is also fault tolerant in the sense that losing a backend machine will cause a minimal amount of data loss (whatever that machine had cached in memory) and will not disrupt the system if you have sufficient capacity remaining to handle the load.

From an I/O perspective, under load Graphite performs lots of tiny I/O operations on lots of different files very rapidly. This is because each distinct metric sent to Graphite is stored in its own database file, similar to how many tools built on top of RRD (drraw, Cacti, Centreon, etc.) work. In fact, Graphite originally did use RRD for storage until fundamental limitations arose that required a new storage engine.

High volume (a few thousand distinct metrics updating minutely) pretty much requires a good RAID array. Graphite's backend caches incoming data if the disks cannot keep up with the large number of small write operations that occur (each data point is only a few bytes, but most disks cannot do more than a few thousand I/O operations per second, even if they are tiny). When this occurs, Graphite's database engine, whisper, allows carbon to write multiple data points at once, thus increasing overall throughput only at the cost of keeping excess data cached in memory until it can be written.

^-- from http://graphite.wikidot.com/faq

What I really like about Graphite is the fact that you can push data to it, instead of using a poller, like Cacti for example.

Here is a step-by-step guide on how to install and configure Graphite on Ubuntu Server:

File: gistfile1.sh
------------------
[root@server1 ~] apt-get update
[root@server1 ~] apt-get upgrade
[root@server1 ~] wget http://launchpad.net/graphite/0.9/0.9.9/+download/graphite-web-0.9.9.tar.gz
[root@server1 ~] wget http://launchpad.net/graphite/0.9/0.9.9/+download/carbon-0.9.9.tar.gz
[root@server1 ~] wget http://launchpad.net/graphite/0.9/0.9.9/+download/whisper-0.9.9.tar.gz
[root@server1 ~] tar -zxvf graphite-web-0.9.9.tar.gz
[root@server1 ~] tar -zxvf carbon-0.9.9.tar.gz
[root@server1 ~] tar -zxvf whisper-0.9.9.tar.gz
[root@server1 ~] mv graphite-web-0.9.9 graphite
[root@server1 ~] mv carbon-0.9.9 carbon
[root@server1 ~] mv whisper-0.9.9 whisper
[root@server1 ~] rm carbon-0.9.9.tar.gz
[root@server1 ~] rm graphite-web-0.9.9.tar.gz
[root@server1 ~] rm whisper-0.9.9.tar.gz
[root@server1 ~] apt-get install --assume-yes apache2 apache2-mpm-worker apache2-utils apache2.2-bin apache2.2-common libapr1 libaprutil1 libaprutil1-dbd-sqlite3 python3.1 libpython3.1 python3.1-minimal libapache2-mod-wsgi libaprutil1-ldap memcached python-cairo-dev python-django python-ldap python-memcache python-pysqlite2 sqlite3 erlang-os-mon erlang-snmp rabbitmq-server bzr expect ssh libapache2-mod-python python-setuptools
[root@server1 ~] apt-get install build-essential python2.6-dev
[root@server1 ~] easy_install zope.interface
[root@server1 ~] easy_install twisted
[root@server1 ~] easy_install txamqp
[root@server1 ~] easy_install django-tagging

Now let's install WHISPER

Whisper is a fixed-size database, similar in design to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time.

Time to install CARBON

Graphite consists of two components: a webapp frontend and a backend storage application (Carbon). Data collection agents connect to carbon and send their data, and carbon's job is to make that data available for real-time graphing immediately and to get it stored on disk as fast as possible. Carbon is made up of three processes: carbon-agent.py, carbon-cache.py, and carbon-persister.py. The primary process is carbon-agent.py, which starts up the other two processes in a pipeline. Carbon-agent accepts connections and receives time-series data in the appropriate format. This data is sent through the pipeline to carbon-cache, which stores the data in a cache where data points are grouped by their associated metric. Carbon-cache constantly attempts to write the largest such group of data points down the pipeline to carbon-persister. Carbon-persister reads these data points and writes them to disk using Whisper.

The reason carbon is split into three processes is Python's threading limitations. Originally carbon was a single application where these distinct functions were performed by threads, but Python's GIL prevents multiple threads from actually running concurrently. Since the initial deployment of Graphite was done on a machine with lots of rather slow CPUs, we needed true concurrency for performance reasons. Thus it was split into three processes connected via pipes.
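The "write the largest group first" behaviour of the cache can be illustrated with a small shell sketch. This is not carbon's actual code: the function name largest_group and the sample metrics are made up for illustration, and the sketch only counts buffered points per metric and reports the metric with the most.

```shell
# Illustrative only: pick the metric with the most buffered data points.
# Input lines on stdin use the form "metric value timestamp".
largest_group() {
    awk '{ count[$1]++ } END { for (m in count) print count[m], m }' \
        | sort -k1,1nr -k2,2 \
        | head -n 1
}

# Hypothetical buffered data points.
printf '%s\n' \
    'servers.a.load 0.5 1328715480' \
    'servers.b.load 0.7 1328715480' \
    'servers.a.load 0.6 1328715540' \
    'servers.a.load 0.4 1328715600' \
    | largest_group
```

Here servers.a.load has three points waiting against one for servers.b.load, so it would be flushed first. Writing the biggest group first means each disk operation stores as many points as possible.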

Configure CARBON

Graphite is built on fixed-size databases (see Whisper), so we have to configure in advance how much data we intend to store and at what level of precision. For instance, you could store your data with 1-minute precision (meaning you will have one data point for each minute) for, say, 2 hours. Additionally, you could store your data with 10-minute precision for 2 weeks, etc. The idea is that the storage cost is determined by the number of data points you want to store; the coarser your precision, the more time you can cover with fewer points.
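To make the trade-off concrete, the number of data points in an archive is simply the time span covered divided by the precision. A minimal shell sketch for the two examples above (the helper name points_needed is hypothetical):

```shell
# points_needed SECONDS_PER_POINT SECONDS_COVERED
# Prints how many data points an archive must hold.
points_needed() {
    echo $(( $2 / $1 ))
}

points_needed 60  $(( 2 * 60 * 60 ))        # 1-minute precision for 2 hours
points_needed 600 $(( 14 * 24 * 60 * 60 ))  # 10-minute precision for 2 weeks
```

This prints 120 and 2016: the second archive covers 168 times as much time for roughly 17 times as many points.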

Once you have picked your naming scheme you need to create a schema by creating/editing the /opt/graphite/conf/storage-schemas.conf file.

Let's say we want to store data with minutely precision for 30 days, then at 15 minute precision for 10 years. Here are the entries in the schemas file:

File: gistfile1.sh
------------------
[root@server1 ~] cd /opt/graphite/conf
[root@server1 ~] cp carbon.conf.example carbon.conf
[root@server1 ~] cp storage-schemas.conf.example storage-schemas.conf
[root@server1 ~] vim storage-schemas.conf

[server_load]
priority = 100
pattern = ^servers\.
retentions = 60:43200,900:350400

Basically, when carbon receives a metric, it determines where on the filesystem the whisper data file should be for that metric. If the data file does not exist, carbon knows it has to create it, but since whisper is a fixed-size database, some parameters must be determined at the time of file creation (this is the reason we're making a schema). Carbon looks at the schemas file and, in order of priority (highest to lowest), looks for the first schema whose pattern matches the metric name. If no schema matches, the default schema (2 hours of minutely data) is used. Once the appropriate schema is determined, carbon uses the retention configuration for the schema to create the whisper data file appropriately.
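The lookup just described can be sketched in a few lines of shell. This is only an illustration of the "highest priority first, first match wins" rule, not carbon's implementation; the prod_load schema and the match_schema helper are hypothetical.

```shell
# Each schema is one line: "priority name pattern" (sample data; only
# server_load corresponds to the entry in the schemas file above).
SCHEMAS='100 server_load ^servers\.
200 prod_load ^servers\.prod\.'

# match_schema METRIC
# Prints the name of the highest-priority schema whose pattern matches
# the metric name, or "default" when nothing matches.
match_schema() {
    result=$(printf '%s\n' "$SCHEMAS" | sort -k1,1nr |
        while read -r priority name pattern; do
            if printf '%s\n' "$1" | grep -Eq -- "$pattern"; then
                echo "$name"
                break
            fi
        done)
    echo "${result:-default}"
}

match_schema servers.prod.server1.metric   # matches both; priority 200 wins
match_schema servers.dev.server9.metric    # only matches server_load
match_schema apps.web.hits                 # no match
```

The three calls print prod_load, server_load and default. The first one shows how a higher priority gives a metric a different retention than the broader pattern it would otherwise have matched.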

Now back to our schema entry. The server_load stanza is just a name for our schema; it doesn't really matter what you call it. The first parameter below that is priority, an integer (I usually just use 100) that tells carbon what order to evaluate the schemas in (highest to lowest). The purpose of priority is two-fold. First, it is faster to test the more commonly used schemas first. Second, priorities provide a way to have different retention for a metric name that would otherwise have matched another schema. The pattern parameter is a regular expression that is matched against a new metric name to find which schema applies to it. In our example, the pattern will match any metric that starts with "servers.". The retentions parameter is a little more complicated; here's how it works:

retentions is a comma separated list of retention configurations. Each retention configuration is of the form seconds_per_data_point:data_points_to_store. So in our example, the first retention configuration is 60 seconds per data point (so minutely data), and we want to store 43,200 of those (43,200 minutes is 30 days). The second retention configuration is 900 seconds per data point (15 minutes), and we want to store 350,400 of those (there are 350,400 15-minute intervals in 10 years).

^-- from http://graphite.wikidot.com/getting-your-data-into-graphite
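As a quick check of the retention arithmetic, here is a minimal shell sketch that expands a retentions value into the time span each archive covers. The helper name describe_retentions is hypothetical, and a day is taken as 86400 seconds.

```shell
# describe_retentions "SECONDS:POINTS,SECONDS:POINTS,..."
# Prints, for each archive, the precision, the point count and the span covered.
describe_retentions() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r seconds points; do
        covered=$(( seconds * points ))
        echo "${seconds}s x ${points} points = ${covered}s ($(( covered / 86400 )) days)"
    done
}

describe_retentions "60:43200,900:350400"
```

For our schema this reports 30 days for the first archive and 3650 days (10 years of 365 days) for the second.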

Let's configure GRAPHITE (webapp)

Time to configure APACHE

File: gistfile1.sh
------------------
[root@server1 ~] cd ~/graphite/examples
[root@server1 ~] cp example-graphite-vhost.conf /etc/apache2/sites-available/default
[root@server1 ~] cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi
[root@server1 ~] vim /etc/apache2/sites-available/default

# XXX You need to set this up!
# Read http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGISocketPrefix
WSGISocketPrefix /etc/httpd/wsgi/

<VirtualHost *:80>
        ServerName graphite
        DocumentRoot "/opt/graphite/webapp"
        ErrorLog /opt/graphite/storage/log/webapp/error.log
        CustomLog /opt/graphite/storage/log/webapp/access.log common

        # I've found that an equal number of processes & threads tends
        # to show the best performance for Graphite (ymmv).
        WSGIDaemonProcess graphite processes=5 threads=5 display-name='%{GROUP}' inactivity-timeout=120
        WSGIProcessGroup graphite
        WSGIApplicationGroup %{GLOBAL}
        WSGIImportScript /opt/graphite/conf/graphite.wsgi process-group=graphite application-group=%{GLOBAL}

        # XXX You will need to create this file! There is a graphite.wsgi.example
        # file in this directory that you can safely use, just copy it to graphite.wsgi
        WSGIScriptAlias / /opt/graphite/conf/graphite.wsgi

        Alias /content/ /opt/graphite/webapp/content/
        <Location "/content/">
                SetHandler None
        </Location>

        # XXX In order for the django admin site media to work you
        # must change @DJANGO_ROOT@ to be the path to your django
        # installation, which is probably something like:
        # /usr/lib/python2.6/site-packages/django
        Alias /media/ "@DJANGO_ROOT@/contrib/admin/media/"
        <Location "/media/">
                SetHandler None
        </Location>

        # The graphite.wsgi file has to be accessible by apache. It won't
        # be visible to clients because of the DocumentRoot though.
        <Directory /opt/graphite/conf/>
                Order deny,allow
                Allow from all
        </Directory>
</VirtualHost>

[root@server1 ~] mkdir /etc/httpd
[root@server1 ~] mkdir /etc/httpd/wsgi
[root@server1 ~] /etc/init.d/apache2 reload

Initial DATABASE CREATION

File: gistfile1.sh
------------------
[root@server1 ~] cd /opt/graphite/webapp/graphite/
[root@server1 ~] python manage.py syncdb
[root@server1 ~] chown -R www-data:www-data /opt/graphite/storage/
[root@server1 ~] /etc/init.d/apache2 restart
[root@server1 ~] cd /opt/graphite/webapp/graphite
[root@server1 ~] cp local_settings.py.example local_settings.py

Time to START CARBON

Now that we have Graphite up and running, you can fire up a browser and connect to it.

Let's SEND DATA

All graphite messages are of the following form:

metric_path value timestamp\n

So for example, "foo.bar.baz 42 74857843" where the last number is a UNIX epoch time.

To send some data in bash run:
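A minimal sketch, assuming carbon's plaintext listener is reachable on localhost port 2003 (carbon's default line receiver port) and that nc is installed. The function names and the CARBON_HOST and CARBON_PORT variables are illustrative, not part of Graphite.

```shell
# Assumptions: carbon accepts plaintext messages on localhost:2003; nc is installed.
CARBON_HOST="${CARBON_HOST:-localhost}"
CARBON_PORT="${CARBON_PORT:-2003}"

# graphite_line METRIC VALUE [TIMESTAMP]
# Prints one message in the "metric_path value timestamp" form;
# the timestamp defaults to the current UNIX epoch time.
graphite_line() {
    printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# send_metric METRIC VALUE [TIMESTAMP]
# Pipes the formatted line to carbon over TCP.
send_metric() {
    graphite_line "$@" | nc -w 1 "$CARBON_HOST" "$CARBON_PORT"
}

# Example (needs a running carbon):
#   send_metric servers.prod.server1.metric 42
```

Because the metric name starts with "servers.", the data file created for it will use the server_load schema configured earlier.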

VIEW RETENTION POLICY for a metric

To see what the actual retention policy for the metric that we've just created is, we can use whisper-info.py:

File: gistfile1.sh
------------------
[root@server1 ~] /usr/local/bin/whisper-info.py /opt/graphite/storage/whisper/servers/prod/server1/metric.wsp
maxRetention: 315360000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 4723240

Archive 0
retention: 2592000
secondsPerPoint: 60
points: 43200
size: 518400
offset: 40

Archive 1
retention: 315360000
secondsPerPoint: 900
points: 350400
size: 4204800
offset: 518440
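The numbers in this output are consistent with 12 bytes per data point (518400 / 43200 = 12) and a 40-byte header, which for two archives fits a 16-byte file header plus 12 bytes of metadata per archive. Under that assumption, which is inferred from the output above and not taken from the whisper documentation, the file size can be predicted from the retentions alone. The helper name whisper_file_size is hypothetical.

```shell
# whisper_file_size "SECONDS:POINTS,SECONDS:POINTS,..."
# Assumed layout (inferred from the whisper-info.py output): 16-byte header,
# 12 bytes of archive metadata per archive, 12 bytes per data point.
whisper_file_size() {
    size=16
    for archive in $(printf '%s\n' "$1" | tr ',' ' '); do
        points=${archive#*:}                 # the part after the colon
        size=$(( size + 12 + points * 12 ))
    done
    echo "$size"
}

whisper_file_size "60:43200,900:350400"
```

For our schema this prints 4723240, matching the fileSize reported above, so every metric stored with these retentions costs about 4.5 MB of disk regardless of how many points have actually been written.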

CHANGE RETENTION POLICY for a metric

To manually change the retention policy to one datapoint per minute for a year:
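A sketch of how such a command could be built, assuming whisper's whisper-resize.py script is installed next to whisper-info.py and accepts a retention in the same seconds_per_data_point:data_points_to_store form used in the schemas file. The snippet only prints the command so it can be checked first.

```shell
# Path of the metric created earlier; the script location is an assumption.
WSP=/opt/graphite/storage/whisper/servers/prod/server1/metric.wsp

# One data point per minute for a (365-day) year.
points=$(( 365 * 24 * 60 ))

# Print the command instead of running it; drop the "echo" to execute.
echo /usr/local/bin/whisper-resize.py "$WSP" "60:${points}"
```

The retention argument works out to 60:525600. It is worth copying the .wsp file aside before resizing it.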

QUERY Whisper

To see the historical data that is stored in the database execute:

File: gistfile1.sh
------------------
[root@server1 ~] /usr/local/bin/whisper-fetch.py /opt/graphite/storage/whisper/servers/prod/server1/metric.wsp
...
1328715480 3909105676.000000
1328715540 3910515089.000000
1328715600 3911885805.000000
1328715660 3913242008.000000
1328715720 3914590106.000000
...

There you have it! Now you can graph almost anything and create custom dashboards using the web GUI.

Resources:

http://graphite.wikidot.com/
