Puppet is a tool designed to manage the configuration of systems declaratively. The user describes system resources and their desired state, either using Puppet's declarative language or a Ruby DSL (domain-specific language). This information is stored in files called "Puppet manifests". Puppet discovers system information via a utility called Facter and compiles the manifests into a system-specific catalog containing resources and their dependencies, which is then applied to the target system. Any actions taken by Puppet are then reported.
This all works great for a relatively small number of servers. After about 3000 servers, a single Puppet Master starts to slow down, as most Puppet operations are CPU-bound. In some cases spreading the requests from the Puppet Agents randomly in time helps, but with thousands of servers this might not be a good option, as it takes too long for a change to propagate to all clients.
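To make the trade-off concrete, here is a minimal sketch of spreading agent check-ins over a run interval. It is not Puppet's own implementation; the function name, the hostnames and the 30-minute interval are all assumptions made up for illustration.

```python
import hashlib


def splay_offset(hostname: str, interval_seconds: int) -> int:
    """Map a hostname to a stable offset in [0, interval_seconds).

    Hashing the hostname (instead of picking a fresh random number on
    every run) keeps each agent in the same slot from run to run.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_seconds


if __name__ == "__main__":
    interval = 30 * 60  # hypothetical 30-minute run interval
    for host in ("web01.example.com", "web02.example.com", "db01.example.com"):
        print(host, "checks in", splay_offset(host, interval), "seconds into the interval")
```

The wider the interval, the flatter the load on the master, but an agent that lands near the end of the window waits almost the full interval before it picks up a change.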
There are many ways of scaling the Puppet infrastructure, because scaling Puppet is a problem of scaling HTTP. One can use hardware or software load balancers like F5 BigIP, Zeus/Stingray, HAProxy, Pound, etc. For the purpose of this writing I'll use Apache with mod_passenger and mod_proxy, plus a separate, dedicated Active/Passive Certificate Authority.
The infrastructure will consist of the following servers:
* Server puppetlb.example.com is the dedicated LB, running Apache with mod_proxy, mod_proxy_balancer, mod_headers and mod_ssl. It terminates the SSL traffic, validates the Puppet Agents' certificates and forwards each request to either the CAs or the Puppet Masters, depending on the request. There will also be a passive LB, with both running keepalived and sharing a floating VIP.
* Servers puppetca1.example.com and puppetca2.example.com (active/passive) are responsible for signing the certificate requests from the Puppet Agents and for exporting the CA data that the load balancer uses for validation.
* Servers puppetmaster1.example.com and puppetmaster2.example.com are the Puppet Masters that host all manifests. They need to serve identical manifests, either by pulling from a git repository, rsyncing between each other or using some form of shared storage.
It is important to understand that when your master is running behind an Apache proxy, the proxy is the SSL endpoint. It does all the validation and authentication of the node, and traffic between the proxy and the masters travels in clear text. The master knows the client has been authenticated because the proxy adds an HTTP header that says so (usually X-Client-Verify for Apache/Passenger).
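The check on the master's side boils down to trusting a header. The sketch below illustrates the idea only and is not Puppet's code: the function name is made up, and it assumes the proxy copies mod_ssl's SSL_CLIENT_VERIFY value (SUCCESS for a validated client certificate) into X-Client-Verify.

```python
def client_is_verified(headers: dict) -> bool:
    """Return True only if the proxy marked the client certificate as valid.

    Assumption: the proxy sets X-Client-Verify to mod_ssl's
    SSL_CLIENT_VERIFY value, which is "SUCCESS" for a validated client.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-client-verify") == "SUCCESS"


if __name__ == "__main__":
    print(client_is_verified({"X-Client-Verify": "SUCCESS"}))  # validated agent
    print(client_is_verified({"X-Client-Verify": "NONE"}))     # no client cert presented
    print(client_is_verified({}))                              # header missing entirely
```

Since any client can send a header with that name, this only holds up if the proxy overwrites the header on every request and the masters accept connections from the proxy alone.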
1. If the client runs for the first time, it generates a private key and a Certificate Signing Request (CSR). The CSR is not a certificate yet; it is a request containing the client's public key, signed with the client's own private key.
2. The client connects to the master (at this point the client is not authenticated) and sends its CSR. It also receives the CA certificate and the CRL in return.
3. The master stores the CSR locally.
4. The administrator checks the CSR and can then sign it (this process can be automated with autosigning).
5. The client waits for its signed certificate, which the master ultimately sends.
6. All subsequent communication uses this client certificate. The master and the client authenticate each other by virtue of sharing the same CA.
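The bookkeeping behind steps 2 to 5 can be sketched with a toy in-memory CA. It does no cryptography at all, and the class and method names are hypothetical and unrelated to Puppet's internals; it only shows how a CSR moves from pending to signed.

```python
class ToyCA:
    """Tracks CSRs the way the steps above describe; no real crypto."""

    def __init__(self, autosign: bool = False):
        self.autosign = autosign
        self.pending = {}  # certname -> CSR text waiting for a decision
        self.signed = {}   # certname -> placeholder "certificate"

    def submit_csr(self, certname: str, csr: str) -> None:
        # Steps 2 and 3: the agent sends its CSR and the CA stores it.
        self.pending[certname] = csr
        if self.autosign:
            self.sign(certname)

    def sign(self, certname: str) -> None:
        # Step 4: an administrator (or autosigning) approves the request.
        csr = self.pending.pop(certname)
        self.signed[certname] = f"signed({csr})"

    def fetch_certificate(self, certname: str):
        # Step 5: the agent polls until a certificate shows up.
        return self.signed.get(certname)


if __name__ == "__main__":
    ca = ToyCA()
    ca.submit_csr("agent1.example.com", "csr-for-agent1")
    print(ca.fetch_certificate("agent1.example.com"))  # nothing signed yet
    ca.sign("agent1.example.com")
    print(ca.fetch_certificate("agent1.example.com"))
```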
The infrastructure will look something like this:
Configuring the Load balancer
On puppetlb.example.com install Apache and enable the following modules:
Create a virtual host file that will define the load balancer:
Create the cert directory that will be mounted via NFS from the active CA server puppetca1.example.com later on.
Do not start Apache just yet, as it will complain about the missing SSL cert.
All this config does is define two sets of back-end workers that are responsible for handling either the cert requests or delivering the compiled catalogs from the Puppet Masters. In other words, if a request from a Puppet agent reaches the LB with the word "certificate" in its URL, mod_proxy forwards it to puppetca1 for signing, etc. All other requests are forwarded to the Puppet Masters, puppetmaster1 and puppetmaster2, in a round-robin fashion. If one of the Puppet Masters fails, mod_proxy removes it from rotation. The option status=+H tells the front end that the second CA member is a hot standby and will not receive any requests unless puppetca1 fails.
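The routing decision described above can be modeled in a few lines. This is only a toy model of the behavior, not how mod_proxy is implemented; the class names and the example URL paths are assumptions for illustration.

```python
class Backend:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy


class ToyBalancer:
    """Certificate traffic goes to the CA pair, everything else to the masters."""

    def __init__(self, ca_active, ca_standby, masters):
        self.ca_active = ca_active
        self.ca_standby = ca_standby
        self.masters = masters
        self._next = 0  # index of the next master to try

    def route(self, url_path: str) -> str:
        if "certificate" in url_path:
            # The hot standby is used only when the active CA is down.
            for ca in (self.ca_active, self.ca_standby):
                if ca.healthy:
                    return ca.name
            raise RuntimeError("no healthy CA")
        # Round-robin over the masters, skipping any that failed.
        for _ in range(len(self.masters)):
            candidate = self.masters[self._next % len(self.masters)]
            self._next += 1
            if candidate.healthy:
                return candidate.name
        raise RuntimeError("no healthy Puppet Master")


if __name__ == "__main__":
    lb = ToyBalancer(
        Backend("puppetca1"), Backend("puppetca2"),
        [Backend("puppetmaster1"), Backend("puppetmaster2")],
    )
    # Hypothetical request paths, chosen only to exercise both branches.
    for path in ("/production/certificate/ca", "/production/catalog/agent1",
                 "/production/catalog/agent2"):
        print(path, "->", lb.route(path))
```

Marking a backend as unhealthy (`healthy = False`) shows the failover paths: catalog requests then all land on the surviving master, and certificate requests fall through to puppetca2.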
Configuring the Puppet CA nodes
On puppetca1 and puppetca2 install Apache, mod_passenger, puppet and ruby. Since Puppet will be run by Apache Passenger, make sure puppetmasterd does not start at boot, as it uses WEBrick by default and would conflict with Apache Passenger.
puppetmasterd will start by default, generating a CA named after the host, puppetca1. This is not what we want, as the cert name needs to contain the load balancer name, because the certs will be validated on the load balancer. Make sure you stop puppetmasterd and change the puppet config file:
After the config change, run puppetmasterd to generate the new CA with the name puppetlb.example.com, which matches the hostname of the load balancer:
This will generate the new CA cert, private and public key in /var/lib/puppet/ssl with the name puppetlb.example.com, which is the hostname of the load balancer.
Now let's export the CA directory via NFS, so that the load balancer can mount it and use it to validate the puppet agents:
Time to enable the Apache modules:
Edit the passenger config file:
Let's define the Apache virtual host that will do all the work. Notice that the CA does not use SSL anymore, as it is terminated on the load balancer:
Configure the rack file that Passenger uses and restart Apache:
On the load balancer puppetlb, mount the exported SSL directory from puppetca1, make the mount persistent in fstab and start Apache:
Repeat the above steps for puppetca2.example.com, ensuring that /var/lib/puppet/ssl is being synced from puppetca1 (using rsync or whatever other method you prefer). It's also not a bad idea to have a separate IP for the NFS export that can float between puppetca1 and puppetca2 using heartbeat, or some implementation of VRRP, so that the load balancer can re-mount the SSL directory if the active CA server goes offline. In fact, I use NFS here just as a quick hack; in a production environment you might consider using iSCSI or an HA NAS.
It's also important to note that you don't really have to export the entire CA SSL dir from the CA to the LB. It's sufficient to copy just the CA cert, but not the private key, so the load balancer can verify the authenticity of connecting agents.
With SSL certificates, the server doesn't have a copy of the client's public key, so we need some other way to verify that the client is who it says it is. This is where the third party comes into the picture. The third party (the CA) uses its private key to digitally sign the public key of the client. The result is a certificate. The CA's private key is never transferred to anyone, but the signed public key is transferred back to the client. When the client connects to a server, it presents its signed public key. The server uses the public key of the CA (NOT the private key) to verify that the client's public key was actually signed by the CA. At this point trust is established.
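To see why the CA's public half is all the load balancer needs, here is a toy version of sign-and-verify using textbook RSA with tiny numbers. It is deliberately insecure and is not what OpenSSL or Puppet actually do; the key values, function names and the "certificate body" string are all assumptions for illustration.

```python
import hashlib

# Toy CA key pair (textbook RSA, far too small for real use).
# n = 61 * 53; E is the public exponent, D the private one.
N, E, D = 3233, 17, 2753


def _digest(message: bytes) -> int:
    # Reduce the hash modulo N so it fits in the toy key size.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def ca_sign(message: bytes) -> int:
    """Runs on the CA only: needs the private exponent D."""
    return pow(_digest(message), D, N)


def lb_verify(message: bytes, signature: int) -> bool:
    """Runs on the load balancer: needs only the public values N and E."""
    return pow(signature, E, N) == _digest(message)


if __name__ == "__main__":
    # Hypothetical certificate body: agent name plus its public key.
    cert_body = b"CN=agent1.example.com;pubkey=..."
    signature = ca_sign(cert_body)
    print("genuine signature accepted:", lb_verify(cert_body, signature))
    print("altered signature accepted:", lb_verify(cert_body, (signature + 1) % N))
```

Note that `lb_verify` never touches `D`, which is the whole point: copying the CA cert to the load balancer gives it enough to check agents without being able to sign anything itself.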
Configuring the Puppet Master nodes
Configuring the Puppet Master nodes is pretty much identical to how we configured the Puppet CA, with the exception that we don't have to deal with any certificates and exports, etc.
Configuring the Puppet Agent nodes
Setting up the puppet agents requires that we install the puppet package.
The server = puppetlb.example.com setting points the agent to the load balancer. To request a certificate, run:
On puppetca1, list and sign the cert:
To verify that all is working, run tcpdump on all servers:
Then look for the validated headers: