
Monitoring Dæmons with CFEngine 3

I’ve been looking for a nice, simple method for verifying that all key services and dæmons are running on my UNIX servers. It’s pretty rare for a service to die, but when one does, I want it restarted as soon as possible and I want to be notified about it! I’ve looked at process supervisors like daemontools and runit, which run continuously and monitor the dæmons they start, but they tend to require a little more maintenance effort than I like. Normally, the sysadmin has to write a start-up script that sets up the environment and starts the dæmon, and that script must also somehow coerce the dæmon not to do its normal double-fork and disassociate from its parent process and controlling terminal. I mainly just want a way to spot a process that isn’t running and run the appropriate command to restart it.

CFEngine 3 seems to be a little closer to what I want, and after reading through the excellent Learning CFEngine 3 book from O’Reilly, I think I’ve finally figured out the right recipe. All I want is to name a process to look for and a command that will restart it if it isn’t running. I would also like a report whenever a process needs to be restarted, since that normally represents an abnormal event. Here’s the basic configuration I use to monitor a few services:

  any::
    "services[ssh][name]"           string => "OpenSSH";
    "services[ssh][process]"        string => "/usr/sbin/sshd";
    "services[ssh][restart]"        string => "/usr/sbin/invoke-rc.d ssh restart";

  web|dns::
    "services[ldap][name]"          string => "OpenLDAP";
    "services[ldap][process]"       string => "/usr/sbin/slapd";
    "services[ldap][restart]"       string => "/usr/sbin/invoke-rc.d slapd restart";
    "services[bind][name]"          string => "BIND";
    "services[bind][process]"       string => "/usr/sbin/named";
    "services[bind][restart]"       string => "/usr/sbin/invoke-rc.d bind9 restart";

  web::
    "services[apache][name]"        string => "Apache";
    "services[apache][process]"     string => "/usr/sbin/apache2";
    "services[apache][restart]"     string => "/usr/sbin/invoke-rc.d apache2 restart";

And that’s it! The above says that every server must run OpenSSH, the servers web and dns must run OpenLDAP and BIND, and web must also run Apache. (CFEngine defines a class for each host’s unqualified name, so web and dns here are simply hostnames.) For each service, it also gives a descriptive name, the process to look for, and the command that restarts the process if it’s not running. To monitor another service, I just repeat the same three lines and put the right classes in front to select which servers run it. Here’s the full file to run this:

body common control
{
    bundlesequence => { "services" };
    inputs => { "cfengine_stdlib.cf" };
}

bundle agent services
{
vars:
  any::
    "services[ssh][name]"           string => "OpenSSH";
    "services[ssh][process]"        string => "/usr/sbin/sshd";
    "services[ssh][restart]"        string => "/usr/sbin/invoke-rc.d ssh restart";

  web|dns::
    "services[ldap][name]"          string => "OpenLDAP";
    "services[ldap][process]"       string => "/usr/sbin/slapd";
    "services[ldap][restart]"       string => "/usr/sbin/invoke-rc.d slapd restart";
    "services[bind][name]"          string => "BIND";
    "services[bind][process]"       string => "/usr/sbin/named";
    "services[bind][restart]"       string => "/usr/sbin/invoke-rc.d bind9 restart";

  web::
    "services[apache][name]"        string => "Apache";
    "services[apache][process]"     string => "/usr/sbin/apache2";
    "services[apache][restart]"     string => "/usr/sbin/invoke-rc.d apache2 restart";

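  any::
    # First-level keys of the services array (ssh, ldap, ...), limited to
    # the entries whose class context matched on this host.
    "services" slist => getindices("services");

processes:
    # Define service_<key>_restart if no matching process is running.
    "$(services[$(services)][process])"
        restart_class => "service_$(services)_restart";

commands:
    # Run the restart command only for services found missing above, and
    # define service_<key>_failed if that command does not succeed.
    "$(services[$(services)][restart])"
        classes => if_notkept("service_$(services)_failed"),
        ifvarclass => "service_$(services)_restart";

reports:
    # The only output this policy produces; it ends up in email and the log.
    "$(services[$(services)][name]) is not running, restarting..."
        ifvarclass => "service_$(services)_restart";

    "$(services[$(services)][name]) failed to start!"
        ifvarclass => "service_$(services)_failed";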
}
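Because the processes:, commands:, and reports: promises are all driven by getindices("services"), monitoring another dæmon should only take three more lines in vars:. As a hypothetical example (the cron binary path and init-script name are assumptions based on a stock Debian/Ubuntu system, not part of my setup), watching cron on every host could look like this:

```cfengine
vars:
  any::
    # Hypothetical extra entry: watch cron on every host.
    # /usr/sbin/cron and the "cron" init script assume Debian/Ubuntu.
    "services[cron][name]"          string => "cron";
    "services[cron][process]"       string => "/usr/sbin/cron";
    "services[cron][restart]"       string => "/usr/sbin/invoke-rc.d cron restart";
```

Nothing else in the file needs to change; the new cron key is picked up by getindices() and flows through the same processes:, commands:, and reports: promises.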

Each service is configured entirely in the vars: section. The processes: section searches the process table for each process in the services array and defines a class if that process was not found. The commands: section runs the appropriate restart command whenever processes: has set the corresponding class. CFEngine is normally silent, but the reports: section generates output, which goes into an email and a log file, whenever a process needs to be restarted or its restart command fails.
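Because $(services) is a list, CFEngine implicitly loops over it, making one copy of each promise per list element. For the ssh entry, the promises in the bundle behave roughly as if they had been written out by hand like this (an illustrative expansion for reading purposes, not something to add to the file):

```cfengine
# Illustration only: what the list-driven promises amount to
# once $(services) expands to "ssh".
processes:
    "/usr/sbin/sshd"
        restart_class => "service_ssh_restart";

commands:
    "/usr/sbin/invoke-rc.d ssh restart"
        classes => if_notkept("service_ssh_failed"),
        ifvarclass => "service_ssh_restart";

reports:
    "OpenSSH is not running, restarting..."
        ifvarclass => "service_ssh_restart";

    "OpenSSH failed to start!"
        ifvarclass => "service_ssh_failed";
```

This works in a single run because CFEngine’s normal ordering evaluates processes: before commands: and reports: within a bundle, so the restart class is already defined when the later promises are checked.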

This solution isn’t perfect. Unlike a real process supervisor, CFEngine is not notified immediately when a process dies, much less told the cause of death; a dead dæmon is only noticed on the next agent run (every five minutes with cf-execd’s default schedule), which is still a reasonable response time. My example also doesn’t rate-limit restarts or cap the number of restart attempts, although CFEngine’s promise locking (the ifelapsed setting) provides some built-in rate limiting. Overall, I’ve found this solution simple to maintain and scale, and it’s now live and running on my servers.
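If the built-in locking isn’t enough, one way to throttle restarts is a longer ifelapsed on the commands: promise. The sketch below assumes a hypothetical action body named restart_throttle and an arbitrary 30-minute window; it limits how often a restart is attempted, but it still does not cap the total number of attempts.

```cfengine
# Hypothetical action body (name and 30-minute value are made up for
# this sketch). ifelapsed is in minutes: once a service's restart
# command has run, the same promise is skipped for the next 30 minutes.
# Bodies go at the top level of the file, outside any bundle.
body action restart_throttle
{
    ifelapsed => "30";
}

# Inside bundle agent services, the commands: promise gains one line.
commands:
    "$(services[$(services)][restart])"
        classes => if_notkept("service_$(services)_failed"),
        ifvarclass => "service_$(services)_restart",
        action => restart_throttle;
```

Since each service has its own restart command string, each one gets its own lock, so a flapping Apache would not delay an SSH restart.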

Making System Administration Easy for the UNIX Sysadmin

I think it’s about time I start collecting my random thoughts and arranging them into something useful. I’ve been gathering the scripts, configuration templates, and procedures I’ve written into an internal wiki for my own benefit, but I’d love to publish them in blog format and get some feedback. To kick things off, I’m planning a series on making system administration easier for the UNIX sysadmin. My first post will be on automating the installation of a Linux distribution.

I’ve been working on making it easier to deploy additional Linux and Windows workstations, with the goal of treating them as disposable machines. After seeing what my users can do to their machines in about five minutes, I’m thinking that just replacing them may be better for both of us. Other topics will include automating Windows installation, deploying software updates, and managing system configuration. Later on, I’d like to post some guides to setting up LDAP, Samba, NFS, and Kerberos. I can also cover some less common topics, like setting up OpenAFS, deploying WPA2 Enterprise with FreeRADIUS, and managing a private public-key infrastructure (PKI) within an organization without shelling out money or relying on self-signed certificates. I also have some experience with a few different backup systems, and have done two bare-metal restorations plus a number of selective restores after users decided to experiment a little.

If you have any requests or recommendations on topics I should cover, please leave a comment below.