Have you ever needed to put a critical website online and make sure it stays up and running 24/7? If you have, you know what a real pain it can be to check that everything is OK: that all services are running and that no process is eating too many resources (CPU or memory).
Here at Synbioz, we have to ensure service reliability for our customers. There are many ways to do that: you can write your own shell scripts, play with some crontabs, send emails on failure, … But it is rather difficult to write effective scripts and make sure they work well. Moreover, most of the time your homemade scripts will not be portable and will only work with one specific application. So what is the best way to handle this?
Enter God, a monitoring framework you can rely on to keep your processes and tasks running well. God is written in Ruby and aims to be a simple, powerful and flexible way to write monitoring tasks.
Before going deeper into God, you should know that it only works on Unix-like systems. Sorry, Windows users, but hey, I know you would never want to deploy a production app on a Windows server anyway…
God config files are written in Ruby, so you can do basically anything Ruby allows you to do, which is a lot.
God features are:
Start by installing God on your system, meaning the production server:
$ sudo gem install god
or add it to your Gemfile like so:
gem "god"
You can now create a God configuration file for the daemon you want to monitor:
$ touch config/unicorn.god
Giving config files a .god extension is just a convention; the file is in fact a plain Ruby file. Here is a basic configuration for a Unicorn server:
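RAILS_ROOT = File.dirname(File.dirname(__FILE__))

God.watch do |w|
  # Assumes config/unicorn.rb makes unicorn write its pid to this file
  pid_file = File.join(RAILS_ROOT, "tmp/pids/unicorn.pid")

  w.name = "unicorn"
  w.dir = RAILS_ROOT
  w.interval = 60.seconds
  w.start = "unicorn -c #{RAILS_ROOT}/config/unicorn.rb -D"
  # QUIT: graceful shutdown; HUP: reload config and restart workers
  w.stop = "kill -s QUIT $(cat #{pid_file})"
  w.restart = "kill -s HUP $(cat #{pid_file})"
  # Give unicorn 20 seconds to boot or reload before checking on it
  w.start_grace = 20.seconds
  w.restart_grace = 20.seconds
  w.pid_file = pid_file
  # Run unicorn as this user and group
  w.uid = 'nico'
  w.gid = 'team'
  w.env = { 'RAILS_ENV' => "production" }
  # Delete a leftover pid file before starting
  w.behavior(:clean_pid_file)
end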
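If you’re curious about what the plain Ruby parts of this file evaluate to, here is a small sketch you can run without God. The /srv/app path is a hypothetical deployment directory, and the Numeric helpers only approximate the ones God provides: they assume seconds as the time unit and kilobytes as the memory unit, which is our understanding of God’s conventions, not a copy of its source.

```ruby
# Sketch only: mimics the plain-Ruby parts of unicorn.god, without God.
# "/srv/app" is a hypothetical deployment path.

# Two dirname calls strip "unicorn.god", then "config",
# leaving the application root.
def app_root(config_path)
  File.dirname(File.dirname(config_path))
end

# Approximation of God-style unit helpers (assumed base units:
# seconds for durations, kilobytes for memory sizes).
class Numeric
  def seconds
    self
  end

  def minutes
    self * 60
  end

  def megabytes
    self * 1024
  end

  def percent
    self
  end
end

root = app_root("/srv/app/config/unicorn.god")
puts root                                     # prints /srv/app
puts File.join(root, "tmp/pids/unicorn.pid")  # prints /srv/app/tmp/pids/unicorn.pid
puts 60.seconds                               # prints 60
puts 100.megabytes                            # prints 102400
```

Under these assumptions, w.interval = 60.seconds is just the number 60, and 100.megabytes is a plain number of kilobytes.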
We’re now going to enhance our config file with real process monitoring, which lets us check the CPU and memory usage of each process:
RAILS_ROOT = File.dirname(File.dirname(__FILE__))

God.watch do |w|
  pid_file = File.join(RAILS_ROOT, "tmp/pids/unicorn.pid")

  w.name = "unicorn"
  w.interval = 60.seconds
  w.start = "unicorn -c #{RAILS_ROOT}/config/unicorn.rb -D"
  w.stop = "kill -s QUIT $(cat #{pid_file})"
  w.restart = "kill -s HUP $(cat #{pid_file})"
  w.start_grace = 20.seconds
  w.restart_grace = 20.seconds
  w.pid_file = pid_file
  w.behavior(:clean_pid_file)

  # When to start?
  w.start_if do |start|
    start.condition(:process_running) do |c|
      # Check every ten seconds whether the daemon is running
      # and start it if it isn't
      c.interval = 10.seconds
      c.running = false
    end
  end

  # When to restart a running daemon?
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      # Look at the last five memory samples:
      # if at least three of them are above the limit (100 MB),
      # restart the daemon
      c.above = 100.megabytes
      c.times = [3, 5]
    end

    restart.condition(:cpu_usage) do |c|
      # Restart the daemon if CPU usage is above 90%
      # on five consecutive checks
      c.above = 90.percent
      c.times = 5
    end
  end

  w.lifecycle do |on|
    # Handle edge cases where the daemon
    # can't start for some reason
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart] # If God tries to start or restart
      c.times = 5                     # five times
      c.within = 5.minutes            # within five minutes,
      c.transition = :unmonitored     # stop monitoring
      c.retry_in = 10.minutes         # for 10 minutes, then monitor again;
      c.retry_times = 5               # retry up to five times
      c.retry_within = 2.hours        # and give up if flapping occurs five times within two hours
    end
  end
end
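The times and flapping settings are easier to reason about with a toy model. The sketch below is plain Ruby, not God’s implementation: threshold_tripped? and FlapDetector are hypothetical names, samples are plain numbers and timestamps are seconds. It also leaves out the retry_in / retry_times / retry_within part of the flapping condition.

```ruby
# Toy model of two God settings; not God's actual implementation.

# times = 5      -> the last 5 samples must all be above the limit
# times = [3, 5] -> at least 3 of the last 5 samples must be above it
def threshold_tripped?(samples, limit, times)
  hits, window = times.is_a?(Array) ? times : [times, times]
  samples.last(window).count { |value| value > limit } >= hits
end

# Flapping: `times` start/restart attempts within `within` seconds.
class FlapDetector
  def initialize(times:, within:)
    @times = times
    @within = within
    @attempts = []
  end

  # Records an attempt made at `now` (in seconds) and returns true
  # when enough recent attempts have piled up to call it flapping.
  def attempt!(now)
    @attempts << now
    @attempts.reject! { |t| now - t > @within }
    @attempts.size >= @times
  end
end

memory_mb = [90, 120, 80, 130, 140]
p threshold_tripped?(memory_mb, 100, [3, 5])  # true: 3 of the last 5 are above 100

cpu_percent = [95, 95, 95, 95, 80]
p threshold_tripped?(cpu_percent, 90, 5)      # false: the last sample is below 90

flap = FlapDetector.new(times: 5, within: 5 * 60)
p [0, 30, 60, 90, 120].map { |t| flap.attempt!(t) }
# [false, false, false, false, true]
```

The key difference between the two restart conditions: [3, 5] tolerates a couple of normal samples in the window, while a plain 5 requires five bad samples in a row.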
You can repeat the God.watch block as many times as you need to handle the other daemons your application makes use of.
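Because the config is plain Ruby, you don’t even have to copy and paste those blocks: a loop can declare several similar watches. Here is a sketch for three instances of a hypothetical foreground worker, script/worker, which you might put in its own file (say, config/workers.god). It reuses settings shown above plus w.group; since no pid_file is given, it relies on God daemonizing the worker and tracking its pid itself.

```ruby
# Hypothetical config/workers.god: three copies of a foreground worker.
# script/worker and WORKER_ID are made-up names for illustration.
RAILS_ROOT = File.dirname(File.dirname(__FILE__))

3.times do |i|
  God.watch do |w|
    w.name = "worker-#{i}"
    w.group = "workers"   # `god stop workers` then acts on all three
    w.dir = RAILS_ROOT
    w.interval = 30.seconds
    # No -D flag and no pid_file: God daemonizes the process itself
    w.start = "ruby script/worker"
    w.env = { 'RAILS_ENV' => "production", 'WORKER_ID' => i.to_s }

    w.start_if do |start|
      start.condition(:process_running) do |c|
        c.running = false
      end
    end
  end
end
```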
Now that your config file is ready, you can check the current God status:
$ god status
It will tell you that unicorn is down, so let’s start it:
$ god -c config/unicorn.god
The same, but without daemonizing God (it stays in the foreground):
$ god -c config/unicorn.god -D
Running god status again should now tell you that unicorn is up and running.
$ god log unicorn
This command shows what God did with the daemon, along with monitoring results such as the latest memory and CPU usage, in real time.
If you feel like playing around, try killing the unicorn process from another shell and watch what happens in the God log:
$ kill $(cat tmp/pids/unicorn.pid)
You should see that God detected that unicorn was no longer running, deleted the pid file if it still existed, and started the unicorn daemon again.
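Under the hood, “is unicorn still running?” boils down to reading the pid file and asking the kernel whether that pid exists. Here is a plain Ruby sketch of the idea; check_pid_file and process_running? are hypothetical helpers, not God’s API, and deleting the stale file only approximates the cleanup the clean_pid_file behavior is there for.

```ruby
# Sketch of a pid-file liveness check; not God's implementation.
require "tmpdir"

# Signal 0 tests whether a pid exists without actually signalling it.
def process_running?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

# Returns :missing, :running or :stale_removed. A pid file pointing at
# a dead process is deleted, roughly the cleanup clean_pid_file is for.
def check_pid_file(pid_file)
  return :missing unless File.exist?(pid_file)

  pid = File.read(pid_file).to_i
  return :running if pid > 0 && process_running?(pid)

  File.delete(pid_file)
  :stale_removed
end

Dir.mktmpdir do |dir|
  pid_file = File.join(dir, "unicorn.pid")

  File.write(pid_file, Process.pid.to_s)   # our own, live pid
  p check_pid_file(pid_file)               # :running

  dead = fork { exit! }                    # child exits at once
  Process.wait(dead)                       # reap it: the pid is now gone
  File.write(pid_file, dead.to_s)
  p check_pid_file(pid_file)               # :stale_removed
  p File.exist?(pid_file)                  # false
end
```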
To stop God along with all the processes it monitors:
$ god terminate
or to stop a single watched process:
$ god stop unicorn
Great, we’re happy with our monitoring system. But how do we start it when the server boots or reboots? You have to write an init script! But relax, I have one for you:
#!/bin/bash
#
# God
#
# chkconfig: - 85 15
# description: start, stop, restart, status for God
#

RETVAL=0

case "$1" in
  start)
    god -P /var/run/god.pid -l /var/log/god.log
    god load /etc/god.conf
    RETVAL=$?
    ;;
  stop)
    kill `cat /var/run/god.pid`
    RETVAL=$?
    ;;
  restart)
    kill `cat /var/run/god.pid`
    god -P /var/run/god.pid -l /var/log/god.log
    god load /etc/god.conf
    RETVAL=$?
    ;;
  status)
    /usr/bin/god status
    RETVAL=$?
    ;;
  *)
    echo "Usage: god {start|stop|restart|status}"
    exit 1
    ;;
esac

exit $RETVAL
As you can see, the script above uses a file named /etc/god.conf. This file has only one simple purpose: loading a bunch of God config files at once:
God.load "/etc/god/*.god"
This trick lets you symlink your application’s God config files into the /etc/god/ directory so that they are loaded when the server boots. This is very similar to the technique used for Mongrel.
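If you want to check which files a pattern like this picks up, symlinks included, you can preview it with Ruby’s Dir.glob. This sketch assumes God.load expands its argument the same way Dir.glob does; it builds a throwaway directory tree instead of touching the real /etc/god, and the file names are made up.

```ruby
# Preview of the /etc/god/*.god trick in a temporary directory.
require "tmpdir"
require "fileutils"

# Basenames of the files a "*.god" pattern matches in dir.
def god_files(dir)
  Dir.glob(File.join(dir, "*.god")).map { |path| File.basename(path) }.sort
end

def symlink_demo
  Dir.mktmpdir do |tmp|
    app_config = File.join(tmp, "app", "config")   # stands in for your app
    god_dir = File.join(tmp, "etc", "god")         # stands in for /etc/god
    FileUtils.mkdir_p([app_config, god_dir])

    # The real config lives with the application...
    File.write(File.join(app_config, "unicorn.god"), "# God config\n")
    # ...and the shared directory only holds a symlink to it.
    File.symlink(File.join(app_config, "unicorn.god"), File.join(god_dir, "myapp.god"))
    # Files without the .god extension are not picked up.
    File.write(File.join(god_dir, "notes.txt"), "ignored\n")

    god_files(god_dir)
  end
end

p symlink_demo  # ["myapp.god"]
```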
Now you can do:
$ /etc/init.d/god start
$ /etc/init.d/god status
$ /etc/init.d/god stop
Let’s say you want to be notified every time a process exits. You can add this to the watch block of your God configuration file:
w.transition(:up, :start) do |on|
  on.condition(:process_exits) do |c|
    c.notify = 'devteam'
  end
end
Now God knows that every time our process exits while it is up, it should move the watch back to the start state and send a notification to “devteam”. You can use notify in any condition block.
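To picture what w.transition(:up, :start) means, think of each watch as a small state machine: a condition fires, and the watch moves from one state to another, optionally notifying someone. The toy below models just that one rule; ToyWatch and its methods are made up for illustration and say nothing about how God implements this internally.

```ruby
# Toy model of w.transition(:up, :start) + c.notify; not God's internals.
class ToyWatch
  attr_reader :state, :sent

  def initialize
    @state = :up
    @rules = {}   # [from_state, event] => [to_state, notify_target]
    @sent = []    # notifications that would have been delivered
  end

  def transition(from, to, on:, notify: nil)
    @rules[[from, on]] = [to, notify]
  end

  # Applies the rule for the current state, if any.
  def fire(event)
    to, target = @rules[[@state, event]]
    return unless to
    @sent << "#{target}: #{event} while #{@state}, moving to #{to}" if target
    @state = to
  end
end

w = ToyWatch.new
w.transition(:up, :start, on: :process_exits, notify: "devteam")
w.fire(:process_exits)
p w.state   # :start
p w.sent    # ["devteam: process_exits while up, moving to start"]

w.fire(:process_exits)   # no rule for :start, so nothing happens
p w.state   # :start
```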
But what is “devteam”, and how the hell are notifications sent?!
The first solution is to send emails.
We’ll start by defining some email defaults in our God config file:
God::Contacts::Email.defaults do |d|
  d.from_email = 'god@synbioz.com'
  d.from_name = 'God'
  d.delivery_method = :sendmail
end
Then we need to define a contact:
God.contact(:email) do |c|
  c.name = 'Dev Team'
  c.group = 'devteam'
  c.to_email = 'team@synbioz.com'
end
You can define as many contacts as you need, but make sure each “name” attribute is unique! Now our dev team will receive an email notification whenever such a problem occurs.
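Why must names be unique? As we understand it, a notify value is looked up either as a contact name or as a group name, so two contacts sharing a name would be ambiguous. The sketch below models that lookup in plain Ruby; ContactBook is hypothetical, not God’s API, and oncall@example.com is a placeholder address.

```ruby
# Sketch of name/group resolution for notify; not God's implementation.
Contact = Struct.new(:name, :group, :address)

class ContactBook
  def initialize
    @contacts = {}
  end

  # Refuses a second contact with the same name.
  def add(name:, address:, group: nil)
    raise ArgumentError, "duplicate contact name: #{name}" if @contacts.key?(name)
    @contacts[name] = Contact.new(name, group, address)
  end

  # A notify target may be one contact's name or a whole group.
  def resolve(target)
    @contacts.values.select { |c| c.name == target || c.group == target }
  end
end

book = ContactBook.new
book.add(name: "Dev Team", group: "devteam", address: "team@synbioz.com")
book.add(name: "On-call", group: "devteam", address: "oncall@example.com")
p book.resolve("devteam").map(&:address)  # ["team@synbioz.com", "oncall@example.com"]
p book.resolve("On-call").map(&:address)  # ["oncall@example.com"]
```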
You don’t like emails and want XMPP notifications? No problem:
God::Contacts::Jabber.defaults do |d|
  d.host = "jabber.synbioz.com"
  d.from_jid = "foo@synbioz.com"
  d.password = "bar"
end

God.contact(:jabber) do |c|
  c.name = "Dev Team (Jabber)"   # contact names must be unique
  c.group = "devteam"
  c.to_jid = "baz@synbioz.com"
end
You can also send notifications through Campfire, Prowl, Scout, Twitter and WebHook; they are all part of God’s core.
If you need to plug in your own system, such as an internal tracking tool, you can easily extend notifications with your own contact type.
I hope this quick introduction to God will be helpful for those of you who want to monitor your applications. Don’t think God is only for Rails apps, or even only for Ruby apps: you can use it to monitor anything you want, Rails project or not!
Synbioz Team.