Here you will find ideas and code straight from the Software Development Team at SportsEngine. Our focus is on building great software products for the world of youth and amateur sports. We are fortunate to be able to combine our love of sports with our passion for writing code.

The SportsEngine application originated in 2006 as a single Ruby on Rails 1.2 application. Today the SportsEngine Platform is composed of more than 20 applications built on Rails and Node.js, forming a service oriented architecture that is poised to scale for the future.

OK Computer: Application Health Monitoring with No Surprises

06/28/2013, 9:15am CDT
By Patrick Byrne

TL;DR

We built OK Computer for configurable application health checks, useful for load balancers or external monitoring (like Pingdom or New Relic). We've been using it in production for a while now and think you'll find it useful.

Why Did We Make It?

In the past, we've used fitter_happier for this purpose. Then, we added MongoDB to one of our applications. To check the connection to that database, we had to monkey-patch fitter_happier, which didn't sit well with us. We also wanted to check that our Resque queues weren't backed up, which necessitated further monkey-patching.

Clearly, we needed something that could be customized to the needs of any given app. OK Computer was born from these frustrations.

Our needs were fairly straightforward and, we believe, universal:

  • Lightning-fast application up-check
  • Drop-in for the common case (ActiveRecord database connection), but easily replaceable for whichever data connections you use
  • A small set of common checks useful to many applications, like Resque or Delayed Job
  • Easily create checks unique to your application
  • Machine-parsable output, like HTTP status codes and JSON

Tell Me More

A complete overview is listed in the README, but I want to take a moment to explain how this can make your life a little easier.

The common case can't be any simpler. Include gem 'okcomputer' in your Rails app's Gemfile, and you automatically get a few useful endpoints:

  • /okcomputer - Hit this to prove that the app is running. Nothing more. We use this in our HAProxy configuration to automatically remove app servers from the load balancer if something goes wrong.
  • /okcomputer/all - Run all configured checks (by default, just a database check and the above-mentioned "can the app render anything?" check).
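To illustrate why machine-parsable output matters for these endpoints, here is a minimal plain-Ruby sketch of running a set of checks and reducing them to an HTTP status code plus a JSON body. It is not OK Computer's implementation: the CheckRunner class, the [passed, message] return convention, the 200/500 status codes, and the JSON shape are all assumptions made for the example.

```ruby
require "json"

# Hypothetical check runner. The names and the JSON shape are assumptions
# for illustration, not OK Computer's internals.
class CheckRunner
  def initialize
    @checks = {}
  end

  # Register a callable that returns [passed, message].
  def register(name, check)
    @checks[name] = check
  end

  # Run every registered check and return [http_status, json_body].
  # The status is 200 only when every check passed, otherwise 500.
  def run_all
    results = @checks.each_with_object({}) do |(name, check), acc|
      passed, message = safely_run(check)
      acc[name] = { "passed" => passed, "message" => message }
    end
    status = results.values.all? { |result| result["passed"] } ? 200 : 500
    [status, JSON.generate(results)]
  end

  private

  # A check that raises counts as a failure instead of crashing the endpoint.
  def safely_run(check)
    check.call
  rescue StandardError => e
    [false, "#{e.class}: #{e.message}"]
  end
end
```

With a single status code, a load balancer can decide whether to keep a server in rotation without parsing anything, while the JSON body gives a monitoring tool or a human the per-check detail.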

If you need anything else, add config/initializers/okcomputer.rb and register one of our built-in checks or easily create one of your own. It will show up in /okcomputer/all, or can be accessed directly by the name you register it with.

For example, say you want to check that your Resque queue named "critical" doesn't grow beyond 10 jobs. Add the following to get the /okcomputer/resque_critical endpoint:

OKComputer::Registry.register "resque_critical", OKComputer::ResqueBackedUpCheck.new("critical", 10)
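To make the threshold idea concrete, here is a minimal plain-Ruby sketch of what a "backed up" check might do. It is not the gem's code: the QueueThresholdCheck class, the size_reader callable, and the result hash are hypothetical, and a real check would ask Resque for the queue size instead of taking a lambda.

```ruby
# Hypothetical stand-in for a "queue backed up" check, for illustration only.
class QueueThresholdCheck
  attr_reader :queue, :threshold

  # size_reader is any callable that returns the current job count
  # for a given queue name (an assumption made to keep this runnable).
  def initialize(queue, threshold, size_reader)
    @queue = queue
    @threshold = threshold
    @size_reader = size_reader
  end

  # Passes while the queue holds no more than `threshold` jobs.
  def run
    size = @size_reader.call(queue)
    passed = size <= threshold
    verdict = passed ? "within" : "over"
    {
      passed: passed,
      message: "Queue '#{queue}' has #{size} jobs (#{verdict} threshold of #{threshold})"
    }
  end
end
```

Passing the size lookup in as a callable keeps the comparison logic testable without a running Resque instance.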

How Does Sport Ngin Use OK Computer?

  • As mentioned above, we use the default up-check in HAProxy to remove any unhealthy servers from the load balancer.
  • We monitor the uptime of all of our applications and services with Pingdom, which feeds into PagerDuty to alert us of problems.
  • As an extra layer of assurance, New Relic also uses the default up-check to confirm that each app is running and available.
  • During testing of deploy scripts or when performing maintenance in which we want to confirm that there is no downtime, we often use a tool like siege (or curl in a while loop) to make constant requests to the up-check and alert us immediately if something goes wrong.
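The "curl in a while loop" approach from the last point can also be sketched in Ruby using only the standard library. This is an illustration, not a tool we ship: the watch_up_check method, its parameters, and the HTTP_PROBE constant are hypothetical names, and the probe simply treats any 2xx response as healthy.

```ruby
require "net/http"
require "uri"

# Default probe: true when the URL answers with a 2xx status.
# Connection errors and timeouts count as unhealthy.
HTTP_PROBE = lambda do |url|
  begin
    Net::HTTP.get_response(URI(url)).is_a?(Net::HTTPSuccess)
  rescue StandardError
    false
  end
end

# Request the up-check `attempts` times, pausing `interval` seconds
# between requests. Warns on each failure and returns the list of
# attempt numbers (1-based) that failed.
def watch_up_check(url, attempts:, interval: 1, probe: HTTP_PROBE)
  failures = []
  attempts.times do |index|
    attempt = index + 1
    unless probe.call(url)
      failures << attempt
      warn "#{Time.now} attempt #{attempt}: #{url} is not healthy"
    end
    sleep interval if interval > 0 && attempt < attempts
  end
  failures
end
```

During a deploy you might call something like watch_up_check("http://localhost:3000/okcomputer", attempts: 120), where the host and port are placeholders for your own app. An empty return value means no request failed during the window.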

We think OK Computer can be useful to you as well. Check it out.

Tag(s): Home  Ruby  DevOps  High Availability