Here you will find ideas and code straight from the Software Development Team at SportsEngine. Our focus is on building great software products for the world of youth and amateur sports. We are fortunate to be able to combine our love of sports with our passion for writing code.

The SportsEngine application originated in 2006 as a single Ruby on Rails 1.2 application. Today the SportsEngine Platform is composed of more than 20 applications built on Rails and Node.js, forming a service oriented architecture that is poised to scale for the future.

Swap it Like it's Hot

11/12/2013, 2:30pm CST
By Andy Fleener

Perform fast, zero-downtime deploys with almost any app server you want using the hot-swap deployment algorithm.

At Sport Ngin we take downtime seriously. Keeping the platform up and running at all times is imperative, and even a few seconds of downtime while deploying an application is unacceptable. For quite some time we've used a technique called a rolling deploy on our Ruby apps. The rolling deploy algorithm is great, and we still use it for several of our applications. But as we've moved to a more service-oriented architecture, rolling deploys have left us wanting more.

Like a Rolling Stone

So what's wrong with rolling deploys? Rolling deploys are slow. A typical rolling deploy restarts one server at a time, which is fine for a few servers but doesn't scale. Fast deploys are a critical piece of Continuous Delivery, a development methodology that we believe in at Sport Ngin. So we developed a new deployment technique, which we call the hot swap, that lets us restart a single application server with 100% uptime during the restart.

Another pain point with rolling deploys is that you need more than one server. In production we always need more than one server for redundancy. In our staging environments, though, redundancy isn't that important: if a server goes down, we can replace it fairly quickly. A small amount of downtime on occasion is fine for a staging environment.

We've been running two servers for our staging environments simply because we want zero-downtime deploys. Our developers deploy to our staging servers constantly. A single period of downtime on rare occasions is fine, but frequent moments of downtime are unacceptable. Our developers rely heavily on these staging servers for final QA on bug fixes and new features, so the application must work consistently or our QA processes will suffer.

Swap Till You Drop

The hot swap was developed with these pain points in mind. We needed an algorithm that was agnostic to the application server so that we could use it for both Ruby and Node.js applications. When we started using rolling deploys we were running only one application server (Phusion Passenger), but now we use four different app servers across our front-end Node.js services and our back-end Ruby services. Two of these newer app servers, Node.js and the Ruby app server Puma, run in a similar way.

These application servers listen on a given port. We use Nginx for SSL termination and as a reverse proxy in front of the app server, all behind our HAProxy load balancer. The great thing about this approach is that we can load balance even with a single app server by running the application on more than one port. We list two servers on different ports in a single Nginx upstream block.

http {
    ...
    upstream your_app_upstream {
      server 127.0.0.1:4040;
      server 127.0.0.1:4041;
      keepalive 64;
    }

    server {
      listen 81;
      ...
      location / {
        ...
        proxy_pass http://your_app_upstream;
      }
    }
}

By commenting out a single upstream server with one of our favorite Unix tools, sed, and reloading Nginx, we completely stop the app on that port from receiving new traffic while the other port keeps serving.

sudo sed -i -r 's/(server 127.0.0.1:4040)/#&/g' /etc/nginx/nginx.conf
sudo /etc/init.d/nginx reload
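The same edit can be wrapped in a pair of small helpers. This is a minimal sketch rather than the script we actually run: the function names and the CONF variable are made up for illustration, and it assumes GNU sed and that each upstream entry sits on its own line in the form `server 127.0.0.1:PORT;`. Compared with the one-liner above, it escapes the dots in the address, matches the trailing semicolon so that port 4040 cannot match a longer port number, and does nothing if the line is already commented out.

```shell
# Hypothetical helpers. CONF is an assumed variable name; point it at the
# Nginx config that holds the upstream block.
CONF="${CONF:-/etc/nginx/nginx.conf}"

# Comment out the upstream entry for a port.
# A line that is already commented does not match, so repeated calls are safe.
disable_upstream() {
  local port=$1
  sed -i -r "s/^([[:space:]]*)(server 127\.0\.0\.1:${port};)/\1#\2/" "$CONF"
}

# Uncomment the upstream entry for a port, removing one or more leading '#'.
enable_upstream() {
  local port=$1
  sed -i -r "s/^([[:space:]]*)#+(server 127\.0\.0\.1:${port};)/\1\2/" "$CONF"
}
```

Run the helpers with whatever privileges the config file requires (for example from a script started with sudo), and reload Nginx after each call exactly as shown above.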

Now we can safely restart the app on that port and wait for it to spin up again.

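# Find out if the app is running
function app_up {
  # Keep these variables local so they don't leak into the calling script
  local port=$1
  local status=1

  # Give the app a minute to boot
  for i in {1..60}
  do
    # Use the app health check url
    curl -fsI "127.0.0.1:$port/okcomputer"
    status=$?
    if test "$status" != "0"
    then
      sleep 1
    else
      break
    fi
  done

  # Zero if the health check passed, curl's last error code otherwise
  return $status
}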

/your/app/control restart 4040
app_up 4040

Once the app is up and running again, we simply uncomment the upstream server, reload Nginx, and start over with the app on the other port.

sudo sed -i -r 's/#(server 127.0.0.1:4040)/server 127.0.0.1:4040/g' /etc/nginx/nginx.conf
sudo sed -i -r 's/(server 127.0.0.1:4041)/#&/g' /etc/nginx/nginx.conf
sudo /etc/init.d/nginx reload
/your/app/control restart 4041
app_up 4041
sudo sed -i -r 's/#(server 127.0.0.1:4041)/server 127.0.0.1:4041/g' /etc/nginx/nginx.conf
sudo /etc/init.d/nginx reload

Then both processes have been restarted and the app is ready to go.
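The whole sequence generalizes to a loop over ports. The sketch below is an illustration under stated assumptions, not the script from the gist: the four step functions are hypothetical placeholders that only print what they would do, and you would replace their bodies with the sed and reload commands, the restart command, and the health check shown above. The one thing it adds to the walkthrough is a stop condition: if an app does not pass its health check, the loop ends and that port stays out of rotation while the remaining port keeps serving.

```shell
# Hypothetical step functions. Each one just prints its step here;
# swap in the real commands for your environment.
remove_from_rotation() { echo "comment out upstream on port $1 and reload nginx"; }
restart_app()          { echo "restart app on port $1"; }
wait_until_healthy()   { echo "poll health check on port $1"; }
return_to_rotation()   { echo "uncomment upstream on port $1 and reload nginx"; }

# Swap each port in turn, stopping at the first failed step.
hot_swap() {
  local port
  for port in "$@"; do
    remove_from_rotation "$port" || return 1
    restart_app "$port"          || return 1
    if ! wait_until_healthy "$port"; then
      echo "app on port $port did not come back; leaving it out of rotation" >&2
      return 1
    fi
    return_to_rotation "$port"   || return 1
  done
}
```

With the placeholders filled in, `hot_swap 4040 4041` performs the same steps as the commands above, one port at a time.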

The beauty of this algorithm is that it will work with many different application servers. It can be used on any application server that can easily be run on a specific port. We've found that this is an effective way to accomplish a zero-downtime deploy on a single application server.

Not only is the hot swap an effective way to restart a single app server, but it can be used to restart an entire cluster all at once with zero downtime. This can make a serious impact on the speed of deploys compared to a rolling approach that has to restart each server one at a time.
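Because each server swaps its own ports locally, the servers do not have to wait for one another. A rough sketch of a parallel cluster restart follows. The `run_on_host` wrapper, the use of ssh, and the remote script path are all assumptions made for illustration; substitute whatever your deploy tooling uses to run the hot swap on a host.

```shell
# Hypothetical: run the hot-swap script on one host.
# The ssh call and the script path are placeholders, not our real setup.
run_on_host() {
  ssh "$1" /your/app/hot_swap.sh
}

# Start the hot swap on every host at once, then wait for all of them.
# Returns non-zero if any host failed.
hot_swap_cluster() {
  local host pid pids="" failed=0
  for host in "$@"; do
    run_on_host "$host" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

Total deploy time is then roughly the time of the slowest single host, instead of the sum over all hosts as in a rolling deploy.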

We've been using the hot-swap algorithm with both Node.js and Rails applications in production for a few months now, and it has worked out great. Our production deploys are faster, and we only need to run one app server for our staging environments, which has saved us a decent amount of money.

A gist of the full version of the hot swap script can be found here: https://gist.github.com/anfleene/6718088

Tag(s): Home  DevOps  High Availability