Getting Ngin up and running on ruby 1.9.3 with all of the tests passing was only one step towards being comfortable with deploying it to our production servers. With our test coverage totalling only about 54% of our code, we wanted a way to avoid a ton of manual QA. Another major concern was what we could expect for performance in 1.9.3. We were able to get some basic metrics showing that our test suite would run 30%-40% faster, but it's hard to expect those kinds of gains in a real production environment.
The best way to see how anything will perform in production is to send production traffic to it, so that’s exactly what we did. Using a gem called em-proxy, which we learned about from an Engine Yard support article, we were able to duplex traffic from our production servers to a staging setup that had very similar hardware. Duplexing allows you to send traffic to multiple endpoints, but only return a response from the one you want.
This turned out to be a much more appealing approach to us than using a load testing service, like Load Impact. Load testing services are great for stress testing an environment by hitting a few defined URLs, but not so great for verifying real production traffic. With this technique we could truly see how our environment would behave when we released it to production.
We went about this in such a way that we are now able to run a Capistrano task to turn em-proxy on in a rolling fashion (much like our rolling restarts) and begin duplexing traffic to a specified IP within minutes. And the best part is that we can do this with no downtime.
We'll now go into detail about exactly how we set up the configuration to make this happen. Keep in mind that we're running on Engine Yard Cloud, so our Chef recipes and scripts are specific to that setup. However, it should work in most cases with some minor tweaking.
The most important piece is obviously em-proxy itself. First, we need to make sure the em-proxy gem is installed on the server that we want to duplex from. In our case, the root user needed the gem installed.
$ sudo gem install em-proxy
The next step is a ruby script that sets up the duplexing process on a specified port.
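Using em-proxy's Ruby DSL, a minimal sketch of such a script might look like the following (the hosts, ports, and backend names here are placeholders, matching the command-line example further down):

```ruby
# em_proxy.rb -- illustrative sketch of a duplexing script.
# Hosts, ports, and backend names are placeholders for your environment.
require "rubygems"
require "em-proxy"

Proxy.start(:host => "0.0.0.0", :port => 8080) do |conn|
  # the backend whose responses are relayed back to the client
  conn.server :production, :host => "127.0.0.1", :port => 81

  # the duplexed backend; it receives every request,
  # but its responses are thrown away
  conn.server :staging, :host => "10.10.10.10", :port => 80

  # pass request data through to both backends untouched
  conn.on_data do |data|
    data
  end

  # only return responses from the production backend
  conn.on_response do |backend, resp|
    resp if backend == :production
  end
end
```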
The hosts, ports, and duplex IP in that script should be replaced with values appropriate for your environment.
We also set this file up as a template using Chef, as seen in this gist. This was awesomely simple for us to set up. em-proxy also has the option to start up from the command line, which looks like this:
$ em-proxy -l 8080 -r 127.0.0.1:81 -d 10.10.10.10:80
We went with the script approach to make it easier to manage via Chef.
The gem is installed and the script is ready to run, but we don’t want to run it manually. And what happens if the process just dies for some unknown reason? We set up monitrc and runner (similar to init.d) scripts to make it easy to start and stop the process as well as to make sure it stays up 100% of the time.
Here is the init.d script we set up. You would need to change most of those paths to get it to work in your own environment, but it’s a pretty straightforward init.d-like script.
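A runner along those lines might be sketched like this (the script location, ruby invocation, and pidfile path are all assumptions about your environment):

```shell
#!/bin/bash
# /etc/init.d/emproxy_ngin -- illustrative init.d-style runner.
# The script path and pidfile below are assumptions.

SCRIPT=/data/ngin/shared/config/em_proxy.rb
PIDFILE=/var/run/emproxy_ngin.pid

case "$1" in
  start)
    # launch the duplexing script in the background and record its pid
    ruby "$SCRIPT" &
    echo $! > "$PIDFILE"
    ;;
  stop)
    # kill the recorded pid and clean up the pidfile
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    ;;
  restart)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 1
    ;;
esac
```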
Monit needs a configuration file as well, telling it to monitor this process.
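A monitrc entry for this might look something like the following (the process name, pidfile, and init.d paths are assumptions that match the runner above):

```
# /etc/monit.d/emproxy_ngin.monitrc -- illustrative; paths are assumptions
check process emproxy_ngin with pidfile /var/run/emproxy_ngin.pid
  start program = "/etc/init.d/emproxy_ngin start"
  stop program  = "/etc/init.d/emproxy_ngin stop"
```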
With that file in place and a quick monit reload, monit will keep track of our em-proxy process for us. We can start it with monit start emproxy_ngin and stop it with monit stop emproxy_ngin.
Now that we have em-proxy monitored and ready to duplex traffic, we need to actually start routing traffic through it. As we mentioned in our previous blog post, we have haproxy sitting in front of everything proxying (go figure) traffic to nginx on all of our application servers. Haproxy accepts traffic on ports 80 and 443 (HTTP and HTTPS) and forwards it to nginx on ports 81 and 444.
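For reference, the relevant piece of a haproxy.cfg along those lines might look like this (the backend name, server names, and addresses are made up):

```
# illustrative haproxy.cfg fragment -- names and addresses are made up
backend app_http
    # traffic accepted on 80 is forwarded to nginx on port 81
    server app1 10.0.0.5:81 check
    server app2 10.0.0.6:81 check
```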
We don't want to have to change our nginx config when setting this up, because reloading nginx causes our passenger instances to restart, resulting in some extremely slow requests. Instead, we can tell haproxy to send traffic to the port that em-proxy is running on - 8080 in our case. To handle all of this, we've written a few more scripts.
Here's what proxy_up.sh does, in plain english: it takes the server out of the haproxy configuration so it stops receiving traffic, starts the em-proxy process via monit, and then adds the server back with haproxy pointing at em-proxy's port. The proxy_down.sh script is basically just the reverse.
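A sketch of what a proxy_up.sh like this might look like (the monit service name and the helper script names are assumptions):

```shell
#!/bin/bash
# proxy_up.sh -- illustrative sketch; the monit service name and the
# haproxy helper scripts are assumptions about the setup described here.

MASTER_HOST=$1   # master app server running haproxy
SERVER_NAME=$2   # this server's name in haproxy.cfg

# stop haproxy from sending traffic to this server
./haproxy_remove.sh "$MASTER_HOST" "$SERVER_NAME"

# bring up em-proxy on port 8080 (monit will keep it running)
monit start emproxy_ngin

# put the server back in rotation, now receiving traffic through em-proxy
./haproxy_add.sh "$MASTER_HOST" "$SERVER_NAME"
```

proxy_down.sh would run the same steps in reverse: pull the server out of rotation, monit stop emproxy_ngin, and add the server back on its normal port.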
And here are our haproxy_remove and haproxy_add scripts.
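The core of each is a one-line ssh plus sed, which could be sketched like this (the config path and reloading haproxy via init.d are assumptions):

```shell
#!/bin/bash
# Hedged sketch of haproxy_remove.sh / haproxy_add.sh; the haproxy.cfg
# path and the init.d reload command are assumptions.

# Comment out a server line in haproxy.cfg, taking it out of rotation.
haproxy_remove() {
  local master_host="$1" server_name="$2"
  ssh "$master_host" \
    "sudo sed -i 's/^\( *server ${server_name} \)/#\1/' /etc/haproxy.cfg \
     && sudo /etc/init.d/haproxy reload"
}

# Uncomment the server line, putting it back into rotation.
haproxy_add() {
  local master_host="$1" server_name="$2"
  ssh "$master_host" \
    "sudo sed -i 's/^#\( *server ${server_name} \)/\1/' /etc/haproxy.cfg \
     && sudo /etc/init.d/haproxy reload"
}
```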
The host and server name variables in those scripts should be replaced with the appropriate values for your environment.
These scripts just use ssh to run a sed command on our master application server, modifying haproxy.cfg to comment out or uncomment a server in the configuration. It's a very simple way to start or stop sending traffic to a specific server with haproxy.
Again, we set all of these up as Chef templates: gist.
Using this technique, we were able to send ~90% of our production traffic to our staging server (which was running on very similar hardware) running ruby 1.9.3. The traffic we missed out on was our SSL traffic. There's no reason em-proxy can't duplex SSL traffic as well, but we ran into issues getting it to actually process that traffic. We suspect it had something to do with certificates and encryption, but we didn't spend enough time to figure out the problem.
Now that we had our staging environment receiving production traffic, we were able to find several more rather obscure bugs by watching New Relic. Most importantly, however, we were able to see what kind of performance we could expect once running in production. This allowed us to effectively tune our memory and garbage collection settings in a way that was optimal for our app on 1.9.3. We were also able to figure out the optimal number of passenger processes per server, and as a result drop ~35% of our hardware. We'll go into greater detail about that whole process in a future blog post.
Using em-proxy to duplex our traffic to a production-like staging environment was integral for us in upgrading Ruby. We let this service run for almost 2 weeks, turning it on and off periodically, before finally deploying to production. Being able to see production traffic on our staging server greatly increased our confidence in deploying. And the best part is that we can very easily turn this back on in the future whenever we feel the need!