
Here you will find ideas and code straight from the Software Development Team at SportsEngine. Our focus is on building great software products for the world of youth and amateur sports. We are fortunate to be able to combine our love of sports with our passion for writing code.

The SportsEngine application originated in 2006 as a single Ruby on Rails 1.2 application. Today the SportsEngine Platform is composed of more than 20 applications built on Rails and Node.js, forming a service oriented architecture that is poised to scale for the future.


SendGrid Outage Drives Home the Importance of Redundancy

04/01/2013, 2:45pm CDT
By Luke Ludwig

Starting now, the Sport Ngin platform is sending mail through two services: SendGrid and our new provider, Mailgun.

On Thursday, March 21st, the Sport Ngin platform was unable to send email for several hours in the morning. We use SendGrid to send our email, and SendGrid was the unfortunate target of a distributed denial of service (DDoS) attack with an unusual story.

Without functional email, new user accounts can't be activated, passwords can't be reset, messages to teams can't be delivered, and other essential aspects of managing sports organizations are not possible. We maintain a queue within Sport Ngin of all outgoing mail. Our queue resumed processing once SendGrid was able to send email again, so no emails were lost outright. Regardless, delaying email for several hours is not acceptable. Email is a critical aspect of user interaction on most websites, including the Sport Ngin platform.
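To illustrate why a queue means delayed mail rather than lost mail, here is a minimal sketch in Python. It is not our actual implementation: the `Email`, `ProviderDown`, and `OutgoingMailQueue` names are hypothetical, and the `send` callable stands in for whatever client talks to the email provider. The key idea is that a message is removed from the queue only after the provider has accepted it.

```python
from collections import deque
from typing import Callable, Deque, NamedTuple


class Email(NamedTuple):
    to: str
    subject: str
    body: str


class ProviderDown(Exception):
    """Raised by the (hypothetical) send callable when the provider is unreachable."""


class OutgoingMailQueue:
    """In-memory FIFO queue that keeps mail until it has actually been sent."""

    def __init__(self, send: Callable[[Email], None]) -> None:
        self._send = send
        self._pending: Deque[Email] = deque()

    def enqueue(self, email: Email) -> None:
        self._pending.append(email)

    def __len__(self) -> int:
        return len(self._pending)

    def process(self) -> int:
        """Send queued mail in order and return how many messages went out.

        Stops at the first failure and leaves that message, and everything
        behind it, in the queue for the next run.
        """
        delivered = 0
        while self._pending:
            email = self._pending[0]  # peek; remove only after a successful send
            try:
                self._send(email)
            except ProviderDown:
                break
            self._pending.popleft()
            delivered += 1
        return delivered
```

Calling `process()` on a schedule while the provider is down sends nothing and loses nothing; the first run after the provider recovers drains the backlog in order. A production queue would also need to be persistent, which this in-memory sketch is not.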

We take special care to build resiliency into the Sport Ngin platform. "Zeroing in on Zero Downtime" details how we do this, most notably through redundancy. The impact of a failing application or database server usually goes unnoticed. We can lose an entire data center's worth of servers with little to no impact on our customers. What decisions can we make to ensure that our ability to send email is as resilient as other aspects of our architecture? What transactional email sending service should we use? SendGrid has many competitors, including Amazon's SES, Mailjet, Mailgun, Postmark, and more. This chart on Social Compare has a great comparison of these services.

Transactional email sending services come in two forms. Some, such as SendGrid, provide the customer with a dedicated IP address to send mail from, whereas others, such as Amazon's SES, maintain a shared pool of IP addresses. With either form, it is hard to start sending a large amount of email quickly. Email service providers such as Gmail assign a reputation to IP addresses as a way of protecting their users against spammers.

This means that you can't simply take a new IP address and expect to send lots of mail through it immediately. Instead, you must gradually build the reputation of that IP address over time. Email sending services that use a shared pool of IP addresses put their own sending limits in place, requiring you to build a reputation with them before they will let you send a large amount of mail.
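To make "gradually" concrete, here is a small sketch of a warm-up schedule that raises a daily sending cap step by step until it reaches the volume you actually need. Everything in it is an assumption made for illustration: the `warmup_schedule` function is hypothetical, and the starting volume, target, and growth factor are made-up numbers, not limits published by any email provider.

```python
from typing import List


def warmup_schedule(start: int, target: int, growth: float = 2.0) -> List[int]:
    """Return one daily sending cap per day, growing from `start` to `target`.

    Each day's cap is the previous day's cap times `growth`; the final
    entry is capped at `target`.
    """
    if start <= 0 or target < start or growth <= 1.0:
        raise ValueError("need 0 < start <= target and growth > 1.0")
    caps: List[int] = []
    cap = start
    while cap < target:
        caps.append(cap)
        # Always advance by at least one message so small caps cannot stall.
        cap = max(cap + 1, int(cap * growth))
    caps.append(target)
    return caps


# Illustrative numbers only: double the cap each day, 500 up to 10,000.
print(warmup_schedule(500, 10_000))
```

With these illustrative numbers, the new IP address needs six days before it carries the full daily volume, which is exactly the kind of delay you cannot afford in the middle of an outage.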

These issues surrounding IP address reputation make it rather difficult to quickly change email sending services when one happens to be down. To achieve true resiliency with email, redundancy is a requirement. Starting now, the Sport Ngin platform is sending mail through two services: SendGrid and our new provider, Mailgun. The next time one of these services experiences downtime, we will simply switch all outgoing email to the working provider. What other critical services does your system rely on? Are they redundant?
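The switch itself can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of our production code: `FailoverMailer` and `ProviderUnavailable` are hypothetical names, each provider is represented by a plain callable rather than a real SendGrid or Mailgun client, and the automatic fallback inside `send` is one possible design (the switch could just as well be made by hand with `switch_to`).

```python
from typing import Callable, Dict

# A sender takes (to, subject, body) and raises ProviderUnavailable on failure.
Sender = Callable[[str, str, str], None]


class ProviderUnavailable(Exception):
    """Raised when a provider, or every provider, cannot accept mail."""


class FailoverMailer:
    """Sends through the active provider and falls back to the others."""

    def __init__(self, providers: Dict[str, Sender]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = dict(providers)
        self.active = next(iter(self._providers))  # first provider listed

    def switch_to(self, name: str) -> None:
        """Manually route all outgoing mail to the named provider."""
        if name not in self._providers:
            raise KeyError(name)
        self.active = name

    def send(self, to: str, subject: str, body: str) -> str:
        """Send one message and return the name of the provider that took it."""
        # Try the active provider first, then the rest in the order given.
        names = [self.active] + [n for n in self._providers if n != self.active]
        for name in names:
            try:
                self._providers[name](to, subject, body)
            except ProviderUnavailable:
                continue
            self.active = name  # keep using whichever provider worked
            return name
        raise ProviderUnavailable("all providers are down")
```

Because both providers carry mail all the time in this arrangement, both keep an established sending reputation, so whichever one becomes `active` during an outage can absorb the full volume right away.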

Tag(s): Home  High Availability