Here you will find ideas and code straight from the Software Development Team at SportsEngine. Our focus is on building great software products for the world of youth and amateur sports. We are fortunate to be able to combine our love of sports with our passion for writing code.
The SportsEngine application originated in 2006 as a single Ruby on Rails 1.2 application. Today the SportsEngine Platform is composed of more than 20 applications built on Rails and Node.js, forming a service oriented architecture that is poised to scale for the future.
At SportsEngine, we understand that organizational culture heavily impacts the way we engineer our systems. We believe that improving the culture of our sociotechnical system must be a part of our day to day work. "Culture is a Team Sport" is our way of sharing how we are redesigning the system at SportsEngine to generate culture that positively impacts our work and lives.
This is part 2 of 3 of our sub-series on mentoring at SportsEngine.
When I joined the platform operations team at SportsEngine as a full-time platform operations engineer, I was pretty overwhelmed by the complexity of cloud system administration. My Computer Science background helped me understand the overall platform architecture conceptually, but understanding the ins and outs of infrastructure built on top of AWS services is quite different from understanding different implementations of sorting algorithms. Of course, there are many resources online, including video series and blog posts (Alice Goldfuss’ blog post is a really good one), and I could also pair up with various senior engineers occasionally. However, I knew that I would need some guidance beyond 15-minute pairings or reading a book about Site Reliability Engineering during my free time. Fortunately, after chatting with my manager, Andy Fleener, I had the opportunity to be a part of the SportsEngine mentorship program.
The SportsEngine mentorship program pairs a senior engineer with a junior or intermediate level engineer so that they meet up regularly and work on projects that are not necessarily related to their work (check out part 1 of this sub-series to learn more!). At the time I started, the platform operations team was the only team that had participated in the program at SportsEngine. There were two mentor-mentee pairs before me, and I became the third as a mentee of Zach Newman, one of our senior platform operations engineers who started the mentorship program with Andy.
After briefly brainstorming with Andy and Zach about what would be the best format to get the most out of our pairing sessions, we decided to meet once a week for two hours and to work on various projects exploring new tools and patterns. It’s important to note that we didn’t pick a project just for the sake of learning. We wanted to use our pairing time to work on automating daily chores and reducing toil. In other words, we picked projects that had a real impact on our platform and fixed actual pain points.
Our first project was exploring and improving our Chef cookbook management process. Back then, we had one monolithic GitHub repository containing 20+ different Chef cookbooks for our services. This caused a lot of headaches whenever we made a change because of the interdependencies between the cookbooks. Merge conflicts were very common, and they slowed down our ability to quickly update a cookbook attribute or recipe. We extracted one cookbook from the monolithic repository into a standalone repository. Then, we made a deploy happen automatically upon merging a pull request by running a bash script during the Travis CI run. This later became a part of a bigger project to create fully automated deployment processes. Although we started this project due to our own frustrations, it led us to pilot the future of how we do deploys at SportsEngine.
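To make the "deploy on merge" idea concrete, here is a minimal sketch of the decision such a CI step has to make. It is written in Python for illustration, not the bash script we actually ran. It assumes the deploy branch is named master (an assumption, not something stated above) and relies on the TRAVIS_BRANCH and TRAVIS_PULL_REQUEST environment variables that Travis CI sets for each build. The function name should_deploy is hypothetical.

```python
def should_deploy(env, deploy_branch="master"):
    """Return True only for a push build on the deploy branch.

    Travis CI sets TRAVIS_PULL_REQUEST to the string "false" for push
    builds and to the pull request number for pull request builds.
    A merge into the deploy branch shows up as a push build on that branch.
    The branch name "master" is an assumption for this sketch.
    """
    is_pull_request = env.get("TRAVIS_PULL_REQUEST", "false") != "false"
    on_deploy_branch = env.get("TRAVIS_BRANCH") == deploy_branch
    return on_deploy_branch and not is_pull_request
```

In a CI step this would be called with os.environ, and the cookbook upload would run only when it returns True, so pull request builds still run tests but never deploy.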
The second sizable project was utilizing AWS Lambda to provide more context when an incident alert is triggered. The definition of ‘context’ is subjective in nature, but in this case, we defined it as the surrounding information like server metrics (CPU usage, process memory usage, etc.) and application health. When we get paged by PagerDuty, we almost always check at least three monitoring tools:
New Relic for application performance management (APM)
Datadog for infrastructure monitoring
load balancer status for server health
Imagine a system down alert wakes you up in the middle of the night. You’re still half awake, and navigating the monitoring service UIs can be pretty tedious and slow. Those few minutes can greatly impact the customer experience, even in the middle of our night, because customers living in different time zones use SportsEngine products. So, we decided to generate links to those monitoring service dashboards and attach them to the triggered alert. We chose AWS Lambda because its on-demand execution model fit our purpose well. We created the AWS infrastructure using Terraform and wrote a simple Ruby script to receive a hook from PagerDuty and post links back to the alert. Because both Zach and I were new to Lambda, there was a little bit of a learning curve in the beginning. But now, our familiarity with AWS Lambda helps us better understand our serverless applications and make good decisions when designing the serverless deploy pipeline.
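The shape of that Lambda function can be sketched roughly as follows. This is an illustration in Python, not our Ruby script, and almost every detail is an assumption: the example.com URL templates are placeholders, the incoming payload is a simplified made-up shape with only service and incident_id fields (not PagerDuty's real webhook format), and the step that posts the note back to the alert is left out. The event["body"] access assumes the function sits behind an HTTP endpoint that passes the request body as a JSON string.

```python
import json
from urllib.parse import quote

# Hypothetical URL templates, one per monitoring tool we check when paged.
DASHBOARD_TEMPLATES = {
    "New Relic (APM)": "https://newrelic.example.com/apps/{service}",
    "Datadog (infrastructure)": "https://datadog.example.com/dashboards/{service}",
    "Load balancer status": "https://lb-status.example.com/{service}",
}


def build_links(service):
    """Fill each template with the URL-encoded service name."""
    safe_name = quote(service, safe="")
    return {
        label: template.format(service=safe_name)
        for label, template in DASHBOARD_TEMPLATES.items()
    }


def format_note(service):
    """Render the links as a plain-text note to attach to the alert."""
    lines = ["Context for {}:".format(service)]
    for label, url in build_links(service).items():
        lines.append("- {}: {}".format(label, url))
    return "\n".join(lines)


def handler(event, context):
    """Lambda entry point: parse the hook body and return the note.

    The payload fields used here are assumptions for this sketch.
    """
    payload = json.loads(event["body"])
    note = format_note(payload["service"])
    body = {"incident_id": payload["incident_id"], "note": note}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Keeping the link building in small pure functions means the part that matters at 3 a.m. can be checked locally without deploying anything to AWS.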
We recently started work on our third project: simplifying our existing complex redirect logic. SportsEngine Inc has many different products across different types of sports including soccer, volleyball, hockey, swimming, wrestling, and more. For the last few years, we have grown significantly through acquisitions, new product launches, and product rebranding. As a result, different web servers have accumulated many unrelated redirects added for marketing and other purposes. By the completion of this project, we hope to have one unified redirect service running fully on containers in AWS EKS. This will also be a great opportunity for us to become familiar with containers as SportsEngine moves from traditional VMs toward containers.
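The core idea of a unified redirect service is that every rule lives in one table instead of being scattered across web server configs. Here is a minimal sketch of that lookup in Python. The hostnames, paths, and the resolve function are all hypothetical, and it says nothing about how the real service is built or deployed.

```python
# Hypothetical rules: (host, path) -> redirect target.
REDIRECTS = {
    ("old-brand.example.com", "/"): "https://www.example.com/",
    ("www.example.com", "/promo"): "https://www.example.com/campaigns/spring",
}


def resolve(host, path):
    """Return the redirect target for a request, or None if no rule matches.

    Hosts are matched case-insensitively and a trailing slash on the path
    is ignored, so "/promo" and "/promo/" hit the same rule.
    """
    normalized_path = path.rstrip("/") or "/"
    return REDIRECTS.get((host.lower(), normalized_path))
```

With a single table like this, adding or retiring a marketing redirect becomes a one-line data change rather than an edit to a particular server's configuration.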
Since we started the pairing, our journey has been really smooth. The only major change was transitioning from in-person pairing to fully remote pairing as Zach went fully remote. At first, I was a little worried that remote pairing might be less effective, but surprisingly, that was not the case. Our productivity stayed the same thanks to amazing collaboration tools such as Slack calls and VS Code Live Share. These tools allow us to look at each other’s screens and edit the same file at the same time. The meeting venue changed from a lounge table to our own desks, but this can actually be better because you can use the extra monitors on your desk for multitasking. Overall, it was a good opportunity for us to learn and improve the remote experience.
On top of the invaluable technical lessons about AWS, Chef, and everything else that I learned from our pairings, the real lesson for me was more abstract. I learned how to approach and plan out possible solutions for any technical problem. You can find an answer to almost any problem with a quick Google search, but understanding what it does and applying it to your specific use case requires more than mere copying and pasting. You need to know how all the pieces come together. My mentorship journey with Zach was key to giving me this foundation. I’m hoping to continue the mentorship and solve more fun and challenging problems with Zach.
Thank you to the platform operations team for their edits and feedback!